martinhillebrandtd committed on
Commit e80f7c7 · 1 Parent(s): d3a9109
README.md CHANGED
@@ -1,3 +1,2656 @@
- ---
- license: apache-2.0
- ---
+ ---
+ datasets:
+ - jinaai/negation-dataset
+ inference: false
+ language: en
+ license: apache-2.0
+ model-index:
+ - name: jina-embedding-s-en-v2
+   results:
+   - dataset:
+       config: en
+       name: MTEB AmazonCounterfactualClassification (en)
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+       split: test
+       type: mteb/amazon_counterfactual
+     metrics:
+     - type: accuracy
+       value: 71.35820895522387
+     - type: ap
+       value: 33.99931933598115
+     - type: f1
+       value: 65.3853685535555
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB AmazonPolarityClassification
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+       split: test
+       type: mteb/amazon_polarity
+     metrics:
+     - type: accuracy
+       value: 82.90140000000001
+     - type: ap
+       value: 78.01434597815617
+     - type: f1
+       value: 82.83357802722676
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB AmazonReviewsClassification (en)
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+       split: test
+       type: mteb/amazon_reviews_multi
+     metrics:
+     - type: accuracy
+       value: 40.88999999999999
+     - type: f1
+       value: 39.209432767163456
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB ArguAna
+       revision: None
+       split: test
+       type: arguana
+     metrics:
+     - type: map_at_1
+       value: 23.257
+     - type: map_at_10
+       value: 37.946000000000005
+     - type: map_at_100
+       value: 39.17
+     - type: map_at_1000
+       value: 39.181
+     - type: map_at_3
+       value: 32.99
+     - type: map_at_5
+       value: 35.467999999999996
+     - type: mrr_at_1
+       value: 23.541999999999998
+     - type: mrr_at_10
+       value: 38.057
+     - type: mrr_at_100
+       value: 39.289
+     - type: mrr_at_1000
+       value: 39.299
+     - type: mrr_at_3
+       value: 33.096
+     - type: mrr_at_5
+       value: 35.628
+     - type: ndcg_at_1
+       value: 23.257
+     - type: ndcg_at_10
+       value: 46.729
+     - type: ndcg_at_100
+       value: 51.900999999999996
+     - type: ndcg_at_1000
+       value: 52.16
+     - type: ndcg_at_3
+       value: 36.323
+     - type: ndcg_at_5
+       value: 40.766999999999996
+     - type: precision_at_1
+       value: 23.257
+     - type: precision_at_10
+       value: 7.510999999999999
+     - type: precision_at_100
+       value: 0.976
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 15.339
+     - type: precision_at_5
+       value: 11.350999999999999
+     - type: recall_at_1
+       value: 23.257
+     - type: recall_at_10
+       value: 75.107
+     - type: recall_at_100
+       value: 97.58200000000001
+     - type: recall_at_1000
+       value: 99.57300000000001
+     - type: recall_at_3
+       value: 46.017
+     - type: recall_at_5
+       value: 56.757000000000005
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ArxivClusteringP2P
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+       split: test
+       type: mteb/arxiv-clustering-p2p
+     metrics:
+     - type: v_measure
+       value: 44.02420878391967
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB ArxivClusteringS2S
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+       split: test
+       type: mteb/arxiv-clustering-s2s
+     metrics:
+     - type: v_measure
+       value: 35.16136856000258
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB AskUbuntuDupQuestions
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+       split: test
+       type: mteb/askubuntudupquestions-reranking
+     metrics:
+     - type: map
+       value: 59.61809790513646
+     - type: mrr
+       value: 73.07215406938397
+     task:
+       type: Reranking
+   - dataset:
+       config: default
+       name: MTEB BIOSSES
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+       split: test
+       type: mteb/biosses-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 82.0167350090749
+     - type: cos_sim_spearman
+       value: 80.51569002630401
+     - type: euclidean_pearson
+       value: 81.46820525099726
+     - type: euclidean_spearman
+       value: 80.51569002630401
+     - type: manhattan_pearson
+       value: 81.35596555056757
+     - type: manhattan_spearman
+       value: 80.12592210903303
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB Banking77Classification
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+       split: test
+       type: mteb/banking77
+     metrics:
+     - type: accuracy
+       value: 78.25
+     - type: f1
+       value: 77.34950913540605
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB BiorxivClusteringP2P
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+       split: test
+       type: mteb/biorxiv-clustering-p2p
+     metrics:
+     - type: v_measure
+       value: 35.57238596005698
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB BiorxivClusteringS2S
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+       split: test
+       type: mteb/biorxiv-clustering-s2s
+     metrics:
+     - type: v_measure
+       value: 29.066444306196683
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB CQADupstackAndroidRetrieval
+       revision: None
+       split: test
+       type: BeIR/cqadupstack
+     metrics:
+     - type: map_at_1
+       value: 31.891000000000002
+     - type: map_at_10
+       value: 42.772
+     - type: map_at_100
+       value: 44.108999999999995
+     - type: map_at_1000
+       value: 44.236
+     - type: map_at_3
+       value: 39.289
+     - type: map_at_5
+       value: 41.113
+     - type: mrr_at_1
+       value: 39.342
+     - type: mrr_at_10
+       value: 48.852000000000004
+     - type: mrr_at_100
+       value: 49.534
+     - type: mrr_at_1000
+       value: 49.582
+     - type: mrr_at_3
+       value: 46.089999999999996
+     - type: mrr_at_5
+       value: 47.685
+     - type: ndcg_at_1
+       value: 39.342
+     - type: ndcg_at_10
+       value: 48.988
+     - type: ndcg_at_100
+       value: 53.854
+     - type: ndcg_at_1000
+       value: 55.955
+     - type: ndcg_at_3
+       value: 43.877
+     - type: ndcg_at_5
+       value: 46.027
+     - type: precision_at_1
+       value: 39.342
+     - type: precision_at_10
+       value: 9.285
+     - type: precision_at_100
+       value: 1.488
+     - type: precision_at_1000
+       value: 0.194
+     - type: precision_at_3
+       value: 20.696
+     - type: precision_at_5
+       value: 14.878
+     - type: recall_at_1
+       value: 31.891000000000002
+     - type: recall_at_10
+       value: 60.608
+     - type: recall_at_100
+       value: 81.025
+     - type: recall_at_1000
+       value: 94.883
+     - type: recall_at_3
+       value: 45.694
+     - type: recall_at_5
+       value: 51.684
+     - type: map_at_1
+       value: 28.778
+     - type: map_at_10
+       value: 37.632
+     - type: map_at_100
+       value: 38.800000000000004
+     - type: map_at_1000
+       value: 38.934999999999995
+     - type: map_at_3
+       value: 35.293
+     - type: map_at_5
+       value: 36.547000000000004
+     - type: mrr_at_1
+       value: 35.35
+     - type: mrr_at_10
+       value: 42.936
+     - type: mrr_at_100
+       value: 43.69
+     - type: mrr_at_1000
+       value: 43.739
+     - type: mrr_at_3
+       value: 41.062
+     - type: mrr_at_5
+       value: 42.097
+     - type: ndcg_at_1
+       value: 35.35
+     - type: ndcg_at_10
+       value: 42.528
+     - type: ndcg_at_100
+       value: 46.983000000000004
+     - type: ndcg_at_1000
+       value: 49.187999999999995
+     - type: ndcg_at_3
+       value: 39.271
+     - type: ndcg_at_5
+       value: 40.654
+     - type: precision_at_1
+       value: 35.35
+     - type: precision_at_10
+       value: 7.828
+     - type: precision_at_100
+       value: 1.3010000000000002
+     - type: precision_at_1000
+       value: 0.17700000000000002
+     - type: precision_at_3
+       value: 18.96
+     - type: precision_at_5
+       value: 13.120999999999999
+     - type: recall_at_1
+       value: 28.778
+     - type: recall_at_10
+       value: 50.775000000000006
+     - type: recall_at_100
+       value: 69.66799999999999
+     - type: recall_at_1000
+       value: 83.638
+     - type: recall_at_3
+       value: 40.757
+     - type: recall_at_5
+       value: 44.86
+     - type: map_at_1
+       value: 37.584
+     - type: map_at_10
+       value: 49.69
+     - type: map_at_100
+       value: 50.639
+     - type: map_at_1000
+       value: 50.702999999999996
+     - type: map_at_3
+       value: 46.61
+     - type: map_at_5
+       value: 48.486000000000004
+     - type: mrr_at_1
+       value: 43.009
+     - type: mrr_at_10
+       value: 52.949999999999996
+     - type: mrr_at_100
+       value: 53.618
+     - type: mrr_at_1000
+       value: 53.65299999999999
+     - type: mrr_at_3
+       value: 50.605999999999995
+     - type: mrr_at_5
+       value: 52.095
+     - type: ndcg_at_1
+       value: 43.009
+     - type: ndcg_at_10
+       value: 55.278000000000006
+     - type: ndcg_at_100
+       value: 59.134
+     - type: ndcg_at_1000
+       value: 60.528999999999996
+     - type: ndcg_at_3
+       value: 50.184
+     - type: ndcg_at_5
+       value: 52.919000000000004
+     - type: precision_at_1
+       value: 43.009
+     - type: precision_at_10
+       value: 8.821
+     - type: precision_at_100
+       value: 1.161
+     - type: precision_at_1000
+       value: 0.133
+     - type: precision_at_3
+       value: 22.424
+     - type: precision_at_5
+       value: 15.436
+     - type: recall_at_1
+       value: 37.584
+     - type: recall_at_10
+       value: 68.514
+     - type: recall_at_100
+       value: 85.099
+     - type: recall_at_1000
+       value: 95.123
+     - type: recall_at_3
+       value: 55.007
+     - type: recall_at_5
+       value: 61.714999999999996
+     - type: map_at_1
+       value: 24.7
+     - type: map_at_10
+       value: 32.804
+     - type: map_at_100
+       value: 33.738
+     - type: map_at_1000
+       value: 33.825
+     - type: map_at_3
+       value: 30.639
+     - type: map_at_5
+       value: 31.781
+     - type: mrr_at_1
+       value: 26.328000000000003
+     - type: mrr_at_10
+       value: 34.679
+     - type: mrr_at_100
+       value: 35.510000000000005
+     - type: mrr_at_1000
+       value: 35.577999999999996
+     - type: mrr_at_3
+       value: 32.58
+     - type: mrr_at_5
+       value: 33.687
+     - type: ndcg_at_1
+       value: 26.328000000000003
+     - type: ndcg_at_10
+       value: 37.313
+     - type: ndcg_at_100
+       value: 42.004000000000005
+     - type: ndcg_at_1000
+       value: 44.232
+     - type: ndcg_at_3
+       value: 33.076
+     - type: ndcg_at_5
+       value: 34.966
+     - type: precision_at_1
+       value: 26.328000000000003
+     - type: precision_at_10
+       value: 5.627
+     - type: precision_at_100
+       value: 0.8410000000000001
+     - type: precision_at_1000
+       value: 0.106
+     - type: precision_at_3
+       value: 14.011000000000001
+     - type: precision_at_5
+       value: 9.582
+     - type: recall_at_1
+       value: 24.7
+     - type: recall_at_10
+       value: 49.324
+     - type: recall_at_100
+       value: 71.018
+     - type: recall_at_1000
+       value: 87.905
+     - type: recall_at_3
+       value: 37.7
+     - type: recall_at_5
+       value: 42.281
+     - type: map_at_1
+       value: 14.350999999999999
+     - type: map_at_10
+       value: 21.745
+     - type: map_at_100
+       value: 22.731
+     - type: map_at_1000
+       value: 22.852
+     - type: map_at_3
+       value: 19.245
+     - type: map_at_5
+       value: 20.788
+     - type: mrr_at_1
+       value: 18.159
+     - type: mrr_at_10
+       value: 25.833000000000002
+     - type: mrr_at_100
+       value: 26.728
+     - type: mrr_at_1000
+       value: 26.802
+     - type: mrr_at_3
+       value: 23.383000000000003
+     - type: mrr_at_5
+       value: 24.887999999999998
+     - type: ndcg_at_1
+       value: 18.159
+     - type: ndcg_at_10
+       value: 26.518000000000004
+     - type: ndcg_at_100
+       value: 31.473000000000003
+     - type: ndcg_at_1000
+       value: 34.576
+     - type: ndcg_at_3
+       value: 21.907
+     - type: ndcg_at_5
+       value: 24.39
+     - type: precision_at_1
+       value: 18.159
+     - type: precision_at_10
+       value: 4.938
+     - type: precision_at_100
+       value: 0.853
+     - type: precision_at_1000
+       value: 0.125
+     - type: precision_at_3
+       value: 10.655000000000001
+     - type: precision_at_5
+       value: 7.985
+     - type: recall_at_1
+       value: 14.350999999999999
+     - type: recall_at_10
+       value: 37.284
+     - type: recall_at_100
+       value: 59.11300000000001
+     - type: recall_at_1000
+       value: 81.634
+     - type: recall_at_3
+       value: 24.753
+     - type: recall_at_5
+       value: 30.979
+     - type: map_at_1
+       value: 26.978
+     - type: map_at_10
+       value: 36.276
+     - type: map_at_100
+       value: 37.547000000000004
+     - type: map_at_1000
+       value: 37.678
+     - type: map_at_3
+       value: 33.674
+     - type: map_at_5
+       value: 35.119
+     - type: mrr_at_1
+       value: 32.916000000000004
+     - type: mrr_at_10
+       value: 41.798
+     - type: mrr_at_100
+       value: 42.72
+     - type: mrr_at_1000
+       value: 42.778
+     - type: mrr_at_3
+       value: 39.493
+     - type: mrr_at_5
+       value: 40.927
+     - type: ndcg_at_1
+       value: 32.916000000000004
+     - type: ndcg_at_10
+       value: 41.81
+     - type: ndcg_at_100
+       value: 47.284
+     - type: ndcg_at_1000
+       value: 49.702
+     - type: ndcg_at_3
+       value: 37.486999999999995
+     - type: ndcg_at_5
+       value: 39.597
+     - type: precision_at_1
+       value: 32.916000000000004
+     - type: precision_at_10
+       value: 7.411
+     - type: precision_at_100
+       value: 1.189
+     - type: precision_at_1000
+       value: 0.158
+     - type: precision_at_3
+       value: 17.581
+     - type: precision_at_5
+       value: 12.397
+     - type: recall_at_1
+       value: 26.978
+     - type: recall_at_10
+       value: 52.869
+     - type: recall_at_100
+       value: 75.78399999999999
+     - type: recall_at_1000
+       value: 91.545
+     - type: recall_at_3
+       value: 40.717
+     - type: recall_at_5
+       value: 46.168
+     - type: map_at_1
+       value: 24.641
+     - type: map_at_10
+       value: 32.916000000000004
+     - type: map_at_100
+       value: 34.165
+     - type: map_at_1000
+       value: 34.286
+     - type: map_at_3
+       value: 30.335
+     - type: map_at_5
+       value: 31.569000000000003
+     - type: mrr_at_1
+       value: 30.593999999999998
+     - type: mrr_at_10
+       value: 38.448
+     - type: mrr_at_100
+       value: 39.299
+     - type: mrr_at_1000
+       value: 39.362
+     - type: mrr_at_3
+       value: 36.244
+     - type: mrr_at_5
+       value: 37.232
+     - type: ndcg_at_1
+       value: 30.593999999999998
+     - type: ndcg_at_10
+       value: 38.2
+     - type: ndcg_at_100
+       value: 43.742
+     - type: ndcg_at_1000
+       value: 46.217000000000006
+     - type: ndcg_at_3
+       value: 33.925
+     - type: ndcg_at_5
+       value: 35.394
+     - type: precision_at_1
+       value: 30.593999999999998
+     - type: precision_at_10
+       value: 6.895
+     - type: precision_at_100
+       value: 1.1320000000000001
+     - type: precision_at_1000
+       value: 0.153
+     - type: precision_at_3
+       value: 16.096
+     - type: precision_at_5
+       value: 11.05
+     - type: recall_at_1
+       value: 24.641
+     - type: recall_at_10
+       value: 48.588
+     - type: recall_at_100
+       value: 72.841
+     - type: recall_at_1000
+       value: 89.535
+     - type: recall_at_3
+       value: 36.087
+     - type: recall_at_5
+       value: 40.346
+     - type: map_at_1
+       value: 24.79425
+     - type: map_at_10
+       value: 33.12033333333333
+     - type: map_at_100
+       value: 34.221333333333334
+     - type: map_at_1000
+       value: 34.3435
+     - type: map_at_3
+       value: 30.636583333333338
+     - type: map_at_5
+       value: 31.974083333333326
+     - type: mrr_at_1
+       value: 29.242416666666664
+     - type: mrr_at_10
+       value: 37.11675
+     - type: mrr_at_100
+       value: 37.93783333333334
+     - type: mrr_at_1000
+       value: 38.003083333333336
+     - type: mrr_at_3
+       value: 34.904666666666664
+     - type: mrr_at_5
+       value: 36.12916666666667
+     - type: ndcg_at_1
+       value: 29.242416666666664
+     - type: ndcg_at_10
+       value: 38.03416666666667
+     - type: ndcg_at_100
+       value: 42.86674999999999
+     - type: ndcg_at_1000
+       value: 45.34550000000001
+     - type: ndcg_at_3
+       value: 33.76466666666666
+     - type: ndcg_at_5
+       value: 35.668666666666674
+     - type: precision_at_1
+       value: 29.242416666666664
+     - type: precision_at_10
+       value: 6.589833333333334
+     - type: precision_at_100
+       value: 1.0693333333333332
+     - type: precision_at_1000
+       value: 0.14641666666666667
+     - type: precision_at_3
+       value: 15.430749999999998
+     - type: precision_at_5
+       value: 10.833833333333333
+     - type: recall_at_1
+       value: 24.79425
+     - type: recall_at_10
+       value: 48.582916666666655
+     - type: recall_at_100
+       value: 69.88499999999999
+     - type: recall_at_1000
+       value: 87.211
+     - type: recall_at_3
+       value: 36.625499999999995
+     - type: recall_at_5
+       value: 41.553999999999995
+     - type: map_at_1
+       value: 22.767
+     - type: map_at_10
+       value: 28.450999999999997
+     - type: map_at_100
+       value: 29.332
+     - type: map_at_1000
+       value: 29.426000000000002
+     - type: map_at_3
+       value: 26.379
+     - type: map_at_5
+       value: 27.584999999999997
+     - type: mrr_at_1
+       value: 25.46
+     - type: mrr_at_10
+       value: 30.974
+     - type: mrr_at_100
+       value: 31.784000000000002
+     - type: mrr_at_1000
+       value: 31.857999999999997
+     - type: mrr_at_3
+       value: 28.962
+     - type: mrr_at_5
+       value: 30.066
+     - type: ndcg_at_1
+       value: 25.46
+     - type: ndcg_at_10
+       value: 32.041
+     - type: ndcg_at_100
+       value: 36.522
+     - type: ndcg_at_1000
+       value: 39.101
+     - type: ndcg_at_3
+       value: 28.152
+     - type: ndcg_at_5
+       value: 30.03
+     - type: precision_at_1
+       value: 25.46
+     - type: precision_at_10
+       value: 4.893
+     - type: precision_at_100
+       value: 0.77
+     - type: precision_at_1000
+       value: 0.107
+     - type: precision_at_3
+       value: 11.605
+     - type: precision_at_5
+       value: 8.19
+     - type: recall_at_1
+       value: 22.767
+     - type: recall_at_10
+       value: 40.71
+     - type: recall_at_100
+       value: 61.334999999999994
+     - type: recall_at_1000
+       value: 80.567
+     - type: recall_at_3
+       value: 30.198000000000004
+     - type: recall_at_5
+       value: 34.803
+     - type: map_at_1
+       value: 16.722
+     - type: map_at_10
+       value: 22.794
+     - type: map_at_100
+       value: 23.7
+     - type: map_at_1000
+       value: 23.822
+     - type: map_at_3
+       value: 20.781
+     - type: map_at_5
+       value: 22.024
+     - type: mrr_at_1
+       value: 20.061999999999998
+     - type: mrr_at_10
+       value: 26.346999999999998
+     - type: mrr_at_100
+       value: 27.153
+     - type: mrr_at_1000
+       value: 27.233
+     - type: mrr_at_3
+       value: 24.375
+     - type: mrr_at_5
+       value: 25.593
+     - type: ndcg_at_1
+       value: 20.061999999999998
+     - type: ndcg_at_10
+       value: 26.785999999999998
+     - type: ndcg_at_100
+       value: 31.319999999999997
+     - type: ndcg_at_1000
+       value: 34.346
+     - type: ndcg_at_3
+       value: 23.219
+     - type: ndcg_at_5
+       value: 25.107000000000003
+     - type: precision_at_1
+       value: 20.061999999999998
+     - type: precision_at_10
+       value: 4.78
+     - type: precision_at_100
+       value: 0.83
+     - type: precision_at_1000
+       value: 0.125
+     - type: precision_at_3
+       value: 10.874
+     - type: precision_at_5
+       value: 7.956
+     - type: recall_at_1
+       value: 16.722
+     - type: recall_at_10
+       value: 35.204
+     - type: recall_at_100
+       value: 55.797
+     - type: recall_at_1000
+       value: 77.689
+     - type: recall_at_3
+       value: 25.245
+     - type: recall_at_5
+       value: 30.115
+     - type: map_at_1
+       value: 24.842
+     - type: map_at_10
+       value: 32.917
+     - type: map_at_100
+       value: 33.961000000000006
+     - type: map_at_1000
+       value: 34.069
+     - type: map_at_3
+       value: 30.595
+     - type: map_at_5
+       value: 31.837
+     - type: mrr_at_1
+       value: 29.011
+     - type: mrr_at_10
+       value: 36.977
+     - type: mrr_at_100
+       value: 37.814
+     - type: mrr_at_1000
+       value: 37.885999999999996
+     - type: mrr_at_3
+       value: 34.966
+     - type: mrr_at_5
+       value: 36.043
+     - type: ndcg_at_1
+       value: 29.011
+     - type: ndcg_at_10
+       value: 37.735
+     - type: ndcg_at_100
+       value: 42.683
+     - type: ndcg_at_1000
+       value: 45.198
+     - type: ndcg_at_3
+       value: 33.650000000000006
+     - type: ndcg_at_5
+       value: 35.386
+     - type: precision_at_1
+       value: 29.011
+     - type: precision_at_10
+       value: 6.259
+     - type: precision_at_100
+       value: 0.984
+     - type: precision_at_1000
+       value: 0.13
+     - type: precision_at_3
+       value: 15.329999999999998
+     - type: precision_at_5
+       value: 10.541
+     - type: recall_at_1
+       value: 24.842
+     - type: recall_at_10
+       value: 48.304
+     - type: recall_at_100
+       value: 70.04899999999999
+     - type: recall_at_1000
+       value: 87.82600000000001
+     - type: recall_at_3
+       value: 36.922
+     - type: recall_at_5
+       value: 41.449999999999996
+     - type: map_at_1
+       value: 24.252000000000002
+     - type: map_at_10
+       value: 32.293
+     - type: map_at_100
+       value: 33.816
+     - type: map_at_1000
+       value: 34.053
+     - type: map_at_3
+       value: 29.781999999999996
+     - type: map_at_5
+       value: 31.008000000000003
+     - type: mrr_at_1
+       value: 29.051
+     - type: mrr_at_10
+       value: 36.722
+     - type: mrr_at_100
+       value: 37.663000000000004
+     - type: mrr_at_1000
+       value: 37.734
+     - type: mrr_at_3
+       value: 34.354
+     - type: mrr_at_5
+       value: 35.609
+     - type: ndcg_at_1
+       value: 29.051
+     - type: ndcg_at_10
+       value: 37.775999999999996
+     - type: ndcg_at_100
+       value: 43.221
+     - type: ndcg_at_1000
+       value: 46.116
+     - type: ndcg_at_3
+       value: 33.403
+     - type: ndcg_at_5
+       value: 35.118
+     - type: precision_at_1
+       value: 29.051
+     - type: precision_at_10
+       value: 7.332
+     - type: precision_at_100
+       value: 1.49
+     - type: precision_at_1000
+       value: 0.23600000000000002
+     - type: precision_at_3
+       value: 15.415000000000001
+     - type: precision_at_5
+       value: 11.107
+     - type: recall_at_1
+       value: 24.252000000000002
+     - type: recall_at_10
+       value: 47.861
+     - type: recall_at_100
+       value: 72.21600000000001
+     - type: recall_at_1000
+       value: 90.886
+     - type: recall_at_3
+       value: 35.533
+     - type: recall_at_5
+       value: 39.959
+     - type: map_at_1
+       value: 20.025000000000002
+     - type: map_at_10
+       value: 27.154
+     - type: map_at_100
+       value: 28.118
+     - type: map_at_1000
+       value: 28.237000000000002
+     - type: map_at_3
+       value: 25.017
+     - type: map_at_5
+       value: 25.832
+     - type: mrr_at_1
+       value: 21.627
+     - type: mrr_at_10
+       value: 28.884999999999998
+     - type: mrr_at_100
+       value: 29.741
+     - type: mrr_at_1000
+       value: 29.831999999999997
+     - type: mrr_at_3
+       value: 26.741
+     - type: mrr_at_5
+       value: 27.628000000000004
+     - type: ndcg_at_1
+       value: 21.627
+     - type: ndcg_at_10
+       value: 31.436999999999998
+     - type: ndcg_at_100
+       value: 36.181000000000004
+     - type: ndcg_at_1000
+       value: 38.986
+     - type: ndcg_at_3
+       value: 27.025
+     - type: ndcg_at_5
+       value: 28.436
+     - type: precision_at_1
+       value: 21.627
+     - type: precision_at_10
+       value: 5.009
+     - type: precision_at_100
+       value: 0.7929999999999999
+     - type: precision_at_1000
+       value: 0.11299999999999999
+     - type: precision_at_3
+       value: 11.522
+     - type: precision_at_5
+       value: 7.763000000000001
+     - type: recall_at_1
+       value: 20.025000000000002
+     - type: recall_at_10
+       value: 42.954
+     - type: recall_at_100
+       value: 64.67500000000001
+     - type: recall_at_1000
+       value: 85.301
+     - type: recall_at_3
+       value: 30.892999999999997
+     - type: recall_at_5
+       value: 34.288000000000004
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ClimateFEVER
+       revision: None
+       split: test
+       type: climate-fever
+     metrics:
+     - type: map_at_1
+       value: 10.079
+     - type: map_at_10
+       value: 16.930999999999997
+     - type: map_at_100
+       value: 18.398999999999997
+     - type: map_at_1000
+       value: 18.561
+     - type: map_at_3
+       value: 14.294
+     - type: map_at_5
+       value: 15.579
+     - type: mrr_at_1
+       value: 22.606
+     - type: mrr_at_10
+       value: 32.513
+     - type: mrr_at_100
+       value: 33.463
+     - type: mrr_at_1000
+       value: 33.513999999999996
+     - type: mrr_at_3
+       value: 29.479
+     - type: mrr_at_5
+       value: 31.3
+     - type: ndcg_at_1
+       value: 22.606
+     - type: ndcg_at_10
+       value: 24.053
+     - type: ndcg_at_100
+       value: 30.258000000000003
+     - type: ndcg_at_1000
+       value: 33.516
+     - type: ndcg_at_3
+       value: 19.721
+     - type: ndcg_at_5
+       value: 21.144
+     - type: precision_at_1
+       value: 22.606
+     - type: precision_at_10
+       value: 7.55
+     - type: precision_at_100
+       value: 1.399
+     - type: precision_at_1000
+       value: 0.2
+     - type: precision_at_3
+       value: 14.701
+     - type: precision_at_5
+       value: 11.192
+     - type: recall_at_1
+       value: 10.079
+     - type: recall_at_10
+       value: 28.970000000000002
+     - type: recall_at_100
+       value: 50.805
+     - type: recall_at_1000
+       value: 69.378
+     - type: recall_at_3
+       value: 18.199
+     - type: recall_at_5
+       value: 22.442
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB DBPedia
+       revision: None
+       split: test
+       type: dbpedia-entity
+     metrics:
+     - type: map_at_1
+       value: 7.794
+     - type: map_at_10
+       value: 15.165999999999999
+     - type: map_at_100
+       value: 20.508000000000003
+     - type: map_at_1000
+       value: 21.809
+     - type: map_at_3
+       value: 11.568000000000001
+     - type: map_at_5
+       value: 13.059000000000001
+     - type: mrr_at_1
+       value: 56.49999999999999
+     - type: mrr_at_10
+       value: 65.90899999999999
+     - type: mrr_at_100
+       value: 66.352
+     - type: mrr_at_1000
+       value: 66.369
+     - type: mrr_at_3
+       value: 64.0
+     - type: mrr_at_5
+       value: 65.10000000000001
+     - type: ndcg_at_1
+       value: 44.25
+     - type: ndcg_at_10
+       value: 32.649
+     - type: ndcg_at_100
+       value: 36.668
+     - type: ndcg_at_1000
+       value: 43.918
+     - type: ndcg_at_3
+       value: 37.096000000000004
+     - type: ndcg_at_5
+       value: 34.048
+     - type: precision_at_1
+       value: 56.49999999999999
+     - type: precision_at_10
+       value: 25.45
+     - type: precision_at_100
+       value: 8.055
+     - type: precision_at_1000
+       value: 1.7489999999999999
+     - type: precision_at_3
+       value: 41.0
+     - type: precision_at_5
+       value: 32.85
+     - type: recall_at_1
+       value: 7.794
+     - type: recall_at_10
+       value: 20.101
+     - type: recall_at_100
+       value: 42.448
+     - type: recall_at_1000
+       value: 65.88000000000001
+     - type: recall_at_3
+       value: 12.753
+     - type: recall_at_5
+       value: 15.307
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB EmotionClassification
+       revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+       split: test
+       type: mteb/emotion
+     metrics:
+     - type: accuracy
+       value: 44.01
+     - type: f1
+       value: 38.659680951114964
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB FEVER
+       revision: None
+       split: test
+       type: fever
+     metrics:
+     - type: map_at_1
+       value: 49.713
+     - type: map_at_10
+       value: 61.79
+     - type: map_at_100
+       value: 62.28
+     - type: map_at_1000
+       value: 62.297000000000004
+     - type: map_at_3
+       value: 59.361
+     - type: map_at_5
+       value: 60.92100000000001
+     - type: mrr_at_1
+       value: 53.405
+     - type: mrr_at_10
+       value: 65.79899999999999
+     - type: mrr_at_100
+       value: 66.219
+     - type: mrr_at_1000
+       value: 66.227
+     - type: mrr_at_3
+       value: 63.431000000000004
+     - type: mrr_at_5
+       value: 64.98
+     - type: ndcg_at_1
+       value: 53.405
+     - type: ndcg_at_10
+       value: 68.01899999999999
+     - type: ndcg_at_100
+       value: 70.197
+     - type: ndcg_at_1000
+       value: 70.571
+     - type: ndcg_at_3
+       value: 63.352
+     - type: ndcg_at_5
+       value: 66.018
+     - type: precision_at_1
+       value: 53.405
+     - type: precision_at_10
+       value: 9.119
+     - type: precision_at_100
+       value: 1.03
+     - type: precision_at_1000
+       value: 0.107
+     - type: precision_at_3
+       value: 25.602999999999998
+     - type: precision_at_5
+       value: 16.835
+     - type: recall_at_1
+       value: 49.713
+     - type: recall_at_10
+       value: 83.306
+     - type: recall_at_100
+       value: 92.92
+     - type: recall_at_1000
+       value: 95.577
+     - type: recall_at_3
+       value: 70.798
+     - type: recall_at_5
+       value: 77.254
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB FiQA2018
+       revision: None
+       split: test
+       type: fiqa
+     metrics:
+     - type: map_at_1
+       value: 15.310000000000002
+     - type: map_at_10
+       value: 26.204
+     - type: map_at_100
+       value: 27.932000000000002
+     - type: map_at_1000
+       value: 28.121000000000002
+     - type: map_at_3
+       value: 22.481
+     - type: map_at_5
+       value: 24.678
+     - type: mrr_at_1
+       value: 29.784
+     - type: mrr_at_10
+       value: 39.582
+     - type: mrr_at_100
+       value: 40.52
+     - type: mrr_at_1000
+       value: 40.568
+     - type: mrr_at_3
+       value: 37.114000000000004
+     - type: mrr_at_5
+       value: 38.596000000000004
+     - type: ndcg_at_1
+       value: 29.784
+     - type: ndcg_at_10
+       value: 33.432
+     - type: ndcg_at_100
+       value: 40.281
+     - type: ndcg_at_1000
+       value: 43.653999999999996
+     - type: ndcg_at_3
+       value: 29.612
+     - type: ndcg_at_5
+       value: 31.223
+     - type: precision_at_1
+       value: 29.784
+     - type: precision_at_10
+       value: 9.645
+     - type: precision_at_100
+       value: 1.645
+     - type: precision_at_1000
+       value: 0.22499999999999998
+     - type: precision_at_3
+       value: 20.165
+     - type: precision_at_5
+       value: 15.401000000000002
+     - type: recall_at_1
+       value: 15.310000000000002
+     - type: recall_at_10
+       value: 40.499
+     - type: recall_at_100
+       value: 66.643
+     - type: recall_at_1000
+       value: 87.059
+     - type: recall_at_3
+       value: 27.492
+     - type: recall_at_5
+       value: 33.748
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB HotpotQA
+       revision: None
+       split: test
+       type: hotpotqa
+     metrics:
+     - type: map_at_1
+       value: 33.599000000000004
+     - type: map_at_10
+       value: 47.347
+     - type: map_at_100
+       value: 48.191
+     - type: map_at_1000
+       value: 48.263
+     - type: map_at_3
+       value: 44.698
+     - type: map_at_5
+       value: 46.278999999999996
+     - type: mrr_at_1
+       value: 67.19800000000001
+     - type: mrr_at_10
+       value: 74.054
+     - type: mrr_at_100
+       value: 74.376
+     - type: mrr_at_1000
+       value: 74.392
+     - type: mrr_at_3
+       value: 72.849
+     - type: mrr_at_5
+       value: 73.643
+     - type: ndcg_at_1
+       value: 67.19800000000001
+     - type: ndcg_at_10
+       value: 56.482
+     - type: ndcg_at_100
+       value: 59.694
+     - type: ndcg_at_1000
+       value: 61.204
+     - type: ndcg_at_3
+       value: 52.43299999999999
+     - type: ndcg_at_5
+       value: 54.608000000000004
+     - type: precision_at_1
+       value: 67.19800000000001
+     - type: precision_at_10
+       value: 11.613999999999999
+     - type: precision_at_100
+       value: 1.415
+     - type: precision_at_1000
+       value: 0.16199999999999998
+     - type: precision_at_3
+       value: 32.726
+     - type: precision_at_5
+       value: 21.349999999999998
+     - type: recall_at_1
+       value: 33.599000000000004
+     - type: recall_at_10
+       value: 58.069
+     - type: recall_at_100
+       value: 70.736
+     - type: recall_at_1000
+       value: 80.804
+     - type: recall_at_3
+       value: 49.088
+     - type: recall_at_5
+       value: 53.376000000000005
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ImdbClassification
+       revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+       split: test
+       type: mteb/imdb
+     metrics:
+     - type: accuracy
+       value: 73.64359999999999
+     - type: ap
+       value: 67.54685976014599
+     - type: f1
+       value: 73.55148707559482
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB MSMARCO
+       revision: None
+       split: dev
+       type: msmarco
+     metrics:
+     - type: map_at_1
+       value: 19.502
+     - type: map_at_10
+       value: 30.816
+     - type: map_at_100
+       value: 32.007999999999996
+     - type: map_at_1000
+       value: 32.067
+     - type: map_at_3
+       value: 27.215
+     - type: map_at_5
+       value: 29.304000000000002
+     - type: mrr_at_1
+       value: 20.072000000000003
+     - type: mrr_at_10
+       value: 31.406
+     - type: mrr_at_100
+       value: 32.549
+     - type: mrr_at_1000
+       value: 32.602
+     - type: mrr_at_3
+       value: 27.839000000000002
+     - type: mrr_at_5
+       value: 29.926000000000002
+     - type: ndcg_at_1
+       value: 20.086000000000002
+     - type: ndcg_at_10
+       value: 37.282
+     - type: ndcg_at_100
+       value: 43.206
+     - type: ndcg_at_1000
+       value: 44.690000000000005
+     - type: ndcg_at_3
+       value: 29.932
+     - type: ndcg_at_5
+       value: 33.668
+     - type: precision_at_1
+       value: 20.086000000000002
+     - type: precision_at_10
+       value: 5.961
+     - type: precision_at_100
+       value: 0.898
+     - type: precision_at_1000
+       value: 0.10200000000000001
+     - type: precision_at_3
+       value: 12.856000000000002
+     - type: precision_at_5
+       value: 9.596
+     - type: recall_at_1
+       value: 19.502
+     - type: recall_at_10
+       value: 57.182
+     - type: recall_at_100
+       value: 84.952
+     - type: recall_at_1000
+       value: 96.34700000000001
+     - type: recall_at_3
+       value: 37.193
+     - type: recall_at_5
+       value: 46.157
+     task:
+       type: Retrieval
+   - dataset:
+       config: en
+       name: MTEB MTOPDomainClassification (en)
+       revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+       split: test
+       type: mteb/mtop_domain
+     metrics:
+     - type: accuracy
+       value: 93.96488828089375
+     - type: f1
+       value: 93.32119260543482
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MTOPIntentClassification (en)
+       revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+       split: test
+       type: mteb/mtop_intent
+     metrics:
+     - type: accuracy
+       value: 72.4965800273598
+     - type: f1
+       value: 49.34896217536082
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MassiveIntentClassification (en)
+       revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+       split: test
+       type: mteb/amazon_massive_intent
+     metrics:
+     - type: accuracy
+       value: 67.60928043039678
+     - type: f1
+       value: 64.34244712074538
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MassiveScenarioClassification (en)
+       revision: 7d571f92784cd94a019292a1f45445077d0ef634
+       split: test
+       type: mteb/amazon_massive_scenario
+     metrics:
+     - type: accuracy
+       value: 69.75453934095493
+     - type: f1
+       value: 68.39224867489249
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB MedrxivClusteringP2P
+       revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+       split: test
+       type: mteb/medrxiv-clustering-p2p
+     metrics:
+     - type: v_measure
+       value: 31.862573504920082
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB MedrxivClusteringS2S
+       revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+       split: test
+       type: mteb/medrxiv-clustering-s2s
+     metrics:
+     - type: v_measure
+       value: 27.511123551196803
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB MindSmallReranking
+       revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+       split: test
+       type: mteb/mind_small
+     metrics:
+     - type: map
+       value: 30.99145104942086
+     - type: mrr
+       value: 32.03606480418627
+     task:
+       type: Reranking
+   - dataset:
+       config: default
+       name: MTEB NFCorpus
+       revision: None
+       split: test
+       type: nfcorpus
+     metrics:
+     - type: map_at_1
+       value: 5.015
+     - type: map_at_10
+       value: 11.054
+     - type: map_at_100
+       value: 13.773
+     - type: map_at_1000
+       value: 15.082999999999998
+     - type: map_at_3
+       value: 8.253
+     - type: map_at_5
+       value: 9.508999999999999
+     - type: mrr_at_1
+       value: 42.105
+     - type: mrr_at_10
+       value: 50.44499999999999
+     - type: mrr_at_100
+       value: 51.080000000000005
+     - type: mrr_at_1000
+       value: 51.129999999999995
+     - type: mrr_at_3
+       value: 48.555
+     - type: mrr_at_5
+       value: 49.84
+     - type: ndcg_at_1
+       value: 40.402
+     - type: ndcg_at_10
+       value: 30.403000000000002
+     - type: ndcg_at_100
+       value: 28.216
+     - type: ndcg_at_1000
+       value: 37.021
+     - type: ndcg_at_3
+       value: 35.53
+     - type: ndcg_at_5
+       value: 33.202999999999996
+     - type: precision_at_1
+       value: 42.105
+     - type: precision_at_10
+       value: 22.353
+     - type: precision_at_100
+       value: 7.266
+     - type: precision_at_1000
+       value: 2.011
+     - type: precision_at_3
+       value: 32.921
+     - type: precision_at_5
+       value: 28.297
+     - type: recall_at_1
+       value: 5.015
+     - type: recall_at_10
+       value: 14.393
+     - type: recall_at_100
+       value: 28.893
+     - type: recall_at_1000
+       value: 60.18
+     - type: recall_at_3
+       value: 9.184000000000001
+     - type: recall_at_5
+       value: 11.39
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB NQ
+       revision: None
+       split: test
+       type: nq
+     metrics:
+     - type: map_at_1
+       value: 29.524
+     - type: map_at_10
+       value: 44.182
+     - type: map_at_100
+       value: 45.228
+     - type: map_at_1000
+       value: 45.265
+     - type: map_at_3
+       value: 39.978
+     - type: map_at_5
+       value: 42.482
+     - type: mrr_at_1
+       value: 33.256
+     - type: mrr_at_10
+       value: 46.661
+     - type: mrr_at_100
+       value: 47.47
+     - type: mrr_at_1000
+       value: 47.496
+     - type: mrr_at_3
+       value: 43.187999999999995
+     - type: mrr_at_5
+       value: 45.330999999999996
+     - type: ndcg_at_1
+       value: 33.227000000000004
+     - type: ndcg_at_10
+       value: 51.589
+     - type: ndcg_at_100
+       value: 56.043
+     - type: ndcg_at_1000
+       value: 56.937000000000005
+     - type: ndcg_at_3
+       value: 43.751
+     - type: ndcg_at_5
+       value: 47.937000000000005
+     - type: precision_at_1
+       value: 33.227000000000004
+     - type: precision_at_10
+       value: 8.556999999999999
+     - type: precision_at_100
+       value: 1.103
+     - type: precision_at_1000
+       value: 0.11900000000000001
+     - type: precision_at_3
+       value: 19.921
+     - type: precision_at_5
+       value: 14.396999999999998
+     - type: recall_at_1
+       value: 29.524
+     - type: recall_at_10
+       value: 71.615
+     - type: recall_at_100
+       value: 91.056
+     - type: recall_at_1000
+       value: 97.72800000000001
+     - type: recall_at_3
+       value: 51.451
+     - type: recall_at_5
+       value: 61.119
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB QuoraRetrieval
+       revision: None
+       split: test
+       type: quora
+     metrics:
+     - type: map_at_1
+       value: 69.596
+     - type: map_at_10
+       value: 83.281
+     - type: map_at_100
+       value: 83.952
+     - type: map_at_1000
+       value: 83.97200000000001
+     - type: map_at_3
+       value: 80.315
+     - type: map_at_5
+       value: 82.223
+     - type: mrr_at_1
+       value: 80.17
+     - type: mrr_at_10
+       value: 86.522
+     - type: mrr_at_100
+       value: 86.644
+     - type: mrr_at_1000
+       value: 86.64500000000001
+     - type: mrr_at_3
+       value: 85.438
+     - type: mrr_at_5
+       value: 86.21799999999999
+     - type: ndcg_at_1
+       value: 80.19
+     - type: ndcg_at_10
+       value: 87.19
+     - type: ndcg_at_100
+       value: 88.567
+     - type: ndcg_at_1000
+       value: 88.70400000000001
+     - type: ndcg_at_3
+       value: 84.17999999999999
+     - type: ndcg_at_5
+       value: 85.931
+     - type: precision_at_1
+       value: 80.19
+     - type: precision_at_10
+       value: 13.209000000000001
+     - type: precision_at_100
+       value: 1.518
+     - type: precision_at_1000
+       value: 0.157
+     - type: precision_at_3
+       value: 36.717
+     - type: precision_at_5
+       value: 24.248
+     - type: recall_at_1
+       value: 69.596
+     - type: recall_at_10
+       value: 94.533
+     - type: recall_at_100
+       value: 99.322
+     - type: recall_at_1000
+       value: 99.965
+     - type: recall_at_3
+       value: 85.911
+     - type: recall_at_5
+       value: 90.809
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB RedditClustering
+       revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+       split: test
+       type: mteb/reddit-clustering
+     metrics:
+     - type: v_measure
+       value: 49.27650627571912
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB RedditClusteringP2P
+       revision: 282350215ef01743dc01b456c7f5241fa8937f16
+       split: test
+       type: mteb/reddit-clustering-p2p
+     metrics:
+     - type: v_measure
+       value: 57.08550946534183
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB SCIDOCS
+       revision: None
+       split: test
+       type: scidocs
+     metrics:
+     - type: map_at_1
+       value: 4.568
+     - type: map_at_10
+       value: 10.862
+     - type: map_at_100
+       value: 12.757
+     - type: map_at_1000
+       value: 13.031
+     - type: map_at_3
+       value: 7.960000000000001
+     - type: map_at_5
+       value: 9.337
+     - type: mrr_at_1
+       value: 22.5
+     - type: mrr_at_10
+       value: 32.6
+     - type: mrr_at_100
+       value: 33.603
+     - type: mrr_at_1000
+       value: 33.672000000000004
+     - type: mrr_at_3
+       value: 29.299999999999997
+     - type: mrr_at_5
+       value: 31.25
+     - type: ndcg_at_1
+       value: 22.5
+     - type: ndcg_at_10
+       value: 18.605
+     - type: ndcg_at_100
+       value: 26.029999999999998
+     - type: ndcg_at_1000
+       value: 31.256
+     - type: ndcg_at_3
+       value: 17.873
+     - type: ndcg_at_5
+       value: 15.511
+     - type: precision_at_1
+       value: 22.5
+     - type: precision_at_10
+       value: 9.58
+     - type: precision_at_100
+       value: 2.033
+     - type: precision_at_1000
+       value: 0.33
+     - type: precision_at_3
+       value: 16.633
+     - type: precision_at_5
+       value: 13.54
+     - type: recall_at_1
+       value: 4.568
+     - type: recall_at_10
+       value: 19.402
+     - type: recall_at_100
+       value: 41.277
+     - type: recall_at_1000
+       value: 66.963
+     - type: recall_at_3
+       value: 10.112
+     - type: recall_at_5
+       value: 13.712
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB SICK-R
+       revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+       split: test
+       type: mteb/sickr-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 83.31992291680787
+     - type: cos_sim_spearman
+       value: 76.7212346922664
+     - type: euclidean_pearson
+       value: 80.42189271706478
+     - type: euclidean_spearman
+       value: 76.7212342532493
+     - type: manhattan_pearson
+       value: 80.33171093031578
+     - type: manhattan_spearman
+       value: 76.63192883074694
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STS12
+       revision: a0d554a64d88156834ff5ae9920b964011b16384
+       split: test
+       type: mteb/sts12-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 83.16654278886763
+     - type: cos_sim_spearman
+       value: 73.66390263429565
+     - type: euclidean_pearson
+       value: 79.7485360086639
+     - type: euclidean_spearman
+       value: 73.66389870373436
+     - type: manhattan_pearson
+       value: 79.73652237443706
+     - type: manhattan_spearman
+       value: 73.65296117151647
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STS13
+       revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+       split: test
+       type: mteb/sts13-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 82.40389689929246
+     - type: cos_sim_spearman
+       value: 83.29727595993955
+     - type: euclidean_pearson
+       value: 82.23970587854079
+     - type: euclidean_spearman
+       value: 83.29727595993955
+     - type: manhattan_pearson
+       value: 82.18823600831897
+     - type: manhattan_spearman
+       value: 83.20746192209594
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STS14
+       revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+       split: test
+       type: mteb/sts14-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 81.73505246913413
+     - type: cos_sim_spearman
+       value: 79.1686548248754
+     - type: euclidean_pearson
+       value: 80.48889135993412
+     - type: euclidean_spearman
+       value: 79.16864112930354
+     - type: manhattan_pearson
+       value: 80.40720651057302
+     - type: manhattan_spearman
+       value: 79.0640155089286
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STS15
+       revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+       split: test
+       type: mteb/sts15-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 86.3953512879065
+     - type: cos_sim_spearman
+       value: 87.29947322714338
+     - type: euclidean_pearson
+       value: 86.59759438529645
+     - type: euclidean_spearman
+       value: 87.29947511092824
+     - type: manhattan_pearson
+       value: 86.52097806169155
+     - type: manhattan_spearman
+       value: 87.22987242146534
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STS16
+       revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+       split: test
+       type: mteb/sts16-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 82.48565753792056
+     - type: cos_sim_spearman
+       value: 83.6049720319893
+     - type: euclidean_pearson
+       value: 82.56452023172913
+     - type: euclidean_spearman
+       value: 83.60490168191697
+     - type: manhattan_pearson
+       value: 82.58079941137872
+     - type: manhattan_spearman
+       value: 83.60975807374051
+     task:
+       type: STS
+   - dataset:
+       config: en-en
+       name: MTEB STS17 (en-en)
+       revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+       split: test
+       type: mteb/sts17-crosslingual-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 88.18239976618212
+     - type: cos_sim_spearman
+       value: 88.23061724730616
+     - type: euclidean_pearson
+       value: 87.78482472776658
+     - type: euclidean_spearman
+       value: 88.23061724730616
+     - type: manhattan_pearson
+       value: 87.75059641730239
+     - type: manhattan_spearman
+       value: 88.22527413524622
+     task:
+       type: STS
+   - dataset:
+       config: en
+       name: MTEB STS22 (en)
+       revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
+       split: test
+       type: mteb/sts22-crosslingual-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 63.42816418706765
+     - type: cos_sim_spearman
+       value: 63.4569864520124
+     - type: euclidean_pearson
+       value: 64.35405409953853
+     - type: euclidean_spearman
+       value: 63.4569864520124
+     - type: manhattan_pearson
+       value: 63.96649236073056
+     - type: manhattan_spearman
+       value: 63.01448583722708
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB STSBenchmark
+       revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
+       split: test
+       type: mteb/stsbenchmark-sts
+     metrics:
+     - type: cos_sim_pearson
+       value: 83.41659638047614
+     - type: cos_sim_spearman
+       value: 84.03893866106175
+     - type: euclidean_pearson
+       value: 84.2251203953798
+     - type: euclidean_spearman
+       value: 84.03893866106175
+     - type: manhattan_pearson
+       value: 84.22733643205514
+     - type: manhattan_spearman
+       value: 84.06504411263612
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB SciDocsRR
+       revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
+       split: test
+       type: mteb/scidocs-reranking
+     metrics:
+     - type: map
+       value: 79.75608022582414
+     - type: mrr
+       value: 94.0947732369301
+     task:
+       type: Reranking
+   - dataset:
+       config: default
+       name: MTEB SciFact
+       revision: None
+       split: test
+       type: scifact
+     metrics:
+     - type: map_at_1
+       value: 50.161
+     - type: map_at_10
+       value: 59.458999999999996
+     - type: map_at_100
+       value: 60.156
+     - type: map_at_1000
+       value: 60.194
+     - type: map_at_3
+       value: 56.45400000000001
+     - type: map_at_5
+       value: 58.165
+     - type: mrr_at_1
+       value: 53.333
+     - type: mrr_at_10
+       value: 61.050000000000004
+     - type: mrr_at_100
+       value: 61.586
+     - type: mrr_at_1000
+       value: 61.624
+     - type: mrr_at_3
+       value: 58.889
+     - type: mrr_at_5
+       value: 60.122
+     - type: ndcg_at_1
+       value: 53.333
+     - type: ndcg_at_10
+       value: 63.888999999999996
+     - type: ndcg_at_100
+       value: 66.963
+     - type: ndcg_at_1000
+       value: 68.062
+     - type: ndcg_at_3
+       value: 59.01
+     - type: ndcg_at_5
+       value: 61.373999999999995
+     - type: precision_at_1
+       value: 53.333
+     - type: precision_at_10
+       value: 8.633000000000001
+     - type: precision_at_100
+       value: 1.027
+     - type: precision_at_1000
+       value: 0.11199999999999999
+     - type: precision_at_3
+       value: 23.111
+     - type: precision_at_5
+       value: 15.467
+     - type: recall_at_1
+       value: 50.161
+     - type: recall_at_10
+       value: 75.922
+     - type: recall_at_100
+       value: 90.0
+     - type: recall_at_1000
+       value: 98.667
+     - type: recall_at_3
+       value: 62.90599999999999
+     - type: recall_at_5
+       value: 68.828
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB SprintDuplicateQuestions
+       revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
+       split: test
+       type: mteb/sprintduplicatequestions-pairclassification
+     metrics:
+     - type: cos_sim_accuracy
+       value: 99.81188118811882
+     - type: cos_sim_ap
+       value: 95.11619225962413
+     - type: cos_sim_f1
+       value: 90.35840484603736
+     - type: cos_sim_precision
+       value: 91.23343527013252
+     - type: cos_sim_recall
+       value: 89.5
+     - type: dot_accuracy
+       value: 99.81188118811882
+     - type: dot_ap
+       value: 95.11619225962413
+     - type: dot_f1
+       value: 90.35840484603736
+     - type: dot_precision
+       value: 91.23343527013252
+     - type: dot_recall
+       value: 89.5
+     - type: euclidean_accuracy
+       value: 99.81188118811882
+     - type: euclidean_ap
+       value: 95.11619225962413
+     - type: euclidean_f1
+       value: 90.35840484603736
+     - type: euclidean_precision
+       value: 91.23343527013252
+     - type: euclidean_recall
+       value: 89.5
+     - type: manhattan_accuracy
+       value: 99.80891089108911
+     - type: manhattan_ap
+       value: 95.07294266220966
+     - type: manhattan_f1
+       value: 90.21794221996959
+     - type: manhattan_precision
+       value: 91.46968139773895
+     - type: manhattan_recall
+       value: 89.0
+     - type: max_accuracy
+       value: 99.81188118811882
+     - type: max_ap
+       value: 95.11619225962413
+     - type: max_f1
+       value: 90.35840484603736
+     task:
+       type: PairClassification
+   - dataset:
+       config: default
+       name: MTEB StackExchangeClustering
+       revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2159
+ split: test
2160
+ type: mteb/stackexchange-clustering
2161
+ metrics:
2162
+ - type: v_measure
2163
+ value: 55.3481874105239
2164
+ task:
2165
+ type: Clustering
2166
+ - dataset:
2167
+ config: default
2168
+ name: MTEB StackExchangeClusteringP2P
2169
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2170
+ split: test
2171
+ type: mteb/stackexchange-clustering-p2p
2172
+ metrics:
2173
+ - type: v_measure
2174
+ value: 34.421291695525
2175
+ task:
2176
+ type: Clustering
2177
+ - dataset:
2178
+ config: default
2179
+ name: MTEB StackOverflowDupQuestions
2180
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2181
+ split: test
2182
+ type: mteb/stackoverflowdupquestions-reranking
2183
+ metrics:
2184
+ - type: map
2185
+ value: 49.98746633276634
2186
+ - type: mrr
2187
+ value: 50.63143249724133
2188
+ task:
2189
+ type: Reranking
2190
+ - dataset:
2191
+ config: default
2192
+ name: MTEB SummEval
2193
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2194
+ split: test
2195
+ type: mteb/summeval
2196
+ metrics:
2197
+ - type: cos_sim_pearson
2198
+ value: 31.009961979844036
2199
+ - type: cos_sim_spearman
2200
+ value: 30.558416108881044
2201
+ - type: dot_pearson
2202
+ value: 31.009964941134253
2203
+ - type: dot_spearman
2204
+ value: 30.545760761761393
2205
+ task:
2206
+ type: Summarization
2207
+ - dataset:
2208
+ config: default
2209
+ name: MTEB TRECCOVID
2210
+ revision: None
2211
+ split: test
2212
+ type: trec-covid
2213
+ metrics:
2214
+ - type: map_at_1
2215
+ value: 0.207
2216
+ - type: map_at_10
2217
+ value: 1.6
2218
+ - type: map_at_100
2219
+ value: 8.594
2220
+ - type: map_at_1000
2221
+ value: 20.213
2222
+ - type: map_at_3
2223
+ value: 0.585
2224
+ - type: map_at_5
2225
+ value: 0.9039999999999999
2226
+ - type: mrr_at_1
2227
+ value: 78.0
2228
+ - type: mrr_at_10
2229
+ value: 87.4
2230
+ - type: mrr_at_100
2231
+ value: 87.4
2232
+ - type: mrr_at_1000
2233
+ value: 87.4
2234
+ - type: mrr_at_3
2235
+ value: 86.667
2236
+ - type: mrr_at_5
2237
+ value: 87.06700000000001
2238
+ - type: ndcg_at_1
2239
+ value: 73.0
2240
+ - type: ndcg_at_10
2241
+ value: 65.18
2242
+ - type: ndcg_at_100
2243
+ value: 49.631
2244
+ - type: ndcg_at_1000
2245
+ value: 43.498999999999995
2246
+ - type: ndcg_at_3
2247
+ value: 71.83800000000001
2248
+ - type: ndcg_at_5
2249
+ value: 69.271
2250
+ - type: precision_at_1
2251
+ value: 78.0
2252
+ - type: precision_at_10
2253
+ value: 69.19999999999999
2254
+ - type: precision_at_100
2255
+ value: 50.980000000000004
2256
+ - type: precision_at_1000
2257
+ value: 19.426
2258
+ - type: precision_at_3
2259
+ value: 77.333
2260
+ - type: precision_at_5
2261
+ value: 74.0
2262
+ - type: recall_at_1
2263
+ value: 0.207
2264
+ - type: recall_at_10
2265
+ value: 1.822
2266
+ - type: recall_at_100
2267
+ value: 11.849
2268
+ - type: recall_at_1000
2269
+ value: 40.492
2270
+ - type: recall_at_3
2271
+ value: 0.622
2272
+ - type: recall_at_5
2273
+ value: 0.9809999999999999
2274
+ task:
2275
+ type: Retrieval
2276
+ - dataset:
2277
+ config: default
2278
+ name: MTEB Touche2020
2279
+ revision: None
2280
+ split: test
2281
+ type: webis-touche2020
2282
+ metrics:
2283
+ - type: map_at_1
2284
+ value: 2.001
2285
+ - type: map_at_10
2286
+ value: 10.376000000000001
2287
+ - type: map_at_100
2288
+ value: 16.936999999999998
2289
+ - type: map_at_1000
2290
+ value: 18.615000000000002
2291
+ - type: map_at_3
2292
+ value: 5.335999999999999
2293
+ - type: map_at_5
2294
+ value: 7.374
2295
+ - type: mrr_at_1
2296
+ value: 20.408
2297
+ - type: mrr_at_10
2298
+ value: 38.29
2299
+ - type: mrr_at_100
2300
+ value: 39.33
2301
+ - type: mrr_at_1000
2302
+ value: 39.347
2303
+ - type: mrr_at_3
2304
+ value: 32.993
2305
+ - type: mrr_at_5
2306
+ value: 36.973
2307
+ - type: ndcg_at_1
2308
+ value: 17.347
2309
+ - type: ndcg_at_10
2310
+ value: 23.515
2311
+ - type: ndcg_at_100
2312
+ value: 37.457
2313
+ - type: ndcg_at_1000
2314
+ value: 49.439
2315
+ - type: ndcg_at_3
2316
+ value: 22.762999999999998
2317
+ - type: ndcg_at_5
2318
+ value: 22.622
2319
+ - type: precision_at_1
2320
+ value: 20.408
2321
+ - type: precision_at_10
2322
+ value: 22.448999999999998
2323
+ - type: precision_at_100
2324
+ value: 8.184
2325
+ - type: precision_at_1000
2326
+ value: 1.608
2327
+ - type: precision_at_3
2328
+ value: 25.85
2329
+ - type: precision_at_5
2330
+ value: 25.306
2331
+ - type: recall_at_1
2332
+ value: 2.001
2333
+ - type: recall_at_10
2334
+ value: 17.422
2335
+ - type: recall_at_100
2336
+ value: 51.532999999999994
2337
+ - type: recall_at_1000
2338
+ value: 87.466
2339
+ - type: recall_at_3
2340
+ value: 6.861000000000001
2341
+ - type: recall_at_5
2342
+ value: 10.502
2343
+ task:
2344
+ type: Retrieval
2345
+ - dataset:
2346
+ config: default
2347
+ name: MTEB ToxicConversationsClassification
2348
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2349
+ split: test
2350
+ type: mteb/toxic_conversations_50k
2351
+ metrics:
2352
+ - type: accuracy
2353
+ value: 71.54419999999999
2354
+ - type: ap
2355
+ value: 14.372170450843907
2356
+ - type: f1
2357
+ value: 54.94420257390529
2358
+ task:
2359
+ type: Classification
2360
+ - dataset:
2361
+ config: default
2362
+ name: MTEB TweetSentimentExtractionClassification
2363
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2364
+ split: test
2365
+ type: mteb/tweet_sentiment_extraction
2366
+ metrics:
2367
+ - type: accuracy
2368
+ value: 59.402942840973395
2369
+ - type: f1
2370
+ value: 59.4166538875571
2371
+ task:
2372
+ type: Classification
2373
+ - dataset:
2374
+ config: default
2375
+ name: MTEB TwentyNewsgroupsClustering
2376
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2377
+ split: test
2378
+ type: mteb/twentynewsgroups-clustering
2379
+ metrics:
2380
+ - type: v_measure
2381
+ value: 41.569064336457906
2382
+ task:
2383
+ type: Clustering
2384
+ - dataset:
2385
+ config: default
2386
+ name: MTEB TwitterSemEval2015
2387
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2388
+ split: test
2389
+ type: mteb/twittersemeval2015-pairclassification
2390
+ metrics:
2391
+ - type: cos_sim_accuracy
2392
+ value: 85.31322644096085
2393
+ - type: cos_sim_ap
2394
+ value: 72.14518894837381
2395
+ - type: cos_sim_f1
2396
+ value: 66.67489813557229
2397
+ - type: cos_sim_precision
2398
+ value: 62.65954977953121
2399
+ - type: cos_sim_recall
2400
+ value: 71.2401055408971
2401
+ - type: dot_accuracy
2402
+ value: 85.31322644096085
2403
+ - type: dot_ap
2404
+ value: 72.14521480685293
2405
+ - type: dot_f1
2406
+ value: 66.67489813557229
2407
+ - type: dot_precision
2408
+ value: 62.65954977953121
2409
+ - type: dot_recall
2410
+ value: 71.2401055408971
2411
+ - type: euclidean_accuracy
2412
+ value: 85.31322644096085
2413
+ - type: euclidean_ap
2414
+ value: 72.14520820485349
2415
+ - type: euclidean_f1
2416
+ value: 66.67489813557229
2417
+ - type: euclidean_precision
2418
+ value: 62.65954977953121
2419
+ - type: euclidean_recall
2420
+ value: 71.2401055408971
2421
+ - type: manhattan_accuracy
2422
+ value: 85.21785778148656
2423
+ - type: manhattan_ap
2424
+ value: 72.01177147657364
2425
+ - type: manhattan_f1
2426
+ value: 66.62594673833374
2427
+ - type: manhattan_precision
2428
+ value: 62.0336669699727
2429
+ - type: manhattan_recall
2430
+ value: 71.95250659630607
2431
+ - type: max_accuracy
2432
+ value: 85.31322644096085
2433
+ - type: max_ap
2434
+ value: 72.14521480685293
2435
+ - type: max_f1
2436
+ value: 66.67489813557229
2437
+ task:
2438
+ type: PairClassification
2439
+ - dataset:
2440
+ config: default
2441
+ name: MTEB TwitterURLCorpus
2442
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2443
+ split: test
2444
+ type: mteb/twitterurlcorpus-pairclassification
2445
+ metrics:
2446
+ - type: cos_sim_accuracy
2447
+ value: 89.12756626693057
2448
+ - type: cos_sim_ap
2449
+ value: 86.05430786440826
2450
+ - type: cos_sim_f1
2451
+ value: 78.27759692216631
2452
+ - type: cos_sim_precision
2453
+ value: 75.33466248931929
2454
+ - type: cos_sim_recall
2455
+ value: 81.45980905451185
2456
+ - type: dot_accuracy
2457
+ value: 89.12950673341872
2458
+ - type: dot_ap
2459
+ value: 86.05431161145492
2460
+ - type: dot_f1
2461
+ value: 78.27759692216631
2462
+ - type: dot_precision
2463
+ value: 75.33466248931929
2464
+ - type: dot_recall
2465
+ value: 81.45980905451185
2466
+ - type: euclidean_accuracy
2467
+ value: 89.12756626693057
2468
+ - type: euclidean_ap
2469
+ value: 86.05431303247397
2470
+ - type: euclidean_f1
2471
+ value: 78.27759692216631
2472
+ - type: euclidean_precision
2473
+ value: 75.33466248931929
2474
+ - type: euclidean_recall
2475
+ value: 81.45980905451185
2476
+ - type: manhattan_accuracy
2477
+ value: 89.04994760740482
2478
+ - type: manhattan_ap
2479
+ value: 86.00860610892074
2480
+ - type: manhattan_f1
2481
+ value: 78.1846776005392
2482
+ - type: manhattan_precision
2483
+ value: 76.10438839480975
2484
+ - type: manhattan_recall
2485
+ value: 80.3818909762858
2486
+ - type: max_accuracy
2487
+ value: 89.12950673341872
2488
+ - type: max_ap
2489
+ value: 86.05431303247397
2490
+ - type: max_f1
2491
+ value: 78.27759692216631
2492
+ task:
2493
+ type: PairClassification
2494
+ tags:
2495
+ - sentence-transformers
2496
+ - feature-extraction
2497
+ - sentence-similarity
2498
+ - mteb
2499
+ - onnx
2500
+ - teradata
2501
+
2502
+ ---
2503
+ # A Teradata Vantage compatible Embeddings Model
2504
+
2505
+ # jinaai/jina-embeddings-v2-small-en
2506
+
2507
+ ## Overview of this Model
2508
+
2509
+ An embedding model which maps text (sentences/paragraphs) into a vector. The [jinaai/jina-embeddings-v2-small-en](https://huggingface.co/jinaai/jina-embeddings-v2-small-en) model is well known for its effectiveness in capturing semantic meaning in text data. It's a state-of-the-art model trained on a large corpus, capable of generating high-quality text embeddings.
2510
+
2511
+ - 32.69M params (Sizes in ONNX format - "fp32": 123.8MB, "int8": 31.14MB, "uint8": 31.14MB)
2512
+ - 8192 maximum input tokens
2513
+ - 512 dimensions of output vector
2514
+ - License: apache-2.0. The released models can be used for commercial purposes free of charge.
2515
+ - Reference to Original Model: https://huggingface.co/jinaai/jina-embeddings-v2-small-en
2516
+
2517
+
2518
+ ## Quickstart: Deploying this Model in Teradata Vantage
2519
+
2520
+ We have pre-converted the model into the ONNX format compatible with BYOM 6.0, eliminating the need for manual conversion.
2521
+
2522
+ **Note:** Ensure you have access to a Teradata Database with BYOM 6.0 installed.
2523
+
2524
+ To get started, clone the pre-converted model directly from the Teradata HuggingFace repository.
2525
+
2526
+
2527
+ ```python
2528
+
2529
+ import teradataml as tdml
2530
+ import getpass
2531
+ from huggingface_hub import hf_hub_download
2532
+
2533
+ model_name = "jina-embeddings-v2-small-en"
2534
+ number_dimensions_output = 512
2535
+ model_file_name = "model.onnx"
2536
+
2537
+ # Step 1: Download Model from Teradata HuggingFace Page
2538
+
2539
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"onnx/{model_file_name}", local_dir="./")
2540
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"tokenizer.json", local_dir="./")
2541
+
2542
+ # Step 2: Create Connection to Vantage
2543
+
2544
+ tdml.create_context(host = input('enter your hostname'),
2545
+ username=input('enter your username'),
2546
+ password = getpass.getpass("enter your password"))
2547
+
2548
+ # Step 3: Load Models into Vantage
2549
+ # a) Embedding model
2550
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2551
+ model_file = model_file_name,
2552
+ table_name = 'embeddings_models' )
2553
+ # b) Tokenizer
2554
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2555
+ model_file = 'tokenizer.json',
2556
+ table_name = 'embeddings_tokenizers')
2557
+
2558
+ # Step 4: Test ONNXEmbeddings Function
2559
+ # Note that ONNXEmbeddings expects the 'payload' column to be 'txt'.
2560
+ # If it has a different name, just rename it in a subquery/CTE.
2561
+ input_table = "emails.emails"
2562
+ embeddings_query = f"""
2563
+ SELECT
2564
+ *
2565
+ from mldb.ONNXEmbeddings(
2566
+ on {input_table} as InputTable
2567
+ on (select * from embeddings_models where model_id = '{model_name}') as ModelTable DIMENSION
2568
+ on (select model as tokenizer from embeddings_tokenizers where model_id = '{model_name}') as TokenizerTable DIMENSION
2569
+ using
2570
+ Accumulate('id', 'txt')
2571
+ ModelOutputTensor('sentence_embedding')
2572
+ EnableMemoryCheck('false')
2573
+ OutputFormat('FLOAT32({number_dimensions_output})')
2574
+ OverwriteCachedModel('true')
2575
+ ) a
2576
+ """
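+ # Output: the accumulated columns ('id', 'txt') plus one embedding column per dimension (emb_0 ... emb_511)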
2577
+ DF_embeddings = tdml.DataFrame.from_query(embeddings_query)
2578
+ DF_embeddings
2579
+ ```
2580
+
2581
+
2582
+
2583
+ ## What Can I Do with the Embeddings?
2584
+
2585
+ Teradata Vantage includes pre-built in-database functions to process embeddings further. Explore the following examples (a minimal query sketch follows the list):
2586
+
2587
+ - **Semantic Clustering with TD_KMeans:** [Semantic Clustering Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Clustering_Python.ipynb)
2588
+ - **Semantic Distance with TD_VectorDistance:** [Semantic Similarity Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Similarity_Python.ipynb)
2589
+ - **RAG-Based Application with TD_VectorDistance:** [RAG and Bedrock Query PDF Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/RAG_and_Bedrock_QueryPDF.ipynb)
2590
+
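+ For orientation, a minimal sketch of a semantic-similarity lookup with `TD_VECTORDISTANCE` (assuming the embeddings computed in the Quickstart were stored in a table `emails_embeddings_store` with columns `id`, `txt` and `emb_0` ... `emb_511`; see [test_teradata.py](./test_teradata.py) for a complete end-to-end version):
+ 
+ ```python
+ import teradataml as tdml
+ 
+ # Find the 3 nearest neighbours of the row with id = 3, by cosine similarity.
+ # Table and column names below are assumptions matching the Quickstart example.
+ similarity_query = """
+ SELECT dt.target_id, dt.reference_id, (1.0 - dt.distance) AS similarity
+ FROM TD_VECTORDISTANCE(
+     ON (SELECT * FROM emails_embeddings_store WHERE id = 3)  AS TargetTable
+     ON (SELECT * FROM emails_embeddings_store WHERE id <> 3) AS ReferenceTable DIMENSION
+     USING
+         TargetIDColumn('id')
+         TargetFeatureColumns('[emb_0:emb_511]')
+         RefIDColumn('id')
+         RefFeatureColumns('[emb_0:emb_511]')
+         DistanceMeasure('cosine')
+         TopK(3)
+ ) AS dt
+ """
+ tdml.DataFrame.from_query(similarity_query)
+ ```
+ 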
2591
+
2592
+ ## Deep Dive into Model Conversion to ONNX
2593
+
2594
+ **The steps below outline how we converted the open-source Hugging Face model into an ONNX file compatible with the in-database ONNXEmbeddings function.**
2595
+
2596
+ You do not need to perform these steps—they are provided solely for documentation and transparency. However, they may be helpful if you wish to convert another model to the required format.
2597
+
2598
+
2599
+ ### Part 1. Importing and Converting Model using optimum
2600
+
2601
+ We start by importing the pre-trained [jinaai/jina-embeddings-v2-small-en](https://huggingface.co/jinaai/jina-embeddings-v2-small-en) model from Hugging Face.
2602
+
2603
+ We download the ONNX files from the repository prepared by the model authors.
2604
+
2605
+ After downloading, we fix the opset in the ONNX file for compatibility with the ONNX runtime used in Teradata Vantage.
2606
+
2607
+ We also add the mean pooling and normalization layers to the ONNX file.
2608
+
2609
+ We generate ONNX files for multiple precisions: fp32, int8, and uint8.
2610
+
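+ For illustration, a condensed sketch of these two steps (abridged from the logic in [convert.py](./convert.py); the opset/IR values mirror [conversion_config.json](./conversion_config.json)):
+ 
+ ```python
+ import onnx
+ from onnxruntime.quantization import quantize_dynamic, QuantType
+ 
+ # Pin the opset (16) and IR version (8) expected by the in-database ONNX runtime
+ op = onnx.OperatorSetIdProto()
+ op.version = 16
+ model = onnx.load("model.onnx")
+ model = onnx.helper.make_model(model.graph, ir_version=8, opset_imports=[op])
+ onnx.save(model, "onnx/model.onnx")
+ 
+ # Derive the reduced-precision variants from the fp32 graph via dynamic quantization
+ quantize_dynamic("model.onnx", "onnx/model_int8.onnx", weight_type=QuantType.QInt8)
+ quantize_dynamic("model.onnx", "onnx/model_uint8.onnx", weight_type=QuantType.QUInt8)
+ ```
+ 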
2611
+ You can find the detailed conversion steps in the file [convert.py](./convert.py)
2612
+
2613
+ ### Part 2. Running the model in Python with onnxruntime & compare results
2614
+
2615
+ Once the fixes are applied, we proceed to test the correctness of the ONNX model by calculating cosine similarity between two texts using native SentenceTransformers and ONNX runtime, comparing the results.
2616
+
2617
+ If the results are identical, it confirms that the ONNX model gives the same result as the native models, validating its correctness and suitability for further use in the database.
2618
+
2619
+
2620
+ ```python
2621
+ import onnxruntime as rt
2622
+
2623
+ from sentence_transformers.util import cos_sim
2624
+ from sentence_transformers import SentenceTransformer
2625
+
2626
+ import transformers
2627
+
2628
+
2629
+ sentences_1 = 'How is the weather today?'
2630
+ sentences_2 = 'What is the current weather like today?'
2631
+
2632
+ # Calculate ONNX result
2633
+ model_id = "jinaai/jina-embeddings-v2-small-en"
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
2634
+ predef_sess = rt.InferenceSession("onnx/model.onnx")
2635
+
2636
+ enc1 = tokenizer(sentences_1)
2637
+ embeddings_1_onnx = predef_sess.run(None, {"input_ids": [enc1.input_ids],
2638
+                                             "attention_mask": [enc1.attention_mask], "token_type_ids": [enc1.token_type_ids]})
2639
+
2640
+ enc2 = tokenizer(sentences_2)
2641
+                                             "attention_mask": [enc2.attention_mask], "token_type_ids": [enc2.token_type_ids]})
2642
+ "attention_mask": [enc2.attention_mask]})
2643
+
2644
+
2645
+ # Calculate embeddings with SentenceTransformer
2646
+ model = SentenceTransformer(model_id, trust_remote_code=True)
2647
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
2648
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
2649
+
2650
+ # Compare results
2651
+ print("Cosine similarity for embeddings calculated with ONNX: " + str(cos_sim(embeddings_1_onnx[0][0], embeddings_2_onnx[0][0])))
2652
+ print("Cosine similarity for embeddings calculated with SentenceTransformer: " + str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
2653
+ ```
2654
+
2655
+ You can find the detailed ONNX vs. SentenceTransformer result comparison steps in the file [test_local.py](./test_local.py)
2656
+
config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "_attn_implementation_autoset": true,
3
+ "_name_or_path": "jinaai/jina-embeddings-v2-small-en",
4
+ "architectures": [
5
+ "JinaBertForMaskedLM"
6
+ ],
7
+ "attention_probs_dropout_prob": 0.0,
8
+ "attn_implementation": null,
9
+ "auto_map": {
10
+ "AutoConfig": "jinaai/jina-bert-implementation--configuration_bert.JinaBertConfig",
11
+ "AutoModel": "jinaai/jina-bert-implementation--modeling_bert.JinaBertModel",
12
+ "AutoModelForMaskedLM": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForMaskedLM",
13
+ "AutoModelForSequenceClassification": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForSequenceClassification"
14
+ },
15
+ "classifier_dropout": null,
16
+ "emb_pooler": "mean",
17
+ "export_model_type": "transformer",
18
+ "feed_forward_type": "geglu",
19
+ "gradient_checkpointing": false,
20
+ "hidden_act": "gelu",
21
+ "hidden_dropout_prob": 0.1,
22
+ "hidden_size": 512,
23
+ "initializer_range": 0.02,
24
+ "intermediate_size": 2048,
25
+ "layer_norm_eps": 1e-12,
26
+ "max_position_embeddings": 8192,
27
+ "model_max_length": 8192,
28
+ "model_type": "bert",
29
+ "num_attention_heads": 8,
30
+ "num_hidden_layers": 4,
31
+ "pad_token_id": 0,
32
+ "position_embedding_type": "alibi",
33
+ "torch_dtype": "float32",
34
+ "transformers_version": "4.47.1",
35
+ "type_vocab_size": 2,
36
+ "use_cache": true,
37
+ "vocab_size": 30528
38
+ }
conversion_config.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "model_id": "jinaai/jina-embeddings-v2-small-en",
3
+ "number_of_generated_embeddings": 512,
4
+ "precision_to_filename_map": {
5
+ "fp32": "onnx/model.onnx",
6
+ "int8": "onnx/model_int8.onnx",
7
+ "uint8": "onnx/model_uint8.onnx"
8
+
9
+ },
10
+ "opset": 16,
11
+ "IR": 8
12
+ }
convert.py ADDED
@@ -0,0 +1,198 @@
1
+ import os
2
+ import json
3
+ import shutil
4
+
5
+ from optimum.exporters.onnx import main_export
6
+ import onnx
7
+ from onnxconverter_common import float16
8
+ import onnxruntime as rt
9
+ from onnxruntime.tools.onnx_model_utils import *
10
+ from onnxruntime.quantization import quantize_dynamic, QuantType
11
+ import huggingface_hub
12
+
13
+ def add_mean_pooling(input_model, output_model, op, IR, output_embeddings_number):
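+     # Appends mask-aware mean pooling (token embeddings weighted by the attention
+     # mask, summed and divided by the clipped mask sum) and L2 normalization to the
+     # graph, replacing 'last_hidden_state' with a single 'sentence_embedding' output.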
14
+ model = onnx.load(input_model)
15
+     model_ir8 = onnx.helper.make_model(model.graph, ir_version = IR, opset_imports = [op]) # to be sure that we have a compatible opset and IR version
16
+
17
+ minus_one_axis = onnx.helper.make_tensor(
18
+ name = "minus_one_axis",
19
+ data_type = onnx.TensorProto.INT64,
20
+ dims = [1],
21
+ vals = [-1])
22
+
23
+ model_ir8.graph.initializer.append(minus_one_axis)
24
+
25
+ mask_clip_lower_limit = onnx.helper.make_tensor(
26
+ name = "mask_clip_lower_limit",
27
+ data_type = onnx.TensorProto.FLOAT,
28
+ dims = [1],
29
+ vals = [1e-9])
30
+
31
+ model_ir8.graph.initializer.append(mask_clip_lower_limit)
32
+
33
+ sum_one_axis = onnx.helper.make_tensor(
34
+ name = "sum_one_axis",
35
+ data_type = onnx.TensorProto.INT64,
36
+ dims = [1],
37
+ vals = [1])
38
+
39
+ model_ir8.graph.initializer.append(sum_one_axis)
40
+
41
+ attention_mask_cast_op = onnx.helper.make_node(
42
+ "Cast",
43
+ inputs=["attention_mask"],
44
+ outputs=["attention_mask_fp32"],
45
+ to=onnx.TensorProto.FLOAT
46
+ )
47
+
48
+ model_ir8.graph.node.append(attention_mask_cast_op)
49
+
50
+ expand_dims_op = onnx.helper.make_node(
51
+ "Unsqueeze",
52
+ inputs=["attention_mask_fp32", "minus_one_axis"],
53
+ outputs=["unsqueezed_attention_mask"],
54
+ )
55
+
56
+ model_ir8.graph.node.append(expand_dims_op)
57
+
58
+ shape_op = onnx.helper.make_node(
59
+ "Shape",
60
+ inputs = ["last_hidden_state"],
61
+ outputs = ["last_hidden_state_shape"]
62
+ )
63
+
64
+ model_ir8.graph.node.append(shape_op)
65
+
66
+ broadcast_to_op = onnx.helper.make_node(
67
+ "Expand",
68
+ inputs=["unsqueezed_attention_mask", "last_hidden_state_shape"],
69
+ outputs=["expanded_attention_mask"],
70
+ )
71
+
72
+ model_ir8.graph.node.append(broadcast_to_op)
73
+
74
+ multiply_op = onnx.helper.make_node(
75
+ "Mul",
76
+ inputs=["last_hidden_state", "expanded_attention_mask"],
77
+ outputs=["last_hidden_state_x_expanded_attention_mask"],
78
+ )
79
+
80
+ model_ir8.graph.node.append(multiply_op)
81
+
82
+ sum_embeddings_op = onnx.helper.make_node(
83
+ "ReduceSum",
84
+ inputs=["last_hidden_state_x_expanded_attention_mask", "sum_one_axis"],
85
+ outputs=["sum_last_hidden_state_x_expanded_attention_mask"],
86
+ )
87
+
88
+ model_ir8.graph.node.append(sum_embeddings_op)
89
+
90
+ sum_mask_op = onnx.helper.make_node(
91
+ "ReduceSum",
92
+ inputs=["expanded_attention_mask", "sum_one_axis"],
93
+ outputs=["sum_expanded_attention_mask"],
94
+ )
95
+
96
+ model_ir8.graph.node.append(sum_mask_op)
97
+
98
+ clip_mask_op = onnx.helper.make_node(
99
+ "Clip",
100
+ inputs=["sum_expanded_attention_mask", "mask_clip_lower_limit"],
101
+ outputs=["clipped_sum_expanded_attention_mask"],
102
+ )
103
+
104
+ model_ir8.graph.node.append(clip_mask_op)
105
+
106
+ pooled_embeddings_op = onnx.helper.make_node(
107
+ "Div",
108
+ inputs=["sum_last_hidden_state_x_expanded_attention_mask", "clipped_sum_expanded_attention_mask"],
109
+ outputs=["pooled_embeddings"],
110
+ # outputs=["sentence_embeddings"]
111
+ )
112
+
113
+ model_ir8.graph.node.append(pooled_embeddings_op)
114
+
115
+ squeeze_pooled_embeddings_op = onnx.helper.make_node(
116
+ "Squeeze",
117
+ inputs=["pooled_embeddings", "sum_one_axis"],
118
+ outputs=["squeezed_pooled_embeddings"]
119
+
120
+ )
121
+
122
+ model_ir8.graph.node.append(squeeze_pooled_embeddings_op)
123
+
124
+ normalized_pooled_embeddings_op = onnx.helper.make_node(
125
+ "Normalizer",
126
+ domain="ai.onnx.ml",
127
+ inputs=["squeezed_pooled_embeddings"],
128
+ outputs=["sentence_embedding"],
129
+ norm = "L2"
130
+ )
131
+
132
+
133
+ model_ir8.graph.node.append(normalized_pooled_embeddings_op)
134
+
135
+ sentence_embeddings_output = onnx.helper.make_tensor_value_info(
136
+ "sentence_embedding",
137
+ onnx.TensorProto.FLOAT,
138
+ shape=["batch_size", output_embeddings_number]
139
+ )
140
+
141
+ model_ir8.graph.output.append(sentence_embeddings_output)
142
+
143
+ for node in model_ir8.graph.output:
144
+ if node.name == "last_hidden_state":
145
+ model_ir8.graph.output.remove(node)
146
+
147
+ model_ir8 = onnx.helper.make_model(model_ir8.graph, ir_version = 8, opset_imports = [op]) #to be sure that we have compatible opset and IR version
148
+
149
+ onnx.save(model_ir8, output_model, save_as_external_data = False)
150
+
151
+
152
+
153
+ with open('conversion_config.json') as json_file:
154
+ conversion_config = json.load(json_file)
155
+
156
+
157
+ model_id = conversion_config["model_id"]
158
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
159
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
160
+ opset = conversion_config["opset"]
161
+ IR = conversion_config["IR"]
162
+
163
+
164
+ op = onnx.OperatorSetIdProto()
165
+ op.version = opset
166
+
167
+
168
+ if not os.path.exists("onnx"):
169
+ os.makedirs("onnx")
170
+
171
+ print("Exporting the main model version")
172
+ try:
173
+ main_export(model_name_or_path=model_id, output="./", opset=opset, trust_remote_code=True, task="feature-extraction", dtype="fp32")
174
+ except:
175
+ huggingface_hub.hf_hub_download(repo_id=model_id, filename="model.onnx", local_dir="./")
176
+
177
+
178
+ if "fp32" in precision_to_filename_map:
179
+ print("Exporting the fp32 onnx file...")
180
+
181
+ shutil.copyfile('model.onnx', precision_to_filename_map["fp32"])
182
+ add_mean_pooling("model.onnx", precision_to_filename_map["fp32"], op, IR, number_of_generated_embeddings)
183
+
184
+ print("Done\n\n")
185
+
186
+ if "int8" in precision_to_filename_map:
187
+ print("Quantizing fp32 model to int8...")
188
+ quantize_dynamic("model.onnx", precision_to_filename_map["int8"], weight_type=QuantType.QInt8)
189
+ add_mean_pooling( precision_to_filename_map["int8"], precision_to_filename_map["int8"], op, IR, number_of_generated_embeddings)
190
+ print("Done\n\n")
191
+
192
+ if "uint8" in precision_to_filename_map:
193
+ print("Quantizing fp32 model to uint8...")
194
+ quantize_dynamic("model.onnx", precision_to_filename_map["uint8"], weight_type=QuantType.QUInt8)
195
+ add_mean_pooling( precision_to_filename_map["uint8"], precision_to_filename_map["uint8"], op, IR, number_of_generated_embeddings)
196
+ print("Done\n\n")
197
+
198
+ os.remove("model.onnx")
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:826db608d156761e86f892250600aae670b31c886b392c8badc35a0f80c88945
3
+ size 129810013
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8251c0f52cde601ee024eca8203cb0f754a2ac0c813b8dfdf300c9986836b049
3
+ size 32653121
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a5c7f6eb8377efa7428443ffad5a39a68ebb3ffc0c1b77e658ef72aaa068b70
3
+ size 32653134
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
test_local.py ADDED
@@ -0,0 +1,49 @@
1
+ import onnxruntime as rt
2
+
3
+ from sentence_transformers.util import cos_sim
4
+ from sentence_transformers import SentenceTransformer
5
+
6
+ import transformers
7
+
8
+ import gc
9
+ import json
10
+
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+
20
+ sentences_1 = 'How is the weather today?'
21
+ sentences_2 = 'What is the current weather like today?'
22
+
23
+ print(f"Testing on cosine similiarity between sentences: \n'{sentences_1}'\n'{sentences_2}'\n\n\n")
24
+
25
+ tokenizer = transformers.AutoTokenizer.from_pretrained("./", trust_remote_code=True)
26
+ enc1 = tokenizer(sentences_1)
27
+ enc2 = tokenizer(sentences_2)
28
+
29
+ for precision, file_name in precision_to_filename_map.items():
30
+
31
+
32
+ onnx_session = rt.InferenceSession(file_name)
33
+ embeddings_1_onnx = onnx_session.run(None, {"input_ids": [enc1.input_ids],
34
+ "attention_mask": [enc1.attention_mask], "token_type_ids": [enc1.token_type_ids] })[0][0]
35
+
36
+
37
+ embeddings_2_onnx = onnx_session.run(None, {"input_ids": [enc2.input_ids],
38
+ "attention_mask": [enc2.attention_mask], "token_type_ids": [enc2.token_type_ids]})[0][0]
39
+ del onnx_session
40
+ gc.collect()
41
+ print(f'Cosine similiarity for ONNX model with precision "{precision}" is {str(cos_sim(embeddings_1_onnx, embeddings_2_onnx))}')
42
+
43
+
44
+
45
+
46
+ model = SentenceTransformer(model_id, trust_remote_code=True)
47
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True, trust_remote_code=True)
48
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True, trust_remote_code=True)
49
+ print('Cosine similiarity for original sentence transformer model is '+str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
test_teradata.py ADDED
@@ -0,0 +1,106 @@
1
+ import sys
2
+ import teradataml as tdml
3
+ from tabulate import tabulate
4
+
5
+ import json
6
+
7
+
8
+ with open('conversion_config.json') as json_file:
9
+ conversion_config = json.load(json_file)
10
+
11
+
12
+ model_id = conversion_config["model_id"]
13
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
14
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
15
+
16
+ host = sys.argv[1]
17
+ username = sys.argv[2]
18
+ password = sys.argv[3]
19
+
20
+ print("Setting up connection to teradata...")
21
+ tdml.create_context(host = host, username = username, password = password)
22
+ print("Done\n\n")
23
+
24
+
25
+ print("Deploying tokenizer...")
26
+ try:
27
+ tdml.db_drop_table('tokenizer_table')
28
+ except:
29
+     print("Can't drop tokenizer table - it does not exist")
30
+ tdml.save_byom('tokenizer',
31
+ 'tokenizer.json',
32
+ 'tokenizer_table')
33
+ print("Done\n\n")
34
+
35
+ print("Testing models...")
36
+ try:
37
+ tdml.db_drop_table('model_table')
38
+ except:
39
+     print("Can't drop models table - it does not exist")
40
+
41
+ for precision, file_name in precision_to_filename_map.items():
42
+ print(f"Deploying {precision} model...")
43
+ tdml.save_byom(precision,
44
+ file_name,
45
+ 'model_table')
46
+ print(f"Model {precision} is deployed\n")
47
+
48
+ print(f"Calculating embeddings with {precision} model...")
49
+ try:
50
+ tdml.db_drop_table('emails_embeddings_store')
51
+ except:
52
+         print("Can't drop embeddings table - it does not exist")
53
+
54
+ tdml.execute_sql(f"""
55
+ create volatile table emails_embeddings_store as (
56
+ select
57
+ *
58
+ from mldb.ONNXEmbeddings(
59
+ on emails.emails as InputTable
60
+ on (select * from model_table where model_id = '{precision}') as ModelTable DIMENSION
61
+ on (select model as tokenizer from tokenizer_table where model_id = 'tokenizer') as TokenizerTable DIMENSION
62
+
63
+ using
64
+ Accumulate('id', 'txt')
65
+ ModelOutputTensor('sentence_embedding')
66
+ EnableMemoryCheck('false')
67
+ OutputFormat('FLOAT32({number_of_generated_embeddings})')
68
+ OverwriteCachedModel('true')
69
+ ) a
70
+ ) with data on commit preserve rows
71
+
72
+ """)
73
+ print("Embeddings calculated")
74
+     print(f"Testing semantic search with cosine similarity on the output of the model with precision '{precision}'...")
75
+ tdf_embeddings_store = tdml.DataFrame('emails_embeddings_store')
76
+ tdf_embeddings_store_tgt = tdf_embeddings_store[tdf_embeddings_store.id == 3]
77
+
78
+ tdf_embeddings_store_ref = tdf_embeddings_store[tdf_embeddings_store.id != 3]
79
+
80
+ cos_sim_pd = tdml.DataFrame.from_query(f"""
81
+ SELECT
82
+ dt.target_id,
83
+ dt.reference_id,
84
+ e_tgt.txt as target_txt,
85
+ e_ref.txt as reference_txt,
86
+             (1.0 - dt.distance) as similarity
87
+ FROM
88
+ TD_VECTORDISTANCE (
89
+ ON ({tdf_embeddings_store_tgt.show_query()}) AS TargetTable
90
+ ON ({tdf_embeddings_store_ref.show_query()}) AS ReferenceTable DIMENSION
91
+ USING
92
+ TargetIDColumn('id')
93
+ TargetFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
94
+ RefIDColumn('id')
95
+ RefFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
96
+ DistanceMeasure('cosine')
97
+ topk(3)
98
+ ) AS dt
99
+ JOIN emails.emails e_tgt on e_tgt.id = dt.target_id
100
+ JOIN emails.emails e_ref on e_ref.id = dt.reference_id;
101
+ """).to_pandas()
102
+ print(tabulate(cos_sim_pd, headers='keys', tablefmt='fancy_grid'))
103
+ print("Done\n\n")
104
+
105
+
106
+ tdml.remove_context()
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 2147483648,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff