martinhillebrandtd committed on
Commit b22888b · 1 Parent(s): 45fcce7
README.md CHANGED
@@ -1,3 +1,2762 @@
- ---
- license: apache-2.0
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ library_name: transformers
5
+ license: apache-2.0
6
+ model-index:
7
+ - name: gte-base-en-v1.5
8
+ results:
9
+ - dataset:
10
+ config: en
11
+ name: MTEB AmazonCounterfactualClassification (en)
12
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
13
+ split: test
14
+ type: mteb/amazon_counterfactual
15
+ metrics:
16
+ - type: accuracy
17
+ value: 74.7910447761194
18
+ - type: ap
19
+ value: 37.053785713650626
20
+ - type: f1
21
+ value: 68.51101510998551
22
+ task:
23
+ type: Classification
24
+ - dataset:
25
+ config: default
26
+ name: MTEB AmazonPolarityClassification
27
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
28
+ split: test
29
+ type: mteb/amazon_polarity
30
+ metrics:
31
+ - type: accuracy
32
+ value: 93.016875
33
+ - type: ap
34
+ value: 89.17750268426342
35
+ - type: f1
36
+ value: 92.9970977240524
37
+ task:
38
+ type: Classification
39
+ - dataset:
40
+ config: en
41
+ name: MTEB AmazonReviewsClassification (en)
42
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
43
+ split: test
44
+ type: mteb/amazon_reviews_multi
45
+ metrics:
46
+ - type: accuracy
47
+ value: 53.312000000000005
48
+ - type: f1
49
+ value: 52.98175784163017
50
+ task:
51
+ type: Classification
52
+ - dataset:
53
+ config: default
54
+ name: MTEB ArguAna
55
+ revision: c22ab2a51041ffd869aaddef7af8d8215647e41a
56
+ split: test
57
+ type: mteb/arguana
58
+ metrics:
59
+ - type: map_at_1
60
+ value: 38.193
61
+ - type: map_at_10
62
+ value: 54.848
63
+ - type: map_at_100
64
+ value: 55.388000000000005
65
+ - type: map_at_1000
66
+ value: 55.388999999999996
67
+ - type: map_at_3
68
+ value: 50.427
69
+ - type: map_at_5
70
+ value: 53.105000000000004
71
+ - type: mrr_at_1
72
+ value: 39.047
73
+ - type: mrr_at_10
74
+ value: 55.153
75
+ - type: mrr_at_100
76
+ value: 55.686
77
+ - type: mrr_at_1000
78
+ value: 55.688
79
+ - type: mrr_at_3
80
+ value: 50.676
81
+ - type: mrr_at_5
82
+ value: 53.417
83
+ - type: ndcg_at_1
84
+ value: 38.193
85
+ - type: ndcg_at_10
86
+ value: 63.486
87
+ - type: ndcg_at_100
88
+ value: 65.58
89
+ - type: ndcg_at_1000
90
+ value: 65.61
91
+ - type: ndcg_at_3
92
+ value: 54.494
93
+ - type: ndcg_at_5
94
+ value: 59.339
95
+ - type: precision_at_1
96
+ value: 38.193
97
+ - type: precision_at_10
98
+ value: 9.075
99
+ - type: precision_at_100
100
+ value: 0.9939999999999999
101
+ - type: precision_at_1000
102
+ value: 0.1
103
+ - type: precision_at_3
104
+ value: 22.096
105
+ - type: precision_at_5
106
+ value: 15.619
107
+ - type: recall_at_1
108
+ value: 38.193
109
+ - type: recall_at_10
110
+ value: 90.754
111
+ - type: recall_at_100
112
+ value: 99.431
113
+ - type: recall_at_1000
114
+ value: 99.644
115
+ - type: recall_at_3
116
+ value: 66.28699999999999
117
+ - type: recall_at_5
118
+ value: 78.094
119
+ task:
120
+ type: Retrieval
121
+ - dataset:
122
+ config: default
123
+ name: MTEB ArxivClusteringP2P
124
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
125
+ split: test
126
+ type: mteb/arxiv-clustering-p2p
127
+ metrics:
128
+ - type: v_measure
129
+ value: 47.508221208908964
130
+ task:
131
+ type: Clustering
132
+ - dataset:
133
+ config: default
134
+ name: MTEB ArxivClusteringS2S
135
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
136
+ split: test
137
+ type: mteb/arxiv-clustering-s2s
138
+ metrics:
139
+ - type: v_measure
140
+ value: 42.04668382560096
141
+ task:
142
+ type: Clustering
143
+ - dataset:
144
+ config: default
145
+ name: MTEB AskUbuntuDupQuestions
146
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
147
+ split: test
148
+ type: mteb/askubuntudupquestions-reranking
149
+ metrics:
150
+ - type: map
151
+ value: 61.828759903716815
152
+ - type: mrr
153
+ value: 74.37343358395991
154
+ task:
155
+ type: Reranking
156
+ - dataset:
157
+ config: default
158
+ name: MTEB BIOSSES
159
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
160
+ split: test
161
+ type: mteb/biosses-sts
162
+ metrics:
163
+ - type: cos_sim_pearson
164
+ value: 85.03673698773017
165
+ - type: cos_sim_spearman
166
+ value: 83.6470866785058
167
+ - type: euclidean_pearson
168
+ value: 82.64048673096565
169
+ - type: euclidean_spearman
170
+ value: 83.63142367101115
171
+ - type: manhattan_pearson
172
+ value: 82.71493099760228
173
+ - type: manhattan_spearman
174
+ value: 83.60491704294326
175
+ task:
176
+ type: STS
177
+ - dataset:
178
+ config: default
179
+ name: MTEB Banking77Classification
180
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
181
+ split: test
182
+ type: mteb/banking77
183
+ metrics:
184
+ - type: accuracy
185
+ value: 86.73376623376623
186
+ - type: f1
187
+ value: 86.70294049278262
188
+ task:
189
+ type: Classification
190
+ - dataset:
191
+ config: default
192
+ name: MTEB BiorxivClusteringP2P
193
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
194
+ split: test
195
+ type: mteb/biorxiv-clustering-p2p
196
+ metrics:
197
+ - type: v_measure
198
+ value: 40.31923804167062
199
+ task:
200
+ type: Clustering
201
+ - dataset:
202
+ config: default
203
+ name: MTEB BiorxivClusteringS2S
204
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
205
+ split: test
206
+ type: mteb/biorxiv-clustering-s2s
207
+ metrics:
208
+ - type: v_measure
209
+ value: 37.552547125348454
210
+ task:
211
+ type: Clustering
212
+ - dataset:
213
+ config: default
214
+ name: MTEB CQADupstackAndroidRetrieval
215
+ revision: f46a197baaae43b4f621051089b82a364682dfeb
216
+ split: test
217
+ type: mteb/cqadupstack-android
218
+ metrics:
219
+ - type: map_at_1
220
+ value: 30.567
221
+ - type: map_at_10
222
+ value: 41.269
223
+ - type: map_at_100
224
+ value: 42.689
225
+ - type: map_at_1000
226
+ value: 42.84
227
+ - type: map_at_3
228
+ value: 37.567
229
+ - type: map_at_5
230
+ value: 39.706
231
+ - type: mrr_at_1
232
+ value: 37.053000000000004
233
+ - type: mrr_at_10
234
+ value: 46.900999999999996
235
+ - type: mrr_at_100
236
+ value: 47.662
237
+ - type: mrr_at_1000
238
+ value: 47.713
239
+ - type: mrr_at_3
240
+ value: 43.801
241
+ - type: mrr_at_5
242
+ value: 45.689
243
+ - type: ndcg_at_1
244
+ value: 37.053000000000004
245
+ - type: ndcg_at_10
246
+ value: 47.73
247
+ - type: ndcg_at_100
248
+ value: 53.128
249
+ - type: ndcg_at_1000
250
+ value: 55.300000000000004
251
+ - type: ndcg_at_3
252
+ value: 42.046
253
+ - type: ndcg_at_5
254
+ value: 44.782
255
+ - type: precision_at_1
256
+ value: 37.053000000000004
257
+ - type: precision_at_10
258
+ value: 9.142
259
+ - type: precision_at_100
260
+ value: 1.485
261
+ - type: precision_at_1000
262
+ value: 0.197
263
+ - type: precision_at_3
264
+ value: 20.076
265
+ - type: precision_at_5
266
+ value: 14.535
267
+ - type: recall_at_1
268
+ value: 30.567
269
+ - type: recall_at_10
270
+ value: 60.602999999999994
271
+ - type: recall_at_100
272
+ value: 83.22800000000001
273
+ - type: recall_at_1000
274
+ value: 96.696
275
+ - type: recall_at_3
276
+ value: 44.336999999999996
277
+ - type: recall_at_5
278
+ value: 51.949
279
+ task:
280
+ type: Retrieval
281
+ - dataset:
282
+ config: default
283
+ name: MTEB CQADupstackEnglishRetrieval
284
+ revision: ad9991cb51e31e31e430383c75ffb2885547b5f0
285
+ split: test
286
+ type: mteb/cqadupstack-english
287
+ metrics:
288
+ - type: map_at_1
289
+ value: 28.538000000000004
290
+ - type: map_at_10
291
+ value: 38.757999999999996
292
+ - type: map_at_100
293
+ value: 40.129
294
+ - type: map_at_1000
295
+ value: 40.262
296
+ - type: map_at_3
297
+ value: 35.866
298
+ - type: map_at_5
299
+ value: 37.417
300
+ - type: mrr_at_1
301
+ value: 36.051
302
+ - type: mrr_at_10
303
+ value: 44.868
304
+ - type: mrr_at_100
305
+ value: 45.568999999999996
306
+ - type: mrr_at_1000
307
+ value: 45.615
308
+ - type: mrr_at_3
309
+ value: 42.558
310
+ - type: mrr_at_5
311
+ value: 43.883
312
+ - type: ndcg_at_1
313
+ value: 36.051
314
+ - type: ndcg_at_10
315
+ value: 44.584
316
+ - type: ndcg_at_100
317
+ value: 49.356
318
+ - type: ndcg_at_1000
319
+ value: 51.39
320
+ - type: ndcg_at_3
321
+ value: 40.389
322
+ - type: ndcg_at_5
323
+ value: 42.14
324
+ - type: precision_at_1
325
+ value: 36.051
326
+ - type: precision_at_10
327
+ value: 8.446
328
+ - type: precision_at_100
329
+ value: 1.411
330
+ - type: precision_at_1000
331
+ value: 0.19
332
+ - type: precision_at_3
333
+ value: 19.639
334
+ - type: precision_at_5
335
+ value: 13.796
336
+ - type: recall_at_1
337
+ value: 28.538000000000004
338
+ - type: recall_at_10
339
+ value: 54.99000000000001
340
+ - type: recall_at_100
341
+ value: 75.098
342
+ - type: recall_at_1000
343
+ value: 87.848
344
+ - type: recall_at_3
345
+ value: 42.236000000000004
346
+ - type: recall_at_5
347
+ value: 47.377
348
+ task:
349
+ type: Retrieval
350
+ - dataset:
351
+ config: default
352
+ name: MTEB CQADupstackGamingRetrieval
353
+ revision: 4885aa143210c98657558c04aaf3dc47cfb54340
354
+ split: test
355
+ type: mteb/cqadupstack-gaming
356
+ metrics:
357
+ - type: map_at_1
358
+ value: 37.188
359
+ - type: map_at_10
360
+ value: 50.861000000000004
361
+ - type: map_at_100
362
+ value: 51.917
363
+ - type: map_at_1000
364
+ value: 51.964999999999996
365
+ - type: map_at_3
366
+ value: 47.144000000000005
367
+ - type: map_at_5
368
+ value: 49.417
369
+ - type: mrr_at_1
370
+ value: 42.571
371
+ - type: mrr_at_10
372
+ value: 54.086999999999996
373
+ - type: mrr_at_100
374
+ value: 54.739000000000004
375
+ - type: mrr_at_1000
376
+ value: 54.762
377
+ - type: mrr_at_3
378
+ value: 51.285000000000004
379
+ - type: mrr_at_5
380
+ value: 53.0
381
+ - type: ndcg_at_1
382
+ value: 42.571
383
+ - type: ndcg_at_10
384
+ value: 57.282
385
+ - type: ndcg_at_100
386
+ value: 61.477000000000004
387
+ - type: ndcg_at_1000
388
+ value: 62.426
389
+ - type: ndcg_at_3
390
+ value: 51.0
391
+ - type: ndcg_at_5
392
+ value: 54.346000000000004
393
+ - type: precision_at_1
394
+ value: 42.571
395
+ - type: precision_at_10
396
+ value: 9.467
397
+ - type: precision_at_100
398
+ value: 1.2550000000000001
399
+ - type: precision_at_1000
400
+ value: 0.13799999999999998
401
+ - type: precision_at_3
402
+ value: 23.114
403
+ - type: precision_at_5
404
+ value: 16.250999999999998
405
+ - type: recall_at_1
406
+ value: 37.188
407
+ - type: recall_at_10
408
+ value: 73.068
409
+ - type: recall_at_100
410
+ value: 91.203
411
+ - type: recall_at_1000
412
+ value: 97.916
413
+ - type: recall_at_3
414
+ value: 56.552
415
+ - type: recall_at_5
416
+ value: 64.567
417
+ task:
418
+ type: Retrieval
419
+ - dataset:
420
+ config: default
421
+ name: MTEB CQADupstackGisRetrieval
422
+ revision: 5003b3064772da1887988e05400cf3806fe491f2
423
+ split: test
424
+ type: mteb/cqadupstack-gis
425
+ metrics:
426
+ - type: map_at_1
427
+ value: 25.041000000000004
428
+ - type: map_at_10
429
+ value: 33.86
430
+ - type: map_at_100
431
+ value: 34.988
432
+ - type: map_at_1000
433
+ value: 35.064
434
+ - type: map_at_3
435
+ value: 31.049
436
+ - type: map_at_5
437
+ value: 32.845
438
+ - type: mrr_at_1
439
+ value: 26.893
440
+ - type: mrr_at_10
441
+ value: 35.594
442
+ - type: mrr_at_100
443
+ value: 36.617
444
+ - type: mrr_at_1000
445
+ value: 36.671
446
+ - type: mrr_at_3
447
+ value: 33.051
448
+ - type: mrr_at_5
449
+ value: 34.61
450
+ - type: ndcg_at_1
451
+ value: 26.893
452
+ - type: ndcg_at_10
453
+ value: 38.674
454
+ - type: ndcg_at_100
455
+ value: 44.178
456
+ - type: ndcg_at_1000
457
+ value: 46.089999999999996
458
+ - type: ndcg_at_3
459
+ value: 33.485
460
+ - type: ndcg_at_5
461
+ value: 36.402
462
+ - type: precision_at_1
463
+ value: 26.893
464
+ - type: precision_at_10
465
+ value: 5.989
466
+ - type: precision_at_100
467
+ value: 0.918
468
+ - type: precision_at_1000
469
+ value: 0.11100000000000002
470
+ - type: precision_at_3
471
+ value: 14.2
472
+ - type: precision_at_5
473
+ value: 10.26
474
+ - type: recall_at_1
475
+ value: 25.041000000000004
476
+ - type: recall_at_10
477
+ value: 51.666000000000004
478
+ - type: recall_at_100
479
+ value: 76.896
480
+ - type: recall_at_1000
481
+ value: 91.243
482
+ - type: recall_at_3
483
+ value: 38.035999999999994
484
+ - type: recall_at_5
485
+ value: 44.999
486
+ task:
487
+ type: Retrieval
488
+ - dataset:
489
+ config: default
490
+ name: MTEB CQADupstackMathematicaRetrieval
491
+ revision: 90fceea13679c63fe563ded68f3b6f06e50061de
492
+ split: test
493
+ type: mteb/cqadupstack-mathematica
494
+ metrics:
495
+ - type: map_at_1
496
+ value: 15.909999999999998
497
+ - type: map_at_10
498
+ value: 23.901
499
+ - type: map_at_100
500
+ value: 25.165
501
+ - type: map_at_1000
502
+ value: 25.291000000000004
503
+ - type: map_at_3
504
+ value: 21.356
505
+ - type: map_at_5
506
+ value: 22.816
507
+ - type: mrr_at_1
508
+ value: 20.025000000000002
509
+ - type: mrr_at_10
510
+ value: 28.382
511
+ - type: mrr_at_100
512
+ value: 29.465000000000003
513
+ - type: mrr_at_1000
514
+ value: 29.535
515
+ - type: mrr_at_3
516
+ value: 25.933
517
+ - type: mrr_at_5
518
+ value: 27.332
519
+ - type: ndcg_at_1
520
+ value: 20.025000000000002
521
+ - type: ndcg_at_10
522
+ value: 29.099000000000004
523
+ - type: ndcg_at_100
524
+ value: 35.127
525
+ - type: ndcg_at_1000
526
+ value: 38.096000000000004
527
+ - type: ndcg_at_3
528
+ value: 24.464
529
+ - type: ndcg_at_5
530
+ value: 26.709
531
+ - type: precision_at_1
532
+ value: 20.025000000000002
533
+ - type: precision_at_10
534
+ value: 5.398
535
+ - type: precision_at_100
536
+ value: 0.9690000000000001
537
+ - type: precision_at_1000
538
+ value: 0.13699999999999998
539
+ - type: precision_at_3
540
+ value: 11.774
541
+ - type: precision_at_5
542
+ value: 8.632
543
+ - type: recall_at_1
544
+ value: 15.909999999999998
545
+ - type: recall_at_10
546
+ value: 40.672000000000004
547
+ - type: recall_at_100
548
+ value: 66.855
549
+ - type: recall_at_1000
550
+ value: 87.922
551
+ - type: recall_at_3
552
+ value: 28.069
553
+ - type: recall_at_5
554
+ value: 33.812
555
+ task:
556
+ type: Retrieval
557
+ - dataset:
558
+ config: default
559
+ name: MTEB CQADupstackPhysicsRetrieval
560
+ revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4
561
+ split: test
562
+ type: mteb/cqadupstack-physics
563
+ metrics:
564
+ - type: map_at_1
565
+ value: 30.175
566
+ - type: map_at_10
567
+ value: 41.36
568
+ - type: map_at_100
569
+ value: 42.701
570
+ - type: map_at_1000
571
+ value: 42.817
572
+ - type: map_at_3
573
+ value: 37.931
574
+ - type: map_at_5
575
+ value: 39.943
576
+ - type: mrr_at_1
577
+ value: 35.611
578
+ - type: mrr_at_10
579
+ value: 46.346
580
+ - type: mrr_at_100
581
+ value: 47.160000000000004
582
+ - type: mrr_at_1000
583
+ value: 47.203
584
+ - type: mrr_at_3
585
+ value: 43.712
586
+ - type: mrr_at_5
587
+ value: 45.367000000000004
588
+ - type: ndcg_at_1
589
+ value: 35.611
590
+ - type: ndcg_at_10
591
+ value: 47.532000000000004
592
+ - type: ndcg_at_100
593
+ value: 53.003
594
+ - type: ndcg_at_1000
595
+ value: 55.007
596
+ - type: ndcg_at_3
597
+ value: 42.043
598
+ - type: ndcg_at_5
599
+ value: 44.86
600
+ - type: precision_at_1
601
+ value: 35.611
602
+ - type: precision_at_10
603
+ value: 8.624
604
+ - type: precision_at_100
605
+ value: 1.332
606
+ - type: precision_at_1000
607
+ value: 0.169
608
+ - type: precision_at_3
609
+ value: 20.083000000000002
610
+ - type: precision_at_5
611
+ value: 14.437
612
+ - type: recall_at_1
613
+ value: 30.175
614
+ - type: recall_at_10
615
+ value: 60.5
616
+ - type: recall_at_100
617
+ value: 83.399
618
+ - type: recall_at_1000
619
+ value: 96.255
620
+ - type: recall_at_3
621
+ value: 45.448
622
+ - type: recall_at_5
623
+ value: 52.432
624
+ task:
625
+ type: Retrieval
626
+ - dataset:
627
+ config: default
628
+ name: MTEB CQADupstackProgrammersRetrieval
629
+ revision: 6184bc1440d2dbc7612be22b50686b8826d22b32
630
+ split: test
631
+ type: mteb/cqadupstack-programmers
632
+ metrics:
633
+ - type: map_at_1
634
+ value: 22.467000000000002
635
+ - type: map_at_10
636
+ value: 33.812999999999995
637
+ - type: map_at_100
638
+ value: 35.248000000000005
639
+ - type: map_at_1000
640
+ value: 35.359
641
+ - type: map_at_3
642
+ value: 30.316
643
+ - type: map_at_5
644
+ value: 32.233000000000004
645
+ - type: mrr_at_1
646
+ value: 28.310999999999996
647
+ - type: mrr_at_10
648
+ value: 38.979
649
+ - type: mrr_at_100
650
+ value: 39.937
651
+ - type: mrr_at_1000
652
+ value: 39.989999999999995
653
+ - type: mrr_at_3
654
+ value: 36.244
655
+ - type: mrr_at_5
656
+ value: 37.871
657
+ - type: ndcg_at_1
658
+ value: 28.310999999999996
659
+ - type: ndcg_at_10
660
+ value: 40.282000000000004
661
+ - type: ndcg_at_100
662
+ value: 46.22
663
+ - type: ndcg_at_1000
664
+ value: 48.507
665
+ - type: ndcg_at_3
666
+ value: 34.596
667
+ - type: ndcg_at_5
668
+ value: 37.267
669
+ - type: precision_at_1
670
+ value: 28.310999999999996
671
+ - type: precision_at_10
672
+ value: 7.831
673
+ - type: precision_at_100
674
+ value: 1.257
675
+ - type: precision_at_1000
676
+ value: 0.164
677
+ - type: precision_at_3
678
+ value: 17.275
679
+ - type: precision_at_5
680
+ value: 12.556999999999999
681
+ - type: recall_at_1
682
+ value: 22.467000000000002
683
+ - type: recall_at_10
684
+ value: 54.14099999999999
685
+ - type: recall_at_100
686
+ value: 79.593
687
+ - type: recall_at_1000
688
+ value: 95.063
689
+ - type: recall_at_3
690
+ value: 38.539
691
+ - type: recall_at_5
692
+ value: 45.403
693
+ task:
694
+ type: Retrieval
695
+ - dataset:
696
+ config: default
697
+ name: MTEB CQADupstackRetrieval
698
+ revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
699
+ split: test
700
+ type: mteb/cqadupstack
701
+ metrics:
702
+ - type: map_at_1
703
+ value: 24.18591666666667
704
+ - type: map_at_10
705
+ value: 33.84258333333333
706
+ - type: map_at_100
707
+ value: 35.11391666666666
708
+ - type: map_at_1000
709
+ value: 35.23258333333333
710
+ - type: map_at_3
711
+ value: 30.764249999999997
712
+ - type: map_at_5
713
+ value: 32.52333333333334
714
+ - type: mrr_at_1
715
+ value: 28.54733333333333
716
+ - type: mrr_at_10
717
+ value: 37.81725
718
+ - type: mrr_at_100
719
+ value: 38.716499999999996
720
+ - type: mrr_at_1000
721
+ value: 38.77458333333333
722
+ - type: mrr_at_3
723
+ value: 35.157833333333336
724
+ - type: mrr_at_5
725
+ value: 36.69816666666667
726
+ - type: ndcg_at_1
727
+ value: 28.54733333333333
728
+ - type: ndcg_at_10
729
+ value: 39.51508333333334
730
+ - type: ndcg_at_100
731
+ value: 44.95316666666666
732
+ - type: ndcg_at_1000
733
+ value: 47.257083333333334
734
+ - type: ndcg_at_3
735
+ value: 34.205833333333324
736
+ - type: ndcg_at_5
737
+ value: 36.78266666666667
738
+ - type: precision_at_1
739
+ value: 28.54733333333333
740
+ - type: precision_at_10
741
+ value: 7.082583333333334
742
+ - type: precision_at_100
743
+ value: 1.1590833333333332
744
+ - type: precision_at_1000
745
+ value: 0.15516666666666662
746
+ - type: precision_at_3
747
+ value: 15.908750000000001
748
+ - type: precision_at_5
749
+ value: 11.505416666666669
750
+ - type: recall_at_1
751
+ value: 24.18591666666667
752
+ - type: recall_at_10
753
+ value: 52.38758333333333
754
+ - type: recall_at_100
755
+ value: 76.13666666666667
756
+ - type: recall_at_1000
757
+ value: 91.99066666666667
758
+ - type: recall_at_3
759
+ value: 37.78333333333334
760
+ - type: recall_at_5
761
+ value: 44.30141666666666
762
+ task:
763
+ type: Retrieval
764
+ - dataset:
765
+ config: default
766
+ name: MTEB CQADupstackStatsRetrieval
767
+ revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a
768
+ split: test
769
+ type: mteb/cqadupstack-stats
770
+ metrics:
771
+ - type: map_at_1
772
+ value: 21.975
773
+ - type: map_at_10
774
+ value: 29.781000000000002
775
+ - type: map_at_100
776
+ value: 30.847
777
+ - type: map_at_1000
778
+ value: 30.94
779
+ - type: map_at_3
780
+ value: 27.167
781
+ - type: map_at_5
782
+ value: 28.633999999999997
783
+ - type: mrr_at_1
784
+ value: 24.387
785
+ - type: mrr_at_10
786
+ value: 32.476
787
+ - type: mrr_at_100
788
+ value: 33.337
789
+ - type: mrr_at_1000
790
+ value: 33.403
791
+ - type: mrr_at_3
792
+ value: 29.881999999999998
793
+ - type: mrr_at_5
794
+ value: 31.339
795
+ - type: ndcg_at_1
796
+ value: 24.387
797
+ - type: ndcg_at_10
798
+ value: 34.596
799
+ - type: ndcg_at_100
800
+ value: 39.635
801
+ - type: ndcg_at_1000
802
+ value: 42.079
803
+ - type: ndcg_at_3
804
+ value: 29.516
805
+ - type: ndcg_at_5
806
+ value: 31.959
807
+ - type: precision_at_1
808
+ value: 24.387
809
+ - type: precision_at_10
810
+ value: 5.6129999999999995
811
+ - type: precision_at_100
812
+ value: 0.8909999999999999
813
+ - type: precision_at_1000
814
+ value: 0.117
815
+ - type: precision_at_3
816
+ value: 12.73
817
+ - type: precision_at_5
818
+ value: 9.171999999999999
819
+ - type: recall_at_1
820
+ value: 21.975
821
+ - type: recall_at_10
822
+ value: 46.826
823
+ - type: recall_at_100
824
+ value: 69.554
825
+ - type: recall_at_1000
826
+ value: 87.749
827
+ - type: recall_at_3
828
+ value: 33.016
829
+ - type: recall_at_5
830
+ value: 38.97
831
+ task:
832
+ type: Retrieval
833
+ - dataset:
834
+ config: default
835
+ name: MTEB CQADupstackTexRetrieval
836
+ revision: 46989137a86843e03a6195de44b09deda022eec7
837
+ split: test
838
+ type: mteb/cqadupstack-tex
839
+ metrics:
840
+ - type: map_at_1
841
+ value: 15.614
842
+ - type: map_at_10
843
+ value: 22.927
844
+ - type: map_at_100
845
+ value: 24.185000000000002
846
+ - type: map_at_1000
847
+ value: 24.319
848
+ - type: map_at_3
849
+ value: 20.596
850
+ - type: map_at_5
851
+ value: 21.854000000000003
852
+ - type: mrr_at_1
853
+ value: 18.858
854
+ - type: mrr_at_10
855
+ value: 26.535999999999998
856
+ - type: mrr_at_100
857
+ value: 27.582
858
+ - type: mrr_at_1000
859
+ value: 27.665
860
+ - type: mrr_at_3
861
+ value: 24.295
862
+ - type: mrr_at_5
863
+ value: 25.532
864
+ - type: ndcg_at_1
865
+ value: 18.858
866
+ - type: ndcg_at_10
867
+ value: 27.583000000000002
868
+ - type: ndcg_at_100
869
+ value: 33.635
870
+ - type: ndcg_at_1000
871
+ value: 36.647
872
+ - type: ndcg_at_3
873
+ value: 23.348
874
+ - type: ndcg_at_5
875
+ value: 25.257
876
+ - type: precision_at_1
877
+ value: 18.858
878
+ - type: precision_at_10
879
+ value: 5.158
880
+ - type: precision_at_100
881
+ value: 0.964
882
+ - type: precision_at_1000
883
+ value: 0.13999999999999999
884
+ - type: precision_at_3
885
+ value: 11.092
886
+ - type: precision_at_5
887
+ value: 8.1
888
+ - type: recall_at_1
889
+ value: 15.614
890
+ - type: recall_at_10
891
+ value: 37.916
892
+ - type: recall_at_100
893
+ value: 65.205
894
+ - type: recall_at_1000
895
+ value: 86.453
896
+ - type: recall_at_3
897
+ value: 26.137
898
+ - type: recall_at_5
899
+ value: 31.087999999999997
900
+ task:
901
+ type: Retrieval
902
+ - dataset:
903
+ config: default
904
+ name: MTEB CQADupstackUnixRetrieval
905
+ revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53
906
+ split: test
907
+ type: mteb/cqadupstack-unix
908
+ metrics:
909
+ - type: map_at_1
910
+ value: 23.078000000000003
911
+ - type: map_at_10
912
+ value: 31.941999999999997
913
+ - type: map_at_100
914
+ value: 33.196999999999996
915
+ - type: map_at_1000
916
+ value: 33.303
917
+ - type: map_at_3
918
+ value: 28.927000000000003
919
+ - type: map_at_5
920
+ value: 30.707
921
+ - type: mrr_at_1
922
+ value: 26.866
923
+ - type: mrr_at_10
924
+ value: 35.557
925
+ - type: mrr_at_100
926
+ value: 36.569
927
+ - type: mrr_at_1000
928
+ value: 36.632
929
+ - type: mrr_at_3
930
+ value: 32.897999999999996
931
+ - type: mrr_at_5
932
+ value: 34.437
933
+ - type: ndcg_at_1
934
+ value: 26.866
935
+ - type: ndcg_at_10
936
+ value: 37.372
937
+ - type: ndcg_at_100
938
+ value: 43.248
939
+ - type: ndcg_at_1000
940
+ value: 45.632
941
+ - type: ndcg_at_3
942
+ value: 31.852999999999998
943
+ - type: ndcg_at_5
944
+ value: 34.582
945
+ - type: precision_at_1
946
+ value: 26.866
947
+ - type: precision_at_10
948
+ value: 6.511
949
+ - type: precision_at_100
950
+ value: 1.078
951
+ - type: precision_at_1000
952
+ value: 0.13899999999999998
953
+ - type: precision_at_3
954
+ value: 14.582999999999998
955
+ - type: precision_at_5
956
+ value: 10.634
957
+ - type: recall_at_1
958
+ value: 23.078000000000003
959
+ - type: recall_at_10
960
+ value: 50.334
961
+ - type: recall_at_100
962
+ value: 75.787
963
+ - type: recall_at_1000
964
+ value: 92.485
965
+ - type: recall_at_3
966
+ value: 35.386
967
+ - type: recall_at_5
968
+ value: 42.225
969
+ task:
970
+ type: Retrieval
971
+ - dataset:
972
+ config: default
973
+ name: MTEB CQADupstackWebmastersRetrieval
974
+ revision: 160c094312a0e1facb97e55eeddb698c0abe3571
975
+ split: test
976
+ type: mteb/cqadupstack-webmasters
977
+ metrics:
978
+ - type: map_at_1
979
+ value: 22.203999999999997
980
+ - type: map_at_10
981
+ value: 31.276
982
+ - type: map_at_100
983
+ value: 32.844
984
+ - type: map_at_1000
985
+ value: 33.062999999999995
986
+ - type: map_at_3
987
+ value: 27.733999999999998
988
+ - type: map_at_5
989
+ value: 29.64
990
+ - type: mrr_at_1
991
+ value: 27.272999999999996
992
+ - type: mrr_at_10
993
+ value: 36.083
994
+ - type: mrr_at_100
995
+ value: 37.008
996
+ - type: mrr_at_1000
997
+ value: 37.076
998
+ - type: mrr_at_3
999
+ value: 33.004
1000
+ - type: mrr_at_5
1001
+ value: 34.664
1002
+ - type: ndcg_at_1
1003
+ value: 27.272999999999996
1004
+ - type: ndcg_at_10
1005
+ value: 37.763000000000005
1006
+ - type: ndcg_at_100
1007
+ value: 43.566
1008
+ - type: ndcg_at_1000
1009
+ value: 46.356
1010
+ - type: ndcg_at_3
1011
+ value: 31.673000000000002
1012
+ - type: ndcg_at_5
1013
+ value: 34.501
1014
+ - type: precision_at_1
1015
+ value: 27.272999999999996
1016
+ - type: precision_at_10
1017
+ value: 7.470000000000001
1018
+ - type: precision_at_100
1019
+ value: 1.502
1020
+ - type: precision_at_1000
1021
+ value: 0.24
1022
+ - type: precision_at_3
1023
+ value: 14.756
1024
+ - type: precision_at_5
1025
+ value: 11.225
1026
+ - type: recall_at_1
1027
+ value: 22.203999999999997
1028
+ - type: recall_at_10
1029
+ value: 51.437999999999995
1030
+ - type: recall_at_100
1031
+ value: 76.845
1032
+ - type: recall_at_1000
1033
+ value: 94.38600000000001
1034
+ - type: recall_at_3
1035
+ value: 34.258
1036
+ - type: recall_at_5
1037
+ value: 41.512
1038
+ task:
1039
+ type: Retrieval
1040
+ - dataset:
1041
+ config: default
1042
+ name: MTEB CQADupstackWordpressRetrieval
1043
+ revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
1044
+ split: test
1045
+ type: mteb/cqadupstack-wordpress
1046
+ metrics:
1047
+ - type: map_at_1
1048
+ value: 17.474
1049
+ - type: map_at_10
1050
+ value: 26.362999999999996
1051
+ - type: map_at_100
1052
+ value: 27.456999999999997
1053
+ - type: map_at_1000
1054
+ value: 27.567999999999998
1055
+ - type: map_at_3
1056
+ value: 23.518
1057
+ - type: map_at_5
1058
+ value: 25.068
1059
+ - type: mrr_at_1
1060
+ value: 18.669
1061
+ - type: mrr_at_10
1062
+ value: 27.998
1063
+ - type: mrr_at_100
1064
+ value: 28.953
1065
+ - type: mrr_at_1000
1066
+ value: 29.03
1067
+ - type: mrr_at_3
1068
+ value: 25.230999999999998
1069
+ - type: mrr_at_5
1070
+ value: 26.654
1071
+ - type: ndcg_at_1
1072
+ value: 18.669
1073
+ - type: ndcg_at_10
1074
+ value: 31.684
1075
+ - type: ndcg_at_100
1076
+ value: 36.864999999999995
1077
+ - type: ndcg_at_1000
1078
+ value: 39.555
1079
+ - type: ndcg_at_3
1080
+ value: 26.057000000000002
1081
+ - type: ndcg_at_5
1082
+ value: 28.587
1083
+ - type: precision_at_1
1084
+ value: 18.669
1085
+ - type: precision_at_10
1086
+ value: 5.3420000000000005
1087
+ - type: precision_at_100
1088
+ value: 0.847
1089
+ - type: precision_at_1000
1090
+ value: 0.12
1091
+ - type: precision_at_3
1092
+ value: 11.583
1093
+ - type: precision_at_5
1094
+ value: 8.466
1095
+ - type: recall_at_1
1096
+ value: 17.474
1097
+ - type: recall_at_10
1098
+ value: 46.497
1099
+ - type: recall_at_100
1100
+ value: 69.977
1101
+ - type: recall_at_1000
1102
+ value: 89.872
1103
+ - type: recall_at_3
1104
+ value: 31.385999999999996
1105
+ - type: recall_at_5
1106
+ value: 37.283
1107
+ task:
1108
+ type: Retrieval
1109
+ - dataset:
1110
+ config: default
1111
+ name: MTEB ClimateFEVER
1112
+ revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
1113
+ split: test
1114
+ type: mteb/climate-fever
1115
+ metrics:
1116
+ - type: map_at_1
1117
+ value: 17.173
1118
+ - type: map_at_10
1119
+ value: 30.407
1120
+ - type: map_at_100
1121
+ value: 32.528
1122
+ - type: map_at_1000
1123
+ value: 32.698
1124
+ - type: map_at_3
1125
+ value: 25.523
1126
+ - type: map_at_5
1127
+ value: 28.038
1128
+ - type: mrr_at_1
1129
+ value: 38.958
1130
+ - type: mrr_at_10
1131
+ value: 51.515
1132
+ - type: mrr_at_100
1133
+ value: 52.214000000000006
1134
+ - type: mrr_at_1000
1135
+ value: 52.237
1136
+ - type: mrr_at_3
1137
+ value: 48.502
1138
+ - type: mrr_at_5
1139
+ value: 50.251000000000005
1140
+ - type: ndcg_at_1
1141
+ value: 38.958
1142
+ - type: ndcg_at_10
1143
+ value: 40.355000000000004
1144
+ - type: ndcg_at_100
1145
+ value: 47.68
1146
+ - type: ndcg_at_1000
1147
+ value: 50.370000000000005
1148
+ - type: ndcg_at_3
1149
+ value: 33.946
1150
+ - type: ndcg_at_5
1151
+ value: 36.057
1152
+ - type: precision_at_1
1153
+ value: 38.958
1154
+ - type: precision_at_10
1155
+ value: 12.508
1156
+ - type: precision_at_100
1157
+ value: 2.054
1158
+ - type: precision_at_1000
1159
+ value: 0.256
1160
+ - type: precision_at_3
1161
+ value: 25.581
1162
+ - type: precision_at_5
1163
+ value: 19.256999999999998
1164
+ - type: recall_at_1
1165
+ value: 17.173
1166
+ - type: recall_at_10
1167
+ value: 46.967
1168
+ - type: recall_at_100
1169
+ value: 71.47200000000001
1170
+ - type: recall_at_1000
1171
+ value: 86.238
1172
+ - type: recall_at_3
1173
+ value: 30.961
1174
+ - type: recall_at_5
1175
+ value: 37.539
1176
+ task:
1177
+ type: Retrieval
1178
+ - dataset:
1179
+ config: default
1180
+ name: MTEB DBPedia
1181
+ revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
1182
+ split: test
1183
+ type: mteb/dbpedia
1184
+ metrics:
1185
+ - type: map_at_1
1186
+ value: 8.999
1187
+ - type: map_at_10
1188
+ value: 18.989
1189
+ - type: map_at_100
1190
+ value: 26.133
1191
+ - type: map_at_1000
1192
+ value: 27.666
1193
+ - type: map_at_3
1194
+ value: 13.918
1195
+ - type: map_at_5
1196
+ value: 16.473
1197
+ - type: mrr_at_1
1198
+ value: 66.25
1199
+ - type: mrr_at_10
1200
+ value: 74.161
1201
+ - type: mrr_at_100
1202
+ value: 74.516
1203
+ - type: mrr_at_1000
1204
+ value: 74.524
1205
+ - type: mrr_at_3
1206
+ value: 72.875
1207
+ - type: mrr_at_5
1208
+ value: 73.613
1209
+ - type: ndcg_at_1
1210
+ value: 54.37499999999999
1211
+ - type: ndcg_at_10
1212
+ value: 39.902
1213
+ - type: ndcg_at_100
1214
+ value: 44.212
1215
+ - type: ndcg_at_1000
1216
+ value: 51.62
1217
+ - type: ndcg_at_3
1218
+ value: 45.193
1219
+ - type: ndcg_at_5
1220
+ value: 42.541000000000004
1221
+ - type: precision_at_1
1222
+ value: 66.25
1223
+ - type: precision_at_10
1224
+ value: 30.425
1225
+ - type: precision_at_100
1226
+ value: 9.754999999999999
1227
+ - type: precision_at_1000
1228
+ value: 2.043
1229
+ - type: precision_at_3
1230
+ value: 48.25
1231
+ - type: precision_at_5
1232
+ value: 40.65
1233
+ - type: recall_at_1
1234
+ value: 8.999
1235
+ - type: recall_at_10
1236
+ value: 24.133
1237
+ - type: recall_at_100
1238
+ value: 49.138999999999996
1239
+ - type: recall_at_1000
1240
+ value: 72.639
1241
+ - type: recall_at_3
1242
+ value: 15.287999999999998
1243
+ - type: recall_at_5
1244
+ value: 19.415
1245
+ task:
1246
+ type: Retrieval
1247
+ - dataset:
1248
+ config: default
1249
+ name: MTEB EmotionClassification
1250
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
1251
+ split: test
1252
+ type: mteb/emotion
1253
+ metrics:
1254
+ - type: accuracy
1255
+ value: 46.38999999999999
1256
+ - type: f1
1257
+ value: 41.444205512055234
1258
+ task:
1259
+ type: Classification
1260
+ - dataset:
1261
+ config: default
1262
+ name: MTEB FEVER
1263
+ revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
1264
+ split: test
1265
+ type: mteb/fever
1266
+ metrics:
1267
+ - type: map_at_1
1268
+ value: 87.35000000000001
1269
+ - type: map_at_10
1270
+ value: 92.837
1271
+ - type: map_at_100
1272
+ value: 92.996
1273
+ - type: map_at_1000
1274
+ value: 93.006
1275
+ - type: map_at_3
1276
+ value: 92.187
1277
+ - type: map_at_5
1278
+ value: 92.595
1279
+ - type: mrr_at_1
1280
+ value: 93.864
1281
+ - type: mrr_at_10
1282
+ value: 96.723
1283
+ - type: mrr_at_100
1284
+ value: 96.72500000000001
1285
+ - type: mrr_at_1000
1286
+ value: 96.72500000000001
1287
+ - type: mrr_at_3
1288
+ value: 96.64
1289
+ - type: mrr_at_5
1290
+ value: 96.71499999999999
1291
+ - type: ndcg_at_1
1292
+ value: 93.864
1293
+ - type: ndcg_at_10
1294
+ value: 94.813
1295
+ - type: ndcg_at_100
1296
+ value: 95.243
1297
+ - type: ndcg_at_1000
1298
+ value: 95.38600000000001
1299
+ - type: ndcg_at_3
1300
+ value: 94.196
1301
+ - type: ndcg_at_5
1302
+ value: 94.521
1303
+ - type: precision_at_1
1304
+ value: 93.864
1305
+ - type: precision_at_10
1306
+ value: 10.951
1307
+ - type: precision_at_100
1308
+ value: 1.1400000000000001
1309
+ - type: precision_at_1000
1310
+ value: 0.117
1311
+ - type: precision_at_3
1312
+ value: 35.114000000000004
1313
+ - type: precision_at_5
1314
+ value: 21.476
1315
+ - type: recall_at_1
1316
+ value: 87.35000000000001
1317
+ - type: recall_at_10
1318
+ value: 96.941
1319
+ - type: recall_at_100
1320
+ value: 98.397
1321
+ - type: recall_at_1000
1322
+ value: 99.21600000000001
1323
+ - type: recall_at_3
1324
+ value: 95.149
1325
+ - type: recall_at_5
1326
+ value: 96.131
1327
+ task:
1328
+ type: Retrieval
1329
+ - dataset:
1330
+ config: default
1331
+ name: MTEB FiQA2018
1332
+ revision: 27a168819829fe9bcd655c2df245fb19452e8e06
1333
+ split: test
1334
+ type: mteb/fiqa
1335
+ metrics:
1336
+ - type: map_at_1
1337
+ value: 24.476
1338
+ - type: map_at_10
1339
+ value: 40.11
1340
+ - type: map_at_100
1341
+ value: 42.229
1342
+ - type: map_at_1000
1343
+ value: 42.378
1344
+ - type: map_at_3
1345
+ value: 34.512
1346
+ - type: map_at_5
1347
+ value: 38.037
1348
+ - type: mrr_at_1
1349
+ value: 47.839999999999996
1350
+ - type: mrr_at_10
1351
+ value: 57.053
1352
+ - type: mrr_at_100
1353
+ value: 57.772
1354
+ - type: mrr_at_1000
1355
+ value: 57.799
1356
+ - type: mrr_at_3
1357
+ value: 54.552
1358
+ - type: mrr_at_5
1359
+ value: 56.011
1360
+ - type: ndcg_at_1
1361
+ value: 47.839999999999996
1362
+ - type: ndcg_at_10
1363
+ value: 48.650999999999996
1364
+ - type: ndcg_at_100
1365
+ value: 55.681000000000004
1366
+ - type: ndcg_at_1000
1367
+ value: 57.979
1368
+ - type: ndcg_at_3
1369
+ value: 43.923
1370
+ - type: ndcg_at_5
1371
+ value: 46.037
1372
+ - type: precision_at_1
1373
+ value: 47.839999999999996
1374
+ - type: precision_at_10
1375
+ value: 13.395000000000001
1376
+ - type: precision_at_100
1377
+ value: 2.0660000000000003
1378
+ - type: precision_at_1000
1379
+ value: 0.248
1380
+ - type: precision_at_3
1381
+ value: 29.064
1382
+ - type: precision_at_5
1383
+ value: 22.006
1384
+ - type: recall_at_1
1385
+ value: 24.476
1386
+ - type: recall_at_10
1387
+ value: 56.216
1388
+ - type: recall_at_100
1389
+ value: 81.798
1390
+ - type: recall_at_1000
1391
+ value: 95.48299999999999
1392
+ - type: recall_at_3
1393
+ value: 39.357
1394
+ - type: recall_at_5
1395
+ value: 47.802
1396
+ task:
1397
+ type: Retrieval
1398
+ - dataset:
1399
+ config: default
1400
+ name: MTEB HotpotQA
1401
+ revision: ab518f4d6fcca38d87c25209f94beba119d02014
1402
+ split: test
1403
+ type: mteb/hotpotqa
1404
+ metrics:
1405
+ - type: map_at_1
1406
+ value: 42.728
1407
+ - type: map_at_10
1408
+ value: 57.737
1409
+ - type: map_at_100
1410
+ value: 58.531
1411
+ - type: map_at_1000
1412
+ value: 58.594
1413
+ - type: map_at_3
1414
+ value: 54.869
1415
+ - type: map_at_5
1416
+ value: 56.55
1417
+ - type: mrr_at_1
1418
+ value: 85.456
1419
+ - type: mrr_at_10
1420
+ value: 90.062
1421
+ - type: mrr_at_100
1422
+ value: 90.159
1423
+ - type: mrr_at_1000
1424
+ value: 90.16
1425
+ - type: mrr_at_3
1426
+ value: 89.37899999999999
1427
+ - type: mrr_at_5
1428
+ value: 89.81
1429
+ - type: ndcg_at_1
1430
+ value: 85.456
1431
+ - type: ndcg_at_10
1432
+ value: 67.755
1433
+ - type: ndcg_at_100
1434
+ value: 70.341
1435
+ - type: ndcg_at_1000
1436
+ value: 71.538
1437
+ - type: ndcg_at_3
1438
+ value: 63.735
1439
+ - type: ndcg_at_5
1440
+ value: 65.823
1441
+ - type: precision_at_1
1442
+ value: 85.456
1443
+ - type: precision_at_10
1444
+ value: 13.450000000000001
1445
+ - type: precision_at_100
1446
+ value: 1.545
1447
+ - type: precision_at_1000
1448
+ value: 0.16999999999999998
1449
+ - type: precision_at_3
1450
+ value: 38.861000000000004
1451
+ - type: precision_at_5
1452
+ value: 24.964
1453
+ - type: recall_at_1
1454
+ value: 42.728
1455
+ - type: recall_at_10
1456
+ value: 67.252
1457
+ - type: recall_at_100
1458
+ value: 77.265
1459
+ - type: recall_at_1000
1460
+ value: 85.246
1461
+ - type: recall_at_3
1462
+ value: 58.292
1463
+ - type: recall_at_5
1464
+ value: 62.41100000000001
1465
+ task:
1466
+ type: Retrieval
1467
+ - dataset:
1468
+ config: default
1469
+ name: MTEB ImdbClassification
1470
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
1471
+ split: test
1472
+ type: mteb/imdb
1473
+ metrics:
1474
+ - type: accuracy
1475
+ value: 87.4836
1476
+ - type: ap
1477
+ value: 82.29552224030336
1478
+ - type: f1
1479
+ value: 87.42791432227448
1480
+ task:
1481
+ type: Classification
1482
+ - dataset:
1483
+ config: default
1484
+ name: MTEB MSMARCO
1485
+ revision: c5a29a104738b98a9e76336939199e264163d4a0
1486
+ split: dev
1487
+ type: mteb/msmarco
1488
+ metrics:
1489
+ - type: map_at_1
1490
+ value: 23.015
1491
+ - type: map_at_10
1492
+ value: 35.621
1493
+ - type: map_at_100
1494
+ value: 36.809
1495
+ - type: map_at_1000
1496
+ value: 36.853
1497
+ - type: map_at_3
1498
+ value: 31.832
1499
+ - type: map_at_5
1500
+ value: 34.006
1501
+ - type: mrr_at_1
1502
+ value: 23.738999999999997
1503
+ - type: mrr_at_10
1504
+ value: 36.309999999999995
1505
+ - type: mrr_at_100
1506
+ value: 37.422
1507
+ - type: mrr_at_1000
1508
+ value: 37.461
1509
+ - type: mrr_at_3
1510
+ value: 32.592999999999996
1511
+ - type: mrr_at_5
1512
+ value: 34.736
1513
+ - type: ndcg_at_1
1514
+ value: 23.724999999999998
1515
+ - type: ndcg_at_10
1516
+ value: 42.617
1517
+ - type: ndcg_at_100
1518
+ value: 48.217999999999996
1519
+ - type: ndcg_at_1000
1520
+ value: 49.309
1521
+ - type: ndcg_at_3
1522
+ value: 34.905
1523
+ - type: ndcg_at_5
1524
+ value: 38.769
1525
+ - type: precision_at_1
1526
+ value: 23.724999999999998
1527
+ - type: precision_at_10
1528
+ value: 6.689
1529
+ - type: precision_at_100
1530
+ value: 0.9480000000000001
1531
+ - type: precision_at_1000
1532
+ value: 0.104
1533
+ - type: precision_at_3
1534
+ value: 14.89
1535
+ - type: precision_at_5
1536
+ value: 10.897
1537
+ - type: recall_at_1
1538
+ value: 23.015
1539
+ - type: recall_at_10
1540
+ value: 64.041
1541
+ - type: recall_at_100
1542
+ value: 89.724
1543
+ - type: recall_at_1000
1544
+ value: 98.00999999999999
1545
+ - type: recall_at_3
1546
+ value: 43.064
1547
+ - type: recall_at_5
1548
+ value: 52.31099999999999
1549
+ task:
1550
+ type: Retrieval
1551
+ - dataset:
1552
+ config: en
1553
+ name: MTEB MTOPDomainClassification (en)
1554
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
1555
+ split: test
1556
+ type: mteb/mtop_domain
1557
+ metrics:
1558
+ - type: accuracy
1559
+ value: 96.49794801641588
1560
+ - type: f1
1561
+ value: 96.28931114498003
1562
+ task:
1563
+ type: Classification
1564
+ - dataset:
1565
+ config: en
1566
+ name: MTEB MTOPIntentClassification (en)
1567
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
1568
+ split: test
1569
+ type: mteb/mtop_intent
1570
+ metrics:
1571
+ - type: accuracy
1572
+ value: 82.81121751025992
1573
+ - type: f1
1574
+ value: 63.18740125901853
1575
+ task:
1576
+ type: Classification
1577
+ - dataset:
1578
+ config: en
1579
+ name: MTEB MassiveIntentClassification (en)
1580
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1581
+ split: test
1582
+ type: mteb/amazon_massive_intent
1583
+ metrics:
1584
+ - type: accuracy
1585
+ value: 77.66644250168123
1586
+ - type: f1
1587
+ value: 74.93211186867839
1588
+ task:
1589
+ type: Classification
1590
+ - dataset:
1591
+ config: en
1592
+ name: MTEB MassiveScenarioClassification (en)
1593
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
1594
+ split: test
1595
+ type: mteb/amazon_massive_scenario
1596
+ metrics:
1597
+ - type: accuracy
1598
+ value: 81.77202420981843
1599
+ - type: f1
1600
+ value: 81.63681969283554
1601
+ task:
1602
+ type: Classification
1603
+ - dataset:
1604
+ config: default
1605
+ name: MTEB MedrxivClusteringP2P
1606
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
1607
+ split: test
1608
+ type: mteb/medrxiv-clustering-p2p
1609
+ metrics:
1610
+ - type: v_measure
1611
+ value: 34.596687684870645
1612
+ task:
1613
+ type: Clustering
1614
+ - dataset:
1615
+ config: default
1616
+ name: MTEB MedrxivClusteringS2S
1617
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
1618
+ split: test
1619
+ type: mteb/medrxiv-clustering-s2s
1620
+ metrics:
1621
+ - type: v_measure
1622
+ value: 32.26965660101405
1623
+ task:
1624
+ type: Clustering
1625
+ - dataset:
1626
+ config: default
1627
+ name: MTEB MindSmallReranking
1628
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
1629
+ split: test
1630
+ type: mteb/mind_small
1631
+ metrics:
1632
+ - type: map
1633
+ value: 31.33619694846802
1634
+ - type: mrr
1635
+ value: 32.53719657720334
1636
+ task:
1637
+ type: Reranking
1638
+ - dataset:
1639
+ config: default
1640
+ name: MTEB NFCorpus
1641
+ revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
1642
+ split: test
1643
+ type: mteb/nfcorpus
1644
+ metrics:
1645
+ - type: map_at_1
1646
+ value: 6.0729999999999995
1647
+ - type: map_at_10
1648
+ value: 13.245999999999999
1649
+ - type: map_at_100
1650
+ value: 16.747999999999998
1651
+ - type: map_at_1000
1652
+ value: 18.163
1653
+ - type: map_at_3
1654
+ value: 10.064
1655
+ - type: map_at_5
1656
+ value: 11.513
1657
+ - type: mrr_at_1
1658
+ value: 49.536
1659
+ - type: mrr_at_10
1660
+ value: 58.092
1661
+ - type: mrr_at_100
1662
+ value: 58.752
1663
+ - type: mrr_at_1000
1664
+ value: 58.78
1665
+ - type: mrr_at_3
1666
+ value: 56.398
1667
+ - type: mrr_at_5
1668
+ value: 57.389
1669
+ - type: ndcg_at_1
1670
+ value: 47.059
1671
+ - type: ndcg_at_10
1672
+ value: 35.881
1673
+ - type: ndcg_at_100
1674
+ value: 32.751999999999995
1675
+ - type: ndcg_at_1000
1676
+ value: 41.498000000000005
1677
+ - type: ndcg_at_3
1678
+ value: 42.518
1679
+ - type: ndcg_at_5
1680
+ value: 39.550999999999995
1681
+ - type: precision_at_1
1682
+ value: 49.536
1683
+ - type: precision_at_10
1684
+ value: 26.316
1685
+ - type: precision_at_100
1686
+ value: 8.084
1687
+ - type: precision_at_1000
1688
+ value: 2.081
1689
+ - type: precision_at_3
1690
+ value: 39.938
1691
+ - type: precision_at_5
1692
+ value: 34.056
1693
+ - type: recall_at_1
1694
+ value: 6.0729999999999995
1695
+ - type: recall_at_10
1696
+ value: 16.593
1697
+ - type: recall_at_100
1698
+ value: 32.883
1699
+ - type: recall_at_1000
1700
+ value: 64.654
1701
+ - type: recall_at_3
1702
+ value: 11.174000000000001
1703
+ - type: recall_at_5
1704
+ value: 13.528
1705
+ task:
1706
+ type: Retrieval
1707
+ - dataset:
1708
+ config: default
1709
+ name: MTEB NQ
1710
+ revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
1711
+ split: test
1712
+ type: mteb/nq
1713
+ metrics:
1714
+ - type: map_at_1
1715
+ value: 30.043
1716
+ - type: map_at_10
1717
+ value: 45.318999999999996
1718
+ - type: map_at_100
1719
+ value: 46.381
1720
+ - type: map_at_1000
1721
+ value: 46.412
1722
+ - type: map_at_3
1723
+ value: 40.941
1724
+ - type: map_at_5
1725
+ value: 43.662
1726
+ - type: mrr_at_1
1727
+ value: 33.98
1728
+ - type: mrr_at_10
1729
+ value: 47.870000000000005
1730
+ - type: mrr_at_100
1731
+ value: 48.681999999999995
1732
+ - type: mrr_at_1000
1733
+ value: 48.703
1734
+ - type: mrr_at_3
1735
+ value: 44.341
1736
+ - type: mrr_at_5
1737
+ value: 46.547
1738
+ - type: ndcg_at_1
1739
+ value: 33.98
1740
+ - type: ndcg_at_10
1741
+ value: 52.957
1742
+ - type: ndcg_at_100
1743
+ value: 57.434
1744
+ - type: ndcg_at_1000
1745
+ value: 58.103
1746
+ - type: ndcg_at_3
1747
+ value: 44.896
1748
+ - type: ndcg_at_5
1749
+ value: 49.353
1750
+ - type: precision_at_1
1751
+ value: 33.98
1752
+ - type: precision_at_10
1753
+ value: 8.786
1754
+ - type: precision_at_100
1755
+ value: 1.1280000000000001
1756
+ - type: precision_at_1000
1757
+ value: 0.11900000000000001
1758
+ - type: precision_at_3
1759
+ value: 20.577
1760
+ - type: precision_at_5
1761
+ value: 14.942
1762
+ - type: recall_at_1
1763
+ value: 30.043
1764
+ - type: recall_at_10
1765
+ value: 73.593
1766
+ - type: recall_at_100
1767
+ value: 93.026
1768
+ - type: recall_at_1000
1769
+ value: 97.943
1770
+ - type: recall_at_3
1771
+ value: 52.955
1772
+ - type: recall_at_5
1773
+ value: 63.132
1774
+ task:
1775
+ type: Retrieval
1776
+ - dataset:
1777
+ config: default
1778
+ name: MTEB QuoraRetrieval
1779
+ revision: None
1780
+ split: test
1781
+ type: mteb/quora
1782
+ metrics:
1783
+ - type: map_at_1
1784
+ value: 70.808
1785
+ - type: map_at_10
1786
+ value: 84.675
1787
+ - type: map_at_100
1788
+ value: 85.322
1789
+ - type: map_at_1000
1790
+ value: 85.33800000000001
1791
+ - type: map_at_3
1792
+ value: 81.68900000000001
1793
+ - type: map_at_5
1794
+ value: 83.543
1795
+ - type: mrr_at_1
1796
+ value: 81.5
1797
+ - type: mrr_at_10
1798
+ value: 87.59700000000001
1799
+ - type: mrr_at_100
1800
+ value: 87.705
1801
+ - type: mrr_at_1000
1802
+ value: 87.70599999999999
1803
+ - type: mrr_at_3
1804
+ value: 86.607
1805
+ - type: mrr_at_5
1806
+ value: 87.289
1807
+ - type: ndcg_at_1
1808
+ value: 81.51
1809
+ - type: ndcg_at_10
1810
+ value: 88.41799999999999
1811
+ - type: ndcg_at_100
1812
+ value: 89.644
1813
+ - type: ndcg_at_1000
1814
+ value: 89.725
1815
+ - type: ndcg_at_3
1816
+ value: 85.49900000000001
1817
+ - type: ndcg_at_5
1818
+ value: 87.078
1819
+ - type: precision_at_1
1820
+ value: 81.51
1821
+ - type: precision_at_10
1822
+ value: 13.438
1823
+ - type: precision_at_100
1824
+ value: 1.532
1825
+ - type: precision_at_1000
1826
+ value: 0.157
1827
+ - type: precision_at_3
1828
+ value: 37.363
1829
+ - type: precision_at_5
1830
+ value: 24.57
1831
+ - type: recall_at_1
1832
+ value: 70.808
1833
+ - type: recall_at_10
1834
+ value: 95.575
1835
+ - type: recall_at_100
1836
+ value: 99.667
1837
+ - type: recall_at_1000
1838
+ value: 99.98899999999999
1839
+ - type: recall_at_3
1840
+ value: 87.223
1841
+ - type: recall_at_5
1842
+ value: 91.682
1843
+ task:
1844
+ type: Retrieval
1845
+ - dataset:
1846
+ config: default
1847
+ name: MTEB RedditClustering
1848
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
1849
+ split: test
1850
+ type: mteb/reddit-clustering
1851
+ metrics:
1852
+ - type: v_measure
1853
+ value: 58.614831329137715
1854
+ task:
1855
+ type: Clustering
1856
+ - dataset:
1857
+ config: default
1858
+ name: MTEB RedditClusteringP2P
1859
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
1860
+ split: test
1861
+ type: mteb/reddit-clustering-p2p
1862
+ metrics:
1863
+ - type: v_measure
1864
+ value: 66.86580408560826
1865
+ task:
1866
+ type: Clustering
1867
+ - dataset:
1868
+ config: default
1869
+ name: MTEB SCIDOCS
1870
+ revision: None
1871
+ split: test
1872
+ type: mteb/scidocs
1873
+ metrics:
1874
+ - type: map_at_1
1875
+ value: 5.093
1876
+ - type: map_at_10
1877
+ value: 13.014000000000001
1878
+ - type: map_at_100
1879
+ value: 15.412999999999998
1880
+ - type: map_at_1000
1881
+ value: 15.756999999999998
1882
+ - type: map_at_3
1883
+ value: 9.216000000000001
1884
+ - type: map_at_5
1885
+ value: 11.036999999999999
1886
+ - type: mrr_at_1
1887
+ value: 25.1
1888
+ - type: mrr_at_10
1889
+ value: 37.133
1890
+ - type: mrr_at_100
1891
+ value: 38.165
1892
+ - type: mrr_at_1000
1893
+ value: 38.198
1894
+ - type: mrr_at_3
1895
+ value: 33.217
1896
+ - type: mrr_at_5
1897
+ value: 35.732
1898
+ - type: ndcg_at_1
1899
+ value: 25.1
1900
+ - type: ndcg_at_10
1901
+ value: 21.918000000000003
1902
+ - type: ndcg_at_100
1903
+ value: 30.983
1904
+ - type: ndcg_at_1000
1905
+ value: 36.629
1906
+ - type: ndcg_at_3
1907
+ value: 20.544999999999998
1908
+ - type: ndcg_at_5
1909
+ value: 18.192
1910
+ - type: precision_at_1
1911
+ value: 25.1
1912
+ - type: precision_at_10
1913
+ value: 11.44
1914
+ - type: precision_at_100
1915
+ value: 2.459
1916
+ - type: precision_at_1000
1917
+ value: 0.381
1918
+ - type: precision_at_3
1919
+ value: 19.267
1920
+ - type: precision_at_5
1921
+ value: 16.16
1922
+ - type: recall_at_1
1923
+ value: 5.093
1924
+ - type: recall_at_10
1925
+ value: 23.215
1926
+ - type: recall_at_100
1927
+ value: 49.902
1928
+ - type: recall_at_1000
1929
+ value: 77.403
1930
+ - type: recall_at_3
1931
+ value: 11.733
1932
+ - type: recall_at_5
1933
+ value: 16.372999999999998
1934
+ task:
1935
+ type: Retrieval
1936
+ - dataset:
1937
+ config: default
1938
+ name: MTEB SICK-R
1939
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
1940
+ split: test
1941
+ type: mteb/sickr-sts
1942
+ metrics:
1943
+ - type: cos_sim_pearson
1944
+ value: 82.9365442977452
1945
+ - type: cos_sim_spearman
1946
+ value: 79.36960687383745
1947
+ - type: euclidean_pearson
1948
+ value: 79.6045204840714
1949
+ - type: euclidean_spearman
1950
+ value: 79.26382712751337
1951
+ - type: manhattan_pearson
1952
+ value: 79.4805084789529
1953
+ - type: manhattan_spearman
1954
+ value: 79.21847863209523
1955
+ task:
1956
+ type: STS
1957
+ - dataset:
1958
+ config: default
1959
+ name: MTEB STS12
1960
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
1961
+ split: test
1962
+ type: mteb/sts12-sts
1963
+ metrics:
1964
+ - type: cos_sim_pearson
1965
+ value: 83.27906192961453
1966
+ - type: cos_sim_spearman
1967
+ value: 74.38364712099211
1968
+ - type: euclidean_pearson
1969
+ value: 78.54358927241223
1970
+ - type: euclidean_spearman
1971
+ value: 74.22185560806376
1972
+ - type: manhattan_pearson
1973
+ value: 78.50904327377751
1974
+ - type: manhattan_spearman
1975
+ value: 74.2627500781748
1976
+ task:
1977
+ type: STS
1978
+ - dataset:
1979
+ config: default
1980
+ name: MTEB STS13
1981
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
1982
+ split: test
1983
+ type: mteb/sts13-sts
1984
+ metrics:
1985
+ - type: cos_sim_pearson
1986
+ value: 84.66863742649639
1987
+ - type: cos_sim_spearman
1988
+ value: 84.70630905216271
1989
+ - type: euclidean_pearson
1990
+ value: 84.64498334705334
1991
+ - type: euclidean_spearman
1992
+ value: 84.87204770690148
1993
+ - type: manhattan_pearson
1994
+ value: 84.65774227976077
1995
+ - type: manhattan_spearman
1996
+ value: 84.91251851797985
1997
+ task:
1998
+ type: STS
1999
+ - dataset:
2000
+ config: default
2001
+ name: MTEB STS14
2002
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
2003
+ split: test
2004
+ type: mteb/sts14-sts
2005
+ metrics:
2006
+ - type: cos_sim_pearson
2007
+ value: 83.1577763924467
2008
+ - type: cos_sim_spearman
2009
+ value: 80.10314039230198
2010
+ - type: euclidean_pearson
2011
+ value: 81.51346991046043
2012
+ - type: euclidean_spearman
2013
+ value: 80.08678485109435
2014
+ - type: manhattan_pearson
2015
+ value: 81.57058914661894
2016
+ - type: manhattan_spearman
2017
+ value: 80.1516230725106
2018
+ task:
2019
+ type: STS
2020
+ - dataset:
2021
+ config: default
2022
+ name: MTEB STS15
2023
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
2024
+ split: test
2025
+ type: mteb/sts15-sts
2026
+ metrics:
2027
+ - type: cos_sim_pearson
2028
+ value: 86.40310839662533
2029
+ - type: cos_sim_spearman
2030
+ value: 87.16293477217867
2031
+ - type: euclidean_pearson
2032
+ value: 86.50688711184775
2033
+ - type: euclidean_spearman
2034
+ value: 87.08651444923031
2035
+ - type: manhattan_pearson
2036
+ value: 86.54674677557857
2037
+ - type: manhattan_spearman
2038
+ value: 87.15079017870971
2039
+ task:
2040
+ type: STS
2041
+ - dataset:
2042
+ config: default
2043
+ name: MTEB STS16
2044
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
2045
+ split: test
2046
+ type: mteb/sts16-sts
2047
+ metrics:
2048
+ - type: cos_sim_pearson
2049
+ value: 84.32886275207817
2050
+ - type: cos_sim_spearman
2051
+ value: 85.0190460590732
2052
+ - type: euclidean_pearson
2053
+ value: 84.42553652784679
2054
+ - type: euclidean_spearman
2055
+ value: 85.20027364279328
2056
+ - type: manhattan_pearson
2057
+ value: 84.42926246281078
2058
+ - type: manhattan_spearman
2059
+ value: 85.20187419804306
2060
+ task:
2061
+ type: STS
2062
+ - dataset:
2063
+ config: en-en
2064
+ name: MTEB STS17 (en-en)
2065
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2066
+ split: test
2067
+ type: mteb/sts17-crosslingual-sts
2068
+ metrics:
2069
+ - type: cos_sim_pearson
2070
+ value: 90.76732216967812
2071
+ - type: cos_sim_spearman
2072
+ value: 90.63701653633909
2073
+ - type: euclidean_pearson
2074
+ value: 90.26678186114682
2075
+ - type: euclidean_spearman
2076
+ value: 90.67288073455427
2077
+ - type: manhattan_pearson
2078
+ value: 90.20772020584582
2079
+ - type: manhattan_spearman
2080
+ value: 90.60764863983702
2081
+ task:
2082
+ type: STS
2083
+ - dataset:
2084
+ config: en
2085
+ name: MTEB STS22 (en)
2086
+ revision: eea2b4fe26a775864c896887d910b76a8098ad3f
2087
+ split: test
2088
+ type: mteb/sts22-crosslingual-sts
2089
+ metrics:
2090
+ - type: cos_sim_pearson
2091
+ value: 69.09280387698125
2092
+ - type: cos_sim_spearman
2093
+ value: 68.62743151172162
2094
+ - type: euclidean_pearson
2095
+ value: 69.89386398104689
2096
+ - type: euclidean_spearman
2097
+ value: 68.71191066733556
2098
+ - type: manhattan_pearson
2099
+ value: 69.92516500604872
2100
+ - type: manhattan_spearman
2101
+ value: 68.80452846992576
2102
+ task:
2103
+ type: STS
2104
+ - dataset:
2105
+ config: default
2106
+ name: MTEB STSBenchmark
2107
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
2108
+ split: test
2109
+ type: mteb/stsbenchmark-sts
2110
+ metrics:
2111
+ - type: cos_sim_pearson
2112
+ value: 86.13178592019887
2113
+ - type: cos_sim_spearman
2114
+ value: 86.03947178806887
2115
+ - type: euclidean_pearson
2116
+ value: 85.87029414285313
2117
+ - type: euclidean_spearman
2118
+ value: 86.04960843306998
2119
+ - type: manhattan_pearson
2120
+ value: 85.92946858580146
2121
+ - type: manhattan_spearman
2122
+ value: 86.12575341860442
2123
+ task:
2124
+ type: STS
2125
+ - dataset:
2126
+ config: default
2127
+ name: MTEB SciDocsRR
2128
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
2129
+ split: test
2130
+ type: mteb/scidocs-reranking
2131
+ metrics:
2132
+ - type: map
2133
+ value: 85.16657063002837
2134
+ - type: mrr
2135
+ value: 95.73671063867141
2136
+ task:
2137
+ type: Reranking
2138
+ - dataset:
2139
+ config: default
2140
+ name: MTEB SciFact
2141
+ revision: 0228b52cf27578f30900b9e5271d331663a030d7
2142
+ split: test
2143
+ type: mteb/scifact
2144
+ metrics:
2145
+ - type: map_at_1
2146
+ value: 63.510999999999996
2147
+ - type: map_at_10
2148
+ value: 72.76899999999999
2149
+ - type: map_at_100
2150
+ value: 73.303
2151
+ - type: map_at_1000
2152
+ value: 73.32499999999999
2153
+ - type: map_at_3
2154
+ value: 70.514
2155
+ - type: map_at_5
2156
+ value: 71.929
2157
+ - type: mrr_at_1
2158
+ value: 66.333
2159
+ - type: mrr_at_10
2160
+ value: 73.75
2161
+ - type: mrr_at_100
2162
+ value: 74.119
2163
+ - type: mrr_at_1000
2164
+ value: 74.138
2165
+ - type: mrr_at_3
2166
+ value: 72.222
2167
+ - type: mrr_at_5
2168
+ value: 73.122
2169
+ - type: ndcg_at_1
2170
+ value: 66.333
2171
+ - type: ndcg_at_10
2172
+ value: 76.774
2173
+ - type: ndcg_at_100
2174
+ value: 78.78500000000001
2175
+ - type: ndcg_at_1000
2176
+ value: 79.254
2177
+ - type: ndcg_at_3
2178
+ value: 73.088
2179
+ - type: ndcg_at_5
2180
+ value: 75.002
2181
+ - type: precision_at_1
2182
+ value: 66.333
2183
+ - type: precision_at_10
2184
+ value: 9.833
2185
+ - type: precision_at_100
2186
+ value: 1.093
2187
+ - type: precision_at_1000
2188
+ value: 0.11299999999999999
2189
+ - type: precision_at_3
2190
+ value: 28.222
2191
+ - type: precision_at_5
2192
+ value: 18.333
2193
+ - type: recall_at_1
2194
+ value: 63.510999999999996
2195
+ - type: recall_at_10
2196
+ value: 87.98899999999999
2197
+ - type: recall_at_100
2198
+ value: 96.5
2199
+ - type: recall_at_1000
2200
+ value: 100.0
2201
+ - type: recall_at_3
2202
+ value: 77.86699999999999
2203
+ - type: recall_at_5
2204
+ value: 82.73899999999999
2205
+ task:
2206
+ type: Retrieval
2207
+ - dataset:
2208
+ config: default
2209
+ name: MTEB SprintDuplicateQuestions
2210
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
2211
+ split: test
2212
+ type: mteb/sprintduplicatequestions-pairclassification
2213
+ metrics:
2214
+ - type: cos_sim_accuracy
2215
+ value: 99.78514851485149
2216
+ - type: cos_sim_ap
2217
+ value: 94.94214383862038
2218
+ - type: cos_sim_f1
2219
+ value: 89.02255639097744
2220
+ - type: cos_sim_precision
2221
+ value: 89.2462311557789
2222
+ - type: cos_sim_recall
2223
+ value: 88.8
2224
+ - type: dot_accuracy
2225
+ value: 99.78217821782178
2226
+ - type: dot_ap
2227
+ value: 94.69965247836805
2228
+ - type: dot_f1
2229
+ value: 88.78695208970439
2230
+ - type: dot_precision
2231
+ value: 90.54054054054053
2232
+ - type: dot_recall
2233
+ value: 87.1
2234
+ - type: euclidean_accuracy
2235
+ value: 99.78118811881188
2236
+ - type: euclidean_ap
2237
+ value: 94.9865187695411
2238
+ - type: euclidean_f1
2239
+ value: 88.99950223992036
2240
+ - type: euclidean_precision
2241
+ value: 88.60257680872151
2242
+ - type: euclidean_recall
2243
+ value: 89.4
2244
+ - type: manhattan_accuracy
2245
+ value: 99.78811881188119
2246
+ - type: manhattan_ap
2247
+ value: 95.0021236766459
2248
+ - type: manhattan_f1
2249
+ value: 89.12071535022356
2250
+ - type: manhattan_precision
2251
+ value: 88.54886475814413
2252
+ - type: manhattan_recall
2253
+ value: 89.7
2254
+ - type: max_accuracy
2255
+ value: 99.78811881188119
2256
+ - type: max_ap
2257
+ value: 95.0021236766459
2258
+ - type: max_f1
2259
+ value: 89.12071535022356
2260
+ task:
2261
+ type: PairClassification
2262
+ - dataset:
2263
+ config: default
2264
+ name: MTEB StackExchangeClustering
2265
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2266
+ split: test
2267
+ type: mteb/stackexchange-clustering
2268
+ metrics:
2269
+ - type: v_measure
2270
+ value: 68.93190546593995
2271
+ task:
2272
+ type: Clustering
2273
+ - dataset:
2274
+ config: default
2275
+ name: MTEB StackExchangeClusteringP2P
2276
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2277
+ split: test
2278
+ type: mteb/stackexchange-clustering-p2p
2279
+ metrics:
2280
+ - type: v_measure
2281
+ value: 37.602808534760655
2282
+ task:
2283
+ type: Clustering
2284
+ - dataset:
2285
+ config: default
2286
+ name: MTEB StackOverflowDupQuestions
2287
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2288
+ split: test
2289
+ type: mteb/stackoverflowdupquestions-reranking
2290
+ metrics:
2291
+ - type: map
2292
+ value: 52.29214480978073
2293
+ - type: mrr
2294
+ value: 53.123169722434426
2295
+ task:
2296
+ type: Reranking
2297
+ - dataset:
2298
+ config: default
2299
+ name: MTEB SummEval
2300
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2301
+ split: test
2302
+ type: mteb/summeval
2303
+ metrics:
2304
+ - type: cos_sim_pearson
2305
+ value: 30.967800769650022
2306
+ - type: cos_sim_spearman
2307
+ value: 31.168490040206926
2308
+ - type: dot_pearson
2309
+ value: 30.888603021128553
2310
+ - type: dot_spearman
2311
+ value: 31.028241262520385
2312
+ task:
2313
+ type: Summarization
2314
+ - dataset:
2315
+ config: default
2316
+ name: MTEB TRECCOVID
2317
+ revision: None
2318
+ split: test
2319
+ type: mteb/trec-covid
2320
+ metrics:
2321
+ - type: map_at_1
2322
+ value: 0.22300000000000003
2323
+ - type: map_at_10
2324
+ value: 1.781
2325
+ - type: map_at_100
2326
+ value: 9.905999999999999
2327
+ - type: map_at_1000
2328
+ value: 23.455000000000002
2329
+ - type: map_at_3
2330
+ value: 0.569
2331
+ - type: map_at_5
2332
+ value: 0.918
2333
+ - type: mrr_at_1
2334
+ value: 84.0
2335
+ - type: mrr_at_10
2336
+ value: 91.067
2337
+ - type: mrr_at_100
2338
+ value: 91.067
2339
+ - type: mrr_at_1000
2340
+ value: 91.067
2341
+ - type: mrr_at_3
2342
+ value: 90.667
2343
+ - type: mrr_at_5
2344
+ value: 91.067
2345
+ - type: ndcg_at_1
2346
+ value: 78.0
2347
+ - type: ndcg_at_10
2348
+ value: 73.13499999999999
2349
+ - type: ndcg_at_100
2350
+ value: 55.32
2351
+ - type: ndcg_at_1000
2352
+ value: 49.532
2353
+ - type: ndcg_at_3
2354
+ value: 73.715
2355
+ - type: ndcg_at_5
2356
+ value: 72.74199999999999
2357
+ - type: precision_at_1
2358
+ value: 84.0
2359
+ - type: precision_at_10
2360
+ value: 78.8
2361
+ - type: precision_at_100
2362
+ value: 56.32
2363
+ - type: precision_at_1000
2364
+ value: 21.504
2365
+ - type: precision_at_3
2366
+ value: 77.333
2367
+ - type: precision_at_5
2368
+ value: 78.0
2369
+ - type: recall_at_1
2370
+ value: 0.22300000000000003
2371
+ - type: recall_at_10
2372
+ value: 2.049
2373
+ - type: recall_at_100
2374
+ value: 13.553
2375
+ - type: recall_at_1000
2376
+ value: 46.367999999999995
2377
+ - type: recall_at_3
2378
+ value: 0.604
2379
+ - type: recall_at_5
2380
+ value: 1.015
2381
+ task:
2382
+ type: Retrieval
2383
+ - dataset:
2384
+ config: default
2385
+ name: MTEB Touche2020
2386
+ revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
2387
+ split: test
2388
+ type: mteb/touche2020
2389
+ metrics:
2390
+ - type: map_at_1
2391
+ value: 3.0380000000000003
2392
+ - type: map_at_10
2393
+ value: 10.188
2394
+ - type: map_at_100
2395
+ value: 16.395
2396
+ - type: map_at_1000
2397
+ value: 18.024
2398
+ - type: map_at_3
2399
+ value: 6.236
2400
+ - type: map_at_5
2401
+ value: 7.276000000000001
2402
+ - type: mrr_at_1
2403
+ value: 34.694
2404
+ - type: mrr_at_10
2405
+ value: 46.292
2406
+ - type: mrr_at_100
2407
+ value: 47.446
2408
+ - type: mrr_at_1000
2409
+ value: 47.446
2410
+ - type: mrr_at_3
2411
+ value: 41.156
2412
+ - type: mrr_at_5
2413
+ value: 44.32
2414
+ - type: ndcg_at_1
2415
+ value: 32.653
2416
+ - type: ndcg_at_10
2417
+ value: 25.219
2418
+ - type: ndcg_at_100
2419
+ value: 37.802
2420
+ - type: ndcg_at_1000
2421
+ value: 49.274
2422
+ - type: ndcg_at_3
2423
+ value: 28.605999999999998
2424
+ - type: ndcg_at_5
2425
+ value: 26.21
2426
+ - type: precision_at_1
2427
+ value: 34.694
2428
+ - type: precision_at_10
2429
+ value: 21.837
2430
+ - type: precision_at_100
2431
+ value: 7.776
2432
+ - type: precision_at_1000
2433
+ value: 1.522
2434
+ - type: precision_at_3
2435
+ value: 28.571
2436
+ - type: precision_at_5
2437
+ value: 25.306
2438
+ - type: recall_at_1
2439
+ value: 3.0380000000000003
2440
+ - type: recall_at_10
2441
+ value: 16.298000000000002
2442
+ - type: recall_at_100
2443
+ value: 48.712
2444
+ - type: recall_at_1000
2445
+ value: 83.16799999999999
2446
+ - type: recall_at_3
2447
+ value: 7.265000000000001
2448
+ - type: recall_at_5
2449
+ value: 9.551
2450
+ task:
2451
+ type: Retrieval
2452
+ - dataset:
2453
+ config: default
2454
+ name: MTEB ToxicConversationsClassification
2455
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2456
+ split: test
2457
+ type: mteb/toxic_conversations_50k
2458
+ metrics:
2459
+ - type: accuracy
2460
+ value: 83.978
2461
+ - type: ap
2462
+ value: 24.751887949330015
2463
+ - type: f1
2464
+ value: 66.8685134049279
2465
+ task:
2466
+ type: Classification
2467
+ - dataset:
2468
+ config: default
2469
+ name: MTEB TweetSentimentExtractionClassification
2470
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2471
+ split: test
2472
+ type: mteb/tweet_sentiment_extraction
2473
+ metrics:
2474
+ - type: accuracy
2475
+ value: 61.573288058856825
2476
+ - type: f1
2477
+ value: 61.973261751726604
2478
+ task:
2479
+ type: Classification
2480
+ - dataset:
2481
+ config: default
2482
+ name: MTEB TwentyNewsgroupsClustering
2483
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2484
+ split: test
2485
+ type: mteb/twentynewsgroups-clustering
2486
+ metrics:
2487
+ - type: v_measure
2488
+ value: 48.75483298792469
2489
+ task:
2490
+ type: Clustering
2491
+ - dataset:
2492
+ config: default
2493
+ name: MTEB TwitterSemEval2015
2494
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2495
+ split: test
2496
+ type: mteb/twittersemeval2015-pairclassification
2497
+ metrics:
2498
+ - type: cos_sim_accuracy
2499
+ value: 86.36824223639506
2500
+ - type: cos_sim_ap
2501
+ value: 75.53126388573047
2502
+ - type: cos_sim_f1
2503
+ value: 67.9912831688245
2504
+ - type: cos_sim_precision
2505
+ value: 66.11817501869858
2506
+ - type: cos_sim_recall
2507
+ value: 69.9736147757256
2508
+ - type: dot_accuracy
2509
+ value: 86.39804494248078
2510
+ - type: dot_ap
2511
+ value: 75.27598891718046
2512
+ - type: dot_f1
2513
+ value: 67.91146284159763
2514
+ - type: dot_precision
2515
+ value: 63.90505003490807
2516
+ - type: dot_recall
2517
+ value: 72.45382585751979
2518
+ - type: euclidean_accuracy
2519
+ value: 86.36228169517793
2520
+ - type: euclidean_ap
2521
+ value: 75.51438087434647
2522
+ - type: euclidean_f1
2523
+ value: 68.02370523061066
2524
+ - type: euclidean_precision
2525
+ value: 66.46525679758308
2526
+ - type: euclidean_recall
2527
+ value: 69.65699208443272
2528
+ - type: manhattan_accuracy
2529
+ value: 86.46361089586935
2530
+ - type: manhattan_ap
2531
+ value: 75.50800785730111
2532
+ - type: manhattan_f1
2533
+ value: 67.9220437187253
2534
+ - type: manhattan_precision
2535
+ value: 67.79705573080967
2536
+ - type: manhattan_recall
2537
+ value: 68.04749340369392
2538
+ - type: max_accuracy
2539
+ value: 86.46361089586935
2540
+ - type: max_ap
2541
+ value: 75.53126388573047
2542
+ - type: max_f1
2543
+ value: 68.02370523061066
2544
+ task:
2545
+ type: PairClassification
2546
+ - dataset:
2547
+ config: default
2548
+ name: MTEB TwitterURLCorpus
2549
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2550
+ split: test
2551
+ type: mteb/twitterurlcorpus-pairclassification
2552
+ metrics:
2553
+ - type: cos_sim_accuracy
2554
+ value: 88.80350836341057
2555
+ - type: cos_sim_ap
2556
+ value: 85.51101933260743
2557
+ - type: cos_sim_f1
2558
+ value: 77.9152271629704
2559
+ - type: cos_sim_precision
2560
+ value: 75.27815662910056
2561
+ - type: cos_sim_recall
2562
+ value: 80.74376347397599
2563
+ - type: dot_accuracy
2564
+ value: 88.84425815966158
2565
+ - type: dot_ap
2566
+ value: 85.49726945962519
2567
+ - type: dot_f1
2568
+ value: 77.94445269567801
2569
+ - type: dot_precision
2570
+ value: 75.27251864601261
2571
+ - type: dot_recall
2572
+ value: 80.81305820757623
2573
+ - type: euclidean_accuracy
2574
+ value: 88.80350836341057
2575
+ - type: euclidean_ap
2576
+ value: 85.4882880790211
2577
+ - type: euclidean_f1
2578
+ value: 77.87063284615103
2579
+ - type: euclidean_precision
2580
+ value: 74.61022927689595
2581
+ - type: euclidean_recall
2582
+ value: 81.42901139513397
2583
+ - type: manhattan_accuracy
2584
+ value: 88.7161873714441
2585
+ - type: manhattan_ap
2586
+ value: 85.45753871906821
2587
+ - type: manhattan_f1
2588
+ value: 77.8686401480111
2589
+ - type: manhattan_precision
2590
+ value: 74.95903683123174
2591
+ - type: manhattan_recall
2592
+ value: 81.01324299353249
2593
+ - type: max_accuracy
2594
+ value: 88.84425815966158
2595
+ - type: max_ap
2596
+ value: 85.51101933260743
2597
+ - type: max_f1
2598
+ value: 77.94445269567801
2599
+ task:
2600
+ type: PairClassification
2601
+ tags:
2602
+ - sentence-transformers
2603
+ - gte
2604
+ - mteb
2605
+ - transformers.js
2606
+ - sentence-similarity
2607
+ - onnx
2608
+ - teradata
2609
+
2610
+ ---
2611
+ # A Teradata Vantage compatible Embeddings Model
2612
+
2613
+ # Alibaba-NLP/gte-base-en-v1.5
2614
+
2615
+ ## Overview of this Model
2616
+
2617
+ An embedding model that maps text (sentences/paragraphs) into a vector. The [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) model is well known for its effectiveness in capturing semantic meaning in text data. It is a state-of-the-art model trained on a large corpus and capable of generating high-quality text embeddings. A minimal local usage sketch follows the specification list below.
2618
+
2619
+ - 136.78M params (Sizes in ONNX format - "fp32": 530.23MB, "int8": 139.38MB, "uint8": 139.38MB)
2620
+ - 8192 maximum input tokens
2621
+ - 768 dimensions of output vector
2622
+ - License: apache-2.0. The released models can be used for commercial purposes free of charge.
2623
+ - Reference to Original Model: https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
2624
+
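+ For orientation, here is a minimal local usage sketch with the sentence-transformers library (it is not required for the Vantage deployment described below); it simply encodes two example sentences and checks the 768-dimensional output:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim
+
+ # Load the original model locally (trust_remote_code is needed for its custom architecture)
+ model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)
+
+ # Encode two example sentences and compare them
+ embeddings = model.encode(
+     ["How is the weather today?", "What is the current weather like today?"],
+     normalize_embeddings=True,
+ )
+ print(embeddings.shape)                       # (2, 768)
+ print(cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the two sentences
+ ```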
2625
+
2626
+ ## Quickstart: Deploying this Model in Teradata Vantage
2627
+
2628
+ We have pre-converted the model into the ONNX format compatible with BYOM 6.0, eliminating the need for manual conversion.
2629
+
2630
+ **Note:** Ensure you have access to a Teradata Database with BYOM 6.0 installed.
2631
+
2632
+ To get started, download the pre-converted model files directly from the Teradata Hugging Face repository.
2633
+
2634
+
2635
+ ```python
2636
+
2637
+ import teradataml as tdml
2638
+ import getpass
2639
+ from huggingface_hub import hf_hub_download
2640
+
2641
+ model_name = "gte-base-en-v1.5"
2642
+ number_dimensions_output = 768
2643
+ model_file_name = "model.onnx"
2644
+
2645
+ # Step 1: Download Model from Teradata HuggingFace Page
2646
+
2647
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"onnx/{model_file_name}", local_dir="./")
2648
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"tokenizer.json", local_dir="./")
2649
+
2650
+ # Step 2: Create Connection to Vantage
2651
+
2652
+ tdml.create_context(host = input('enter your hostname'),
2653
+ username=input('enter your username'),
2654
+ password = getpass.getpass("enter your password"))
2655
+
2656
+ # Step 3: Load Models into Vantage
2657
+ # a) Embedding model
2658
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2659
+ model_file = model_file_name,
2660
+ table_name = 'embeddings_models' )
2661
+ # b) Tokenizer
2662
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2663
+ model_file = 'tokenizer.json',
2664
+ table_name = 'embeddings_tokenizers')
2665
+
2666
+ # Step 4: Test ONNXEmbeddings Function
2667
+ # Note: ONNXEmbeddings expects the text (payload) column to be named 'txt'.
2668
+ # If your column has a different name, rename it in a subquery/CTE, for example:
2669
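+ # (hypothetical source table with columns message_id and message_text)
+ # input_table = "(select message_id as id, message_text as txt from my_db.my_messages)"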
+ input_table = "emails.emails"
2670
+ embeddings_query = f"""
2671
+ SELECT
2672
+ *
2673
+ from mldb.ONNXEmbeddings(
2674
+ on {input_table} as InputTable
2675
+ on (select * from embeddings_models where model_id = '{model_name}') as ModelTable DIMENSION
2676
+ on (select model as tokenizer from embeddings_tokenizers where model_id = '{model_name}') as TokenizerTable DIMENSION
2677
+ using
2678
+ Accumulate('id', 'txt')
2679
+ ModelOutputTensor('sentence_embedding')
2680
+ EnableMemoryCheck('false')
2681
+ OutputFormat('FLOAT32({number_dimensions_output})')
2682
+ OverwriteCachedModel('true')
2683
+ ) a
2684
+ """
2685
+ DF_embeddings = tdml.DataFrame.from_query(embeddings_query)
2686
+ DF_embeddings
2687
+ ```
2688
+
2689
+
2690
+
2691
+ ## What Can I Do with the Embeddings?
2692
+
2693
+ Teradata Vantage includes pre-built in-database functions to process embeddings further. Explore the following examples; a minimal TD_VectorDistance sketch follows the list:
2694
+
2695
+ - **Semantic Clustering with TD_KMeans:** [Semantic Clustering Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Clustering_Python.ipynb)
2696
+ - **Semantic Distance with TD_VectorDistance:** [Semantic Similarity Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Similarity_Python.ipynb)
2697
+ - **RAG-Based Application with TD_VectorDistance:** [RAG and Bedrock Query PDF Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/RAG_and_Bedrock_QueryPDF.ipynb)
2698
+
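+ As a minimal illustration of the TD_VectorDistance pattern (following [test_teradata.py](./test_teradata.py) in this repository), the sketch below scores one embedded row against the rest of the table. It assumes the Quickstart embeddings were persisted to a table named `emails_embeddings_store`; the table name and target id are illustrative:
+
+ ```python
+ # Cosine similarity between the row with id = 3 and all other rows,
+ # assuming 768 embedding columns named emb_0 ... emb_767.
+ similarity_query = """
+ SELECT dt.target_id, dt.reference_id, (1.0 - dt.distance) AS similarity
+ FROM TD_VECTORDISTANCE(
+     ON (SELECT * FROM emails_embeddings_store WHERE id = 3)  AS TargetTable
+     ON (SELECT * FROM emails_embeddings_store WHERE id <> 3) AS ReferenceTable DIMENSION
+     USING
+         TargetIDColumn('id')
+         TargetFeatureColumns('[emb_0:emb_767]')
+         RefIDColumn('id')
+         RefFeatureColumns('[emb_0:emb_767]')
+         DistanceMeasure('cosine')
+         TopK(3)
+ ) AS dt
+ """
+ tdml.DataFrame.from_query(similarity_query)
+ ```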
2699
+
2700
+ ## Deep Dive into Model Conversion to ONNX
2701
+
2702
+ **The steps below outline how we converted the open-source Hugging Face model into an ONNX file compatible with the in-database ONNXEmbeddings function.**
2703
+
2704
+ You do not need to perform these steps—they are provided solely for documentation and transparency. However, they may be helpful if you wish to convert another model to the required format.
2705
+
2706
+
2707
+ ### Part 1. Importing and Converting Model using optimum
2708
+
2709
+ We start by importing the pre-trained [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) model from Hugging Face.
2710
+
2711
+ To enhance performance and ensure compatibility with various execution environments, we'll use the [Optimum](https://github.com/huggingface/optimum) utility to convert the model into the ONNX (Open Neural Network Exchange) format.
2712
+
2713
+ After conversion to ONNX, we fix the opset version in the ONNX file to ensure compatibility with the ONNX runtime used in Teradata Vantage.
2714
+
2715
+ We generate ONNX files for multiple precisions: fp32, int8, and uint8.
2716
+
2717
+ You can find the detailed conversion steps in the file [convert.py](./convert.py); a condensed sketch of the flow is shown below.
2718
+
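+ ```python
+ # Condensed sketch of the conversion flow in convert.py:
+ # export the model to ONNX with a fixed opset (16, as in conversion_config.json),
+ # then derive int8/uint8 quantized variants of the fp32 file.
+ from optimum.exporters.onnx import main_export
+ from onnxruntime.quantization import quantize_dynamic, QuantType
+
+ main_export(
+     model_name_or_path="Alibaba-NLP/gte-base-en-v1.5",
+     output="./",
+     opset=16,
+     trust_remote_code=True,
+     task="feature-extraction",
+     dtype="fp32",
+ )
+
+ quantize_dynamic("model.onnx", "onnx/model_int8.onnx", weight_type=QuantType.QInt8)
+ quantize_dynamic("model.onnx", "onnx/model_uint8.onnx", weight_type=QuantType.QUInt8)
+ ```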
2719
+ ### Part 2. Running the model in Python with onnxruntime and comparing results
2720
+
2721
+ Once the fixes are applied, we test the correctness of the ONNX model by calculating the cosine similarity between two texts with both the native SentenceTransformers model and the ONNX runtime, and comparing the results.
2722
+
2723
+ If the results match, it confirms that the ONNX model produces the same embeddings as the native model, validating its correctness and suitability for use in the database.
2724
+
2725
+
2726
+ ```python
2727
+ import onnxruntime as rt
2728
+
2729
+ from sentence_transformers.util import cos_sim
2730
+ from sentence_transformers import SentenceTransformer
2731
+
2732
+ import transformers
2733
+
2734
+
2735
+ sentences_1 = 'How is the weather today?'
2736
+ sentences_2 = 'What is the current weather like today?'
2737
+
2738
+ # Calculate ONNX result
2739
+ tokenizer = transformers.AutoTokenizer.from_pretrained("Alibaba-NLP/gte-base-en-v1.5")
2740
+ predef_sess = rt.InferenceSession("onnx/model.onnx")
2741
+
2742
+ enc1 = tokenizer(sentences_1)
2743
+ embeddings_1_onnx = predef_sess.run(None, {"input_ids": [enc1.input_ids],
2744
+ "attention_mask": [enc1.attention_mask]})
2745
+
2746
+ enc2 = tokenizer(sentences_2)
2747
+ embeddings_2_onnx = predef_sess.run(None, {"input_ids": [enc2.input_ids],
2748
+ "attention_mask": [enc2.attention_mask]})
2749
+
2750
+
2751
+ # Calculate embeddings with SentenceTransformer
2752
+ model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)
2753
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
2754
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
2755
+
2756
+ # Compare results
2757
+ print("Cosine similiarity for embeddings calculated with ONNX:" + str(cos_sim(embeddings_1_onnx[1][0], embeddings_2_onnx[1][0])))
2758
+ print("Cosine similiarity for embeddings calculated with SentenceTransformer:" + str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
2759
+ ```
2760
+
2761
+ You can find the detailed ONNX vs. SentenceTransformer result comparison steps in the file [test_local.py](./test_local.py).
2762
+
config.json ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_attn_implementation_autoset": true,
3
+ "_name_or_path": "Alibaba-NLP/gte-base-en-v1.5",
4
+ "architectures": [
5
+ "NewModel"
6
+ ],
7
+ "attention_probs_dropout_prob": 0.0,
8
+ "auto_map": {
9
+ "AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
10
+ "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
11
+ "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
12
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
13
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
14
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
15
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
16
+ },
17
+ "classifier_dropout": null,
18
+ "export_model_type": "transformer",
19
+ "hidden_act": "gelu",
20
+ "hidden_dropout_prob": 0.1,
21
+ "hidden_size": 768,
22
+ "initializer_range": 0.02,
23
+ "intermediate_size": 3072,
24
+ "layer_norm_eps": 1e-12,
25
+ "layer_norm_type": "layer_norm",
26
+ "logn_attention_clip1": false,
27
+ "logn_attention_scale": false,
28
+ "max_position_embeddings": 8192,
29
+ "model_type": "new",
30
+ "num_attention_heads": 12,
31
+ "num_hidden_layers": 12,
32
+ "pack_qkv": true,
33
+ "pad_token_id": 0,
34
+ "position_embedding_type": "rope",
35
+ "rope_scaling": {
36
+ "factor": 2.0,
37
+ "type": "ntk"
38
+ },
39
+ "rope_theta": 500000,
40
+ "torch_dtype": "float32",
41
+ "transformers_version": "4.47.1",
42
+ "type_vocab_size": 0,
43
+ "unpad_inputs": false,
44
+ "use_memory_efficient_attention": false,
45
+ "vocab_size": 30528
46
+ }
conversion_config.json ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "model_id": "Alibaba-NLP/gte-base-en-v1.5",
3
+ "number_of_generated_embeddings": 768,
4
+ "precision_to_filename_map": {
5
+ "fp32": "onnx/model.onnx",
6
+ "int8": "onnx/model_int8.onnx",
7
+ "uint8": "onnx/model_uint8.onnx"
8
+
9
+ },
10
+ "opset": 16,
11
+ "IR": 8
12
+ }
convert.py ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import json
3
+ import shutil
4
+
5
+ from optimum.exporters.onnx import main_export
6
+ import onnx
7
+ from onnxconverter_common import float16
8
+ import onnxruntime as rt
9
+ from onnxruntime.tools.onnx_model_utils import *
10
+ from onnxruntime.quantization import quantize_dynamic, QuantType
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+ opset = conversion_config["opset"]
20
+ IR = conversion_config["IR"]
21
+
22
+
23
+ op = onnx.OperatorSetIdProto()
24
+ op.version = opset
25
+
26
+
27
+ if not os.path.exists("onnx"):
28
+ os.makedirs("onnx")
29
+
30
+ print("Exporting the main model version")
31
+
32
+ main_export(model_name_or_path=model_id, output="./", opset=opset, trust_remote_code=True, task="feature-extraction", dtype="fp32")
33
+
34
+ if "fp32" in precision_to_filename_map:
35
+ print("Exporting the fp32 onnx file...")
36
+
37
+ shutil.copyfile('model.onnx', precision_to_filename_map["fp32"])
38
+
39
+ print("Done\n\n")
40
+
41
+ if "int8" in precision_to_filename_map:
42
+ print("Quantizing fp32 model to int8...")
43
+ quantize_dynamic("model.onnx", precision_to_filename_map["int8"], weight_type=QuantType.QInt8)
44
+ print("Done\n\n")
45
+
46
+ if "uint8" in precision_to_filename_map:
47
+ print("Quantizing fp32 model to uint8...")
48
+ quantize_dynamic("model.onnx", precision_to_filename_map["uint8"], weight_type=QuantType.QUInt8)
49
+ print("Done\n\n")
50
+
51
+ os.remove("model.onnx")
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40aa468bfe2194dd285fbf4ca50fb6311c4c0e8e7e177554bec81e50355425ef
3
+ size 555983762
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:96fa2a0af4ba91b889bea073e4360a83b2e26ae239e80a59ab38f0e920135d20
3
+ size 146153111
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5265123a73ec3fede01fa801ebd7c96177b4264d3ce677d1bc4d6332465b61c1
3
+ size 146153133
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
test_local.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import onnxruntime as rt
2
+
3
+ from sentence_transformers.util import cos_sim
4
+ from sentence_transformers import SentenceTransformer
5
+
6
+ import transformers
7
+
8
+ import gc
9
+ import json
10
+
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+
20
+ sentences_1 = 'How is the weather today?'
21
+ sentences_2 = 'What is the current weather like today?'
22
+
23
+ print(f"Testing on cosine similiarity between sentences: \n'{sentences_1}'\n'{sentences_2}'\n\n\n")
24
+
25
+ tokenizer = transformers.AutoTokenizer.from_pretrained("./")
26
+ enc1 = tokenizer(sentences_1)
27
+ enc2 = tokenizer(sentences_2)
28
+
29
+ for precision, file_name in precision_to_filename_map.items():
30
+
31
+
32
+ onnx_session = rt.InferenceSession(file_name)
33
+ embeddings_1_onnx = onnx_session.run(None, {"input_ids": [enc1.input_ids],
34
+ "attention_mask": [enc1.attention_mask]})[1][0]
35
+
36
+ embeddings_2_onnx = onnx_session.run(None, {"input_ids": [enc2.input_ids],
37
+ "attention_mask": [enc2.attention_mask]})[1][0]
38
+
39
+ del onnx_session
40
+ gc.collect()
41
+ print(f'Cosine similarity for ONNX model with precision "{precision}" is {str(cos_sim(embeddings_1_onnx, embeddings_2_onnx))}')
42
+
43
+
44
+
45
+
46
+ model = SentenceTransformer(model_id, trust_remote_code=True)
47
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
48
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
49
+ print('Cosine similarity for original sentence transformer model is ' + str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
test_teradata.py ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ import teradataml as tdml
3
+ from tabulate import tabulate
4
+
5
+ import json
6
+
7
+
8
+ with open('conversion_config.json') as json_file:
9
+ conversion_config = json.load(json_file)
10
+
11
+
12
+ model_id = conversion_config["model_id"]
13
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
14
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
15
+
16
+ host = sys.argv[1]
17
+ username = sys.argv[2]
18
+ password = sys.argv[3]
19
+
20
+ print("Setting up connection to teradata...")
21
+ tdml.create_context(host = host, username = username, password = password)
22
+ print("Done\n\n")
23
+
24
+
25
+ print("Deploying tokenizer...")
26
+ try:
27
+ tdml.db_drop_table('tokenizer_table')
28
+ except:
29
+ print("Can't drop tokenizers table - it's not existing")
30
+ tdml.save_byom('tokenizer',
31
+ 'tokenizer.json',
32
+ 'tokenizer_table')
33
+ print("Done\n\n")
34
+
35
+ print("Testing models...")
36
+ try:
37
+ tdml.db_drop_table('model_table')
38
+ except:
39
+ print("Can't drop models table - it's not existing")
40
+
41
+ for precision, file_name in precision_to_filename_map.items():
42
+ print(f"Deploying {precision} model...")
43
+ tdml.save_byom(precision,
44
+ file_name,
45
+ 'model_table')
46
+ print(f"Model {precision} is deployed\n")
47
+
48
+ print(f"Calculating embeddings with {precision} model...")
49
+ try:
50
+ tdml.db_drop_table('emails_embeddings_store')
51
+ except:
52
+ print("Can't drop embeddings table - it's not existing")
53
+
54
+ tdml.execute_sql(f"""
55
+ create volatile table emails_embeddings_store as (
56
+ select
57
+ *
58
+ from mldb.ONNXEmbeddings(
59
+ on emails.emails as InputTable
60
+ on (select * from model_table where model_id = '{precision}') as ModelTable DIMENSION
61
+ on (select model as tokenizer from tokenizer_table where model_id = 'tokenizer') as TokenizerTable DIMENSION
62
+
63
+ using
64
+ Accumulate('id', 'txt')
65
+ ModelOutputTensor('sentence_embedding')
66
+ EnableMemoryCheck('false')
67
+ OutputFormat('FLOAT32({number_of_generated_embeddings})')
68
+ OverwriteCachedModel('true')
69
+ ) a
70
+ ) with data on commit preserve rows
71
+
72
+ """)
73
+ print("Embeddings calculated")
74
+ print(f"Testing semantic search with cosine similiarity on the output of the model with precision '{precision}'...")
75
+ tdf_embeddings_store = tdml.DataFrame('emails_embeddings_store')
76
+ tdf_embeddings_store_tgt = tdf_embeddings_store[tdf_embeddings_store.id == 3]
77
+
78
+ tdf_embeddings_store_ref = tdf_embeddings_store[tdf_embeddings_store.id != 3]
79
+
80
+ cos_sim_pd = tdml.DataFrame.from_query(f"""
81
+ SELECT
82
+ dt.target_id,
83
+ dt.reference_id,
84
+ e_tgt.txt as target_txt,
85
+ e_ref.txt as reference_txt,
86
+ (1.0 - dt.distance) as similarity
87
+ FROM
88
+ TD_VECTORDISTANCE (
89
+ ON ({tdf_embeddings_store_tgt.show_query()}) AS TargetTable
90
+ ON ({tdf_embeddings_store_ref.show_query()}) AS ReferenceTable DIMENSION
91
+ USING
92
+ TargetIDColumn('id')
93
+ TargetFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
94
+ RefIDColumn('id')
95
+ RefFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
96
+ DistanceMeasure('cosine')
97
+ topk(3)
98
+ ) AS dt
99
+ JOIN emails.emails e_tgt on e_tgt.id = dt.target_id
100
+ JOIN emails.emails e_ref on e_ref.id = dt.reference_id;
101
+ """).to_pandas()
102
+ print(tabulate(cos_sim_pd, headers='keys', tablefmt='fancy_grid'))
103
+ print("Done\n\n")
104
+
105
+
106
+ tdml.remove_context()
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "max_length": 512,
50
+ "model_max_length": 32768,
51
+ "pad_to_multiple_of": null,
52
+ "pad_token": "[PAD]",
53
+ "pad_token_type_id": 0,
54
+ "padding_side": "right",
55
+ "sep_token": "[SEP]",
56
+ "stride": 0,
57
+ "strip_accents": null,
58
+ "tokenize_chinese_chars": true,
59
+ "tokenizer_class": "BertTokenizer",
60
+ "truncation_side": "right",
61
+ "truncation_strategy": "longest_first",
62
+ "unk_token": "[UNK]"
63
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff