arthurbresnu (HF Staff) committed
Commit 2e04eb9 · verified · 1 Parent(s): 3a0d247

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
{
  "pooling_strategy": "max",
  "activation_function": "relu",
  "word_embedding_dimension": 30522
}
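The config above can be illustrated with a minimal NumPy sketch of SPLADE-style max pooling — assumed semantics, not the library's actual implementation: a ReLU activation and log-saturation are applied to each token's MLM logits over the 30522-dimension vocabulary, then a max is taken over the sequence axis. Function and variable names here are illustrative.

```python
import numpy as np

def splade_max_pool(token_logits, attention_mask):
    """Sketch of SPLADE max pooling (illustrative, assumed semantics).

    token_logits:   (seq_len, vocab_size) per-token MLM logits
    attention_mask: (seq_len,) with 1.0 for real tokens, 0.0 for padding
    Returns a (vocab_size,) sparse document/query vector.
    """
    # ReLU then log-saturation: log(1 + relu(logit))
    weights = np.log1p(np.maximum(token_logits, 0.0))
    # Zero out padding positions, then max over the sequence axis
    weights = weights * attention_mask[:, None]
    return weights.max(axis=0)

# Tiny example with a 2-token sequence and a 2-word vocabulary
logits = np.array([[1.0, -2.0],
                   [3.0,  0.5]])
mask = np.array([1.0, 1.0])
vec = splade_max_pool(logits, mask)  # non-negative, mostly-sparse vector
```

Negative logits contribute nothing after the ReLU, which is what produces the sparsity reported in the metrics below.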
README.md ADDED
@@ -0,0 +1,1751 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- splade
- generated_from_trainer
- dataset_size:90000
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
base_model: distilbert/distilbert-base-uncased
widget:
- text: what is chess
- text: what is a hickman for?
- text: 'Steps. 1 1. Gather your materials. Here''s what you need to build two regulations-size
    horseshoe pits that will face each other (if you only want to build one pit, halve
    the materials): Two 6-foot-long treated wood 2x6s (38mm x 140mm), cut in half.
    2 2. Decide where you''re going to put your pit(s).'
- text: who played at california jam
- text: 'To the Citizens of St. Bernard We chose as our motto a simple but profound
    declaration: “Welcome to your office.” Those words remind us that we are no more
    than the caretakers of the office of Clerk of Court for the Parish of St. Bernard.'
datasets:
- sentence-transformers/msmarco
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
co2_eq_emissions:
  emissions: 20.864216098626564
  energy_consumed: 0.05652200756224921
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: AMD EPYC 7R13 Processor
  ram_total_size: 248.0
  hours_used: 0.179
  hardware_used: 1 x NVIDIA H100 80GB HBM3
model-index:
- name: splade-distilbert-base-uncased trained on MS MARCO triplets
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.66
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.72
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.82
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.22
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14400000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.44
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.66
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.72
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.82
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6223979987260191
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5599444444444444
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5701364200315813
      name: Dot Map@100
    - type: query_active_dims
      value: 25.260000228881836
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9991724002283965
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 89.06385040283203
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9970819785596348
      name: Corpus Sparsity Ratio
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.74
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.84
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14800000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08399999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.44
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.6
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.74
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.84
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6241753240638171
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5571349206349206
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5639260419913368
      name: Dot Map@100
    - type: query_active_dims
      value: 20.5
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9993283533189175
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 81.87666320800781
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9973174541901578
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.4
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.52
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.54
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3666666666666667
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.332
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.27
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.023282599806398227
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07519782108259539
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.09254782270412643
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.12120665375595915
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.32050254842735026
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4703888888888889
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.13331879084552362
      name: Dot Map@100
    - type: query_active_dims
      value: 17.639999389648438
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9994220562417387
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 165.31358337402344
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9945837892872674
      name: Corpus Sparsity Ratio
    - type: dot_accuracy@1
      value: 0.36
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.46
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.54
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.68
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.34
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.32799999999999996
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.27
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.02081925669789383
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07064967781220355
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.09055307754310991
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.14403725441385476
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.3196380424829849
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4414444444444445
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.13569627052041464
      name: Dot Map@100
    - type: query_active_dims
      value: 18.299999237060547
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9994004324999325
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 156.04843139648438
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9948873458031424
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: dot_accuracy@1
      value: 0.48
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.68
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.72
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.48
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.22666666666666668
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14400000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.46
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.65
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.68
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.74
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6136977374010735
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.585079365079365
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5730967720685111
      name: Dot Map@100
    - type: query_active_dims
      value: 24.299999237060547
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992038529835181
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 103.79106140136719
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9965994672235972
      name: Corpus Sparsity Ratio
    - type: dot_accuracy@1
      value: 0.48
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.68
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.74
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.48
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.22666666666666668
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.15200000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.47
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.64
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.7
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.73
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6150809765850531
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5864999999999999
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5841443871983568
      name: Dot Map@100
    - type: query_active_dims
      value: 22.200000762939453
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992726557642704
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 103.72532653808594
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9966016209115365
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6200000000000001
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.66
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7466666666666667
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.27111111111111114
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.2066666666666667
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.14466666666666664
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.30776086660213275
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.4617326070275318
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.49751594090137546
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5604022179186531
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5188660948514809
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5384708994708994
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.42551732764853867
      name: Dot Map@100
    - type: query_active_dims
      value: 22.399999618530273
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992661031512178
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 112.03345893951784
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9963294194699063
      name: Corpus Sparsity Ratio
    - type: dot_accuracy@1
      value: 0.5241130298273154
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6799372056514913
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.7415070643642072
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8169230769230769
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.5241130298273154
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3215384615384615
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.2547566718995291
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.17874411302982732
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.30856930592565196
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.4441119539769697
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.5092929381431597
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5878231569460904
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5577320367017354
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6173593605940545
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.48084758588880655
      name: Dot Map@100
    - type: query_active_dims
      value: 38.07395960627058
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9987525732387698
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 105.05153383516846
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9965581700466821
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoClimateFEVER
      type: NanoClimateFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.24
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.42
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.56
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.64
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.24
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.14666666666666664
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.12
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.07400000000000001
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.11833333333333332
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.21166666666666664
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.26233333333333336
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.29966666666666664
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.25712162589613363
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.35861111111111116
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.20460406106488077
      name: Dot Map@100
    - type: query_active_dims
      value: 51.47999954223633
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9983133477641624
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 134.2989959716797
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9955999280528248
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoDBPedia
      type: NanoDBPedia
    metrics:
    - type: dot_accuracy@1
      value: 0.7
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.82
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.88
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.92
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.7
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.6133333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.58
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.52
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.05306233623739282
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.16391544714816778
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.23662708539883293
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.3543605851621492
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6137764330075132
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.771888888888889
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.4604772150699302
      name: Dot Map@100
    - type: query_active_dims
      value: 20.520000457763672
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999327698038865
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 111.07841491699219
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9963607098185902
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFEVER
      type: NanoFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.74
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.92
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.98
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.74
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3133333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.19599999999999995
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.10399999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7066666666666667
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8666666666666667
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.8933333333333333
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9433333333333332
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8368149756149829
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8170000000000001
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.7993556466302367
      name: Dot Map@100
    - type: query_active_dims
      value: 44.84000015258789
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9985308957423306
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 154.09767150878906
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9949512590423697
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFiQA2018
      type: NanoFiQA2018
    metrics:
    - type: dot_accuracy@1
      value: 0.34
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.58
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.68
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.34
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.21333333333333332
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.17600000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.11199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.1770793650793651
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.3069920634920635
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.3936825396825397
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.48673809523809525
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.3901649596140352
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4438809523809523
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.32670074884185174
      name: Dot Map@100
    - type: query_active_dims
      value: 18.920000076293945
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9993801192557403
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 75.49989318847656
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9975263779179453
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoHotpotQA
      type: NanoHotpotQA
    metrics:
    - type: dot_accuracy@1
      value: 0.88
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.92
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.94
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.96
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.88
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.4866666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.324
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.16999999999999996
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.44
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.73
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.81
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.85
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8077539978128343
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.9041666666666667
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.74474463747389
      name: Dot Map@100
    - type: query_active_dims
      value: 43.880001068115234
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9985623484349612
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 120.78840637207031
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9960425789144856
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoQuoraRetrieval
      type: NanoQuoraRetrieval
    metrics:
    - type: dot_accuracy@1
      value: 0.84
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.92
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.94
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.96
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.84
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.32666666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.22
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.11999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7873333333333333
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8540000000000001
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.898
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9313333333333332
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8841170132005264
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8805555555555554
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.8625873756339163
      name: Dot Map@100
    - type: query_active_dims
      value: 18.760000228881836
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9993853613711787
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 20.381887435913086
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9993322230707059
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSCIDOCS
      type: NanoSCIDOCS
    metrics:
    - type: dot_accuracy@1
      value: 0.42
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.42
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2866666666666667
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.21999999999999997
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.152
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.086
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.17666666666666664
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.22466666666666665
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.3116666666666667
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.31329169156104253
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5258253968253969
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.24015404586272074
      name: Dot Map@100
    - type: query_active_dims
      value: 38.599998474121094
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9987353384943936
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 120.28081512451172
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9960592092548158
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoArguAna
      type: NanoArguAna
    metrics:
    - type: dot_accuracy@1
      value: 0.1
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.34
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.46
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.1
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.11333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.09200000000000001
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.06600000000000002
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.1
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.34
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.46
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.66
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.35624387960476495
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.2620238095238095
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.27408886435627244
      name: Dot Map@100
    - type: query_active_dims
      value: 121.0199966430664
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9960349912639058
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 107.16836547851562
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9964888157565521
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSciFact
      type: NanoSciFact
    metrics:
    - type: dot_accuracy@1
      value: 0.6
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.72
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.72
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.78
      name: Dot Accuracy@10
1080
+ - type: dot_precision@1
1081
+ value: 0.6
1082
+ name: Dot Precision@1
1083
+ - type: dot_precision@3
1084
+ value: 0.24666666666666665
1085
+ name: Dot Precision@3
1086
+ - type: dot_precision@5
1087
+ value: 0.16399999999999998
1088
+ name: Dot Precision@5
1089
+ - type: dot_precision@10
1090
+ value: 0.088
1091
+ name: Dot Precision@10
1092
+ - type: dot_recall@1
1093
+ value: 0.565
1094
+ name: Dot Recall@1
1095
+ - type: dot_recall@3
1096
+ value: 0.68
1097
+ name: Dot Recall@3
1098
+ - type: dot_recall@5
1099
+ value: 0.71
1100
+ name: Dot Recall@5
1101
+ - type: dot_recall@10
1102
+ value: 0.77
1103
+ name: Dot Recall@10
1104
+ - type: dot_ndcg@10
1105
+ value: 0.6798182226611048
1106
+ name: Dot Ndcg@10
1107
+ - type: dot_mrr@10
1108
+ value: 0.6625
1109
+ name: Dot Mrr@10
1110
+ - type: dot_map@100
1111
+ value: 0.6532896014216637
1112
+ name: Dot Map@100
1113
+ - type: query_active_dims
1114
+ value: 57.41999816894531
1115
+ name: Query Active Dims
1116
+ - type: query_sparsity_ratio
1117
+ value: 0.9981187340879056
1118
+ name: Query Sparsity Ratio
1119
+ - type: corpus_active_dims
1120
+ value: 158.03323364257812
1121
+ name: Corpus Active Dims
1122
+ - type: corpus_sparsity_ratio
1123
+ value: 0.9948223172255234
1124
+ name: Corpus Sparsity Ratio
1125
+ - task:
1126
+ type: sparse-information-retrieval
1127
+ name: Sparse Information Retrieval
1128
+ dataset:
1129
+ name: NanoTouche2020
1130
+ type: NanoTouche2020
1131
+ metrics:
1132
+ - type: dot_accuracy@1
1133
+ value: 0.673469387755102
1134
+ name: Dot Accuracy@1
1135
+ - type: dot_accuracy@3
1136
+ value: 0.9591836734693877
1137
+ name: Dot Accuracy@3
1138
+ - type: dot_accuracy@5
1139
+ value: 0.9795918367346939
1140
+ name: Dot Accuracy@5
1141
+ - type: dot_accuracy@10
1142
+ value: 1.0
1143
+ name: Dot Accuracy@10
1144
+ - type: dot_precision@1
1145
+ value: 0.673469387755102
1146
+ name: Dot Precision@1
1147
+ - type: dot_precision@3
1148
+ value: 0.6666666666666667
1149
+ name: Dot Precision@3
1150
+ - type: dot_precision@5
1151
+ value: 0.5918367346938777
1152
+ name: Dot Precision@5
1153
+ - type: dot_precision@10
1154
+ value: 0.4836734693877551
1155
+ name: Dot Precision@10
1156
+ - type: dot_recall@1
1157
+ value: 0.04710668568549065
1158
+ name: Dot Recall@1
1159
+ - type: dot_recall@3
1160
+ value: 0.13289821324817133
1161
+ name: Dot Recall@3
1162
+ - type: dot_recall@5
1163
+ value: 0.20161215990326012
1164
+ name: Dot Recall@5
1165
+ - type: dot_recall@10
1166
+ value: 0.3205651054850781
1167
+ name: Dot Recall@10
1168
+ - type: dot_ndcg@10
1169
+ value: 0.5525193350177682
1170
+ name: Dot Ndcg@10
1171
+ - type: dot_mrr@10
1172
+ value: 0.814139941690962
1173
+ name: Dot Mrr@10
1174
+ - type: dot_map@100
1175
+ value: 0.40124972048901353
1176
+ name: Dot Map@100
1177
+ - type: query_active_dims
1178
+ value: 18.12244987487793
1179
+ name: Query Active Dims
1180
+ - type: query_sparsity_ratio
1181
+ value: 0.9994062495945587
1182
+ name: Query Sparsity Ratio
1183
+ - type: corpus_active_dims
1184
+ value: 84.7328109741211
1185
+ name: Corpus Active Dims
1186
+ - type: corpus_sparsity_ratio
1187
+ value: 0.9972238774990461
1188
+ name: Corpus Sparsity Ratio
1189
+ ---
1190
+
1191
+ # splade-distilbert-base-uncased trained on MS MARCO triplets
1192
+
1193
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
1194
+ ## Model Details
1195
+
1196
+ ### Model Description
1197
+ - **Model Type:** SPLADE Sparse Encoder
1198
+ - **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 12040accade4e8a0f71eabdb258fecc2e7e948be -->
1199
+ - **Maximum Sequence Length:** 256 tokens
1200
+ - **Output Dimensionality:** 30522 dimensions
1201
+ - **Similarity Function:** Dot Product
1202
+ - **Training Dataset:**
1203
+ - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
1204
+ - **Language:** en
1205
+ - **License:** apache-2.0
1206
+
1207
+ ### Model Sources
1208
+
1209
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
1210
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
1211
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
1212
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
1213
+
1214
+ ### Full Model Architecture
1215
+
1216
+ ```
1217
+ SparseEncoder(
1218
+ (0): MLMTransformer({'max_seq_length': 256, 'do_lower_case': False}) with MLMTransformer model: DistilBertForMaskedLM
1219
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
1220
+ )
1221
+ ```
1222
+
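The SpladePooling step above can be sketched in a few lines. Assuming the standard SPLADE formulation, each token position's MLM logits are saturated with log(1 + ReLU(x)) and the result is max-pooled over the sequence, yielding one non-negative weight per vocabulary term. A minimal pure-Python sketch (toy numbers, not the real 30522-term vocabulary):

```python
import math

def splade_pool(token_logits):
    """Max-pool log(1 + ReLU(logit)) over the sequence dimension.

    token_logits: list of per-token logit vectors, each of vocab size V.
    Returns one non-negative weight vector of size V (mostly zeros in practice).
    """
    vocab_size = len(token_logits[0])
    pooled = []
    for j in range(vocab_size):
        # SPLADE saturation: log1p of the ReLU-clipped logit
        activations = [math.log1p(max(0.0, tok[j])) for tok in token_logits]
        pooled.append(max(activations))
    return pooled

# Toy example: 2 tokens, vocabulary of 4 terms
logits = [
    [2.0, -1.0,  0.0, 0.5],
    [0.0,  3.0, -2.0, 0.1],
]
weights = splade_pool(logits)
# Negative logits contribute 0, so the output stays sparse and non-negative
```

Because every output dimension is tied to a vocabulary token, the resulting vectors are both sparse and interpretable.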
1223
+ ## Usage
1224
+
1225
+ ### Direct Usage (Sentence Transformers)
1226
+
1227
+ First install the Sentence Transformers library:
1228
+
1229
+ ```bash
1230
+ pip install -U sentence-transformers
1231
+ ```
1232
+
1233
+ Then you can load this model and run inference.
1234
+ ```python
1235
+ from sentence_transformers import SparseEncoder
1236
+
1237
+ # Download from the 🤗 Hub
1238
+ model = SparseEncoder("arthurbresnu/splade-distilbert-base-uncased-msmarco-mrl")
1239
+ # Run inference
1240
+ queries = [
1241
+ "meaning of the name bernard",
1242
+ ]
1243
+ documents = [
1244
+ 'English Meaning: The name Bernard is an English baby name. In English the meaning of the name Bernard is: Strong as a bear. See also Bjorn. American Meaning: The name Bernard is an American baby name. In American the meaning of the name Bernard is: Strong as a bear.',
1245
+ 'To the Citizens of St. Bernard We chose as our motto a simple but profound declaration: “Welcome to your office.” Those words remind us that we are no more than the caretakers of the office of Clerk of Court for the Parish of St. Bernard.',
1246
+ "Get Your Prior Years Tax Information from the IRS. IRS Tax Tip 2012-18, January 27, 2012. Sometimes taxpayers need a copy of an old tax return, but can't find or don't have their own records. There are three easy and convenient options for getting tax return transcripts and tax account transcripts from the IRS: on the web, by phone or by mail.",
1247
+ ]
1248
+ query_embeddings = model.encode_query(queries)
1249
+ document_embeddings = model.encode_document(documents)
1250
+ print(query_embeddings.shape, document_embeddings.shape)
1251
+ # [1, 30522] [3, 30522]
1252
+
1253
+ # Get the similarity scores for the embeddings
1254
+ similarities = model.similarity(query_embeddings, document_embeddings)
1255
+ print(similarities)
1256
+ # tensor([[18.6221, 10.0646, 0.0000]])
1257
+ ```
1258
+
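Since each of the 30522 dimensions maps to a vocabulary token, the non-zero entries of an embedding directly name the terms the model considers relevant. A toy sketch of reading off the top-weighted terms (a hypothetical 5-term vocabulary and made-up weights stand in for the real model output):

```python
def top_terms(embedding, vocab, k=3):
    """Return the k highest-weighted (token, weight) pairs of a sparse vector."""
    pairs = [(tok, w) for tok, w in zip(vocab, embedding) if w > 0]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]

# Hypothetical vocabulary slice and query activations for illustration only
vocab = ["bernard", "name", "meaning", "bear", "tax"]
embedding = [2.3, 1.1, 1.7, 0.0, 0.0]
print(top_terms(embedding, vocab))
# → [('bernard', 2.3), ('meaning', 1.7), ('name', 1.1)]
```

This interpretability is one of the practical advantages of SPLADE-style sparse encoders over dense models.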
1259
+ <!--
1260
+ ### Direct Usage (Transformers)
1261
+
1262
+ <details><summary>Click to see the direct usage in Transformers</summary>
1263
+
1264
+ </details>
1265
+ -->
1266
+
1267
+ <!--
1268
+ ### Downstream Usage (Sentence Transformers)
1269
+
1270
+ You can finetune this model on your own dataset.
1271
+
1272
+ <details><summary>Click to expand</summary>
1273
+
1274
+ </details>
1275
+ -->
1276
+
1277
+ <!--
1278
+ ### Out-of-Scope Use
1279
+
1280
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
1281
+ -->
1282
+
1283
+ ## Evaluation
1284
+
1285
+ ### Metrics
1286
+
1287
+ #### Sparse Information Retrieval
1288
+
1289
+ * Datasets: `NanoMSMARCO`, `NanoNFCorpus`, `NanoNQ`, `NanoClimateFEVER`, `NanoDBPedia`, `NanoFEVER`, `NanoFiQA2018`, `NanoHotpotQA`, `NanoQuoraRetrieval`, `NanoSCIDOCS`, `NanoArguAna`, `NanoSciFact` and `NanoTouche2020`
1290
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
1291
+
1292
+ | Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
1293
+ |:----------------------|:------------|:-------------|:-----------|:-----------------|:------------|:-----------|:-------------|:-------------|:-------------------|:------------|:------------|:------------|:---------------|
1294
+ | dot_accuracy@1 | 0.44 | 0.36 | 0.48 | 0.24 | 0.7 | 0.74 | 0.34 | 0.88 | 0.84 | 0.42 | 0.1 | 0.6 | 0.6735 |
1295
+ | dot_accuracy@3 | 0.6 | 0.46 | 0.68 | 0.42 | 0.82 | 0.9 | 0.5 | 0.92 | 0.92 | 0.6 | 0.34 | 0.72 | 0.9592 |
1296
+ | dot_accuracy@5 | 0.74 | 0.54 | 0.74 | 0.56 | 0.88 | 0.92 | 0.58 | 0.94 | 0.94 | 0.64 | 0.46 | 0.72 | 0.9796 |
1297
+ | dot_accuracy@10 | 0.84 | 0.68 | 0.76 | 0.64 | 0.92 | 0.98 | 0.68 | 0.96 | 0.96 | 0.76 | 0.66 | 0.78 | 1.0 |
1298
+ | dot_precision@1 | 0.44 | 0.36 | 0.48 | 0.24 | 0.7 | 0.74 | 0.34 | 0.88 | 0.84 | 0.42 | 0.1 | 0.6 | 0.6735 |
1299
+ | dot_precision@3 | 0.2 | 0.34 | 0.2267 | 0.1467 | 0.6133 | 0.3133 | 0.2133 | 0.4867 | 0.3267 | 0.2867 | 0.1133 | 0.2467 | 0.6667 |
1300
+ | dot_precision@5 | 0.148 | 0.328 | 0.152 | 0.12 | 0.58 | 0.196 | 0.176 | 0.324 | 0.22 | 0.22 | 0.092 | 0.164 | 0.5918 |
1301
+ | dot_precision@10 | 0.084 | 0.27 | 0.08 | 0.074 | 0.52 | 0.104 | 0.112 | 0.17 | 0.12 | 0.152 | 0.066 | 0.088 | 0.4837 |
1302
+ | dot_recall@1 | 0.44 | 0.0208 | 0.47 | 0.1183 | 0.0531 | 0.7067 | 0.1771 | 0.44 | 0.7873 | 0.086 | 0.1 | 0.565 | 0.0471 |
1303
+ | dot_recall@3 | 0.6 | 0.0706 | 0.64 | 0.2117 | 0.1639 | 0.8667 | 0.307 | 0.73 | 0.854 | 0.1767 | 0.34 | 0.68 | 0.1329 |
1304
+ | dot_recall@5 | 0.74 | 0.0906 | 0.7 | 0.2623 | 0.2366 | 0.8933 | 0.3937 | 0.81 | 0.898 | 0.2247 | 0.46 | 0.71 | 0.2016 |
1305
+ | dot_recall@10 | 0.84 | 0.144 | 0.73 | 0.2997 | 0.3544 | 0.9433 | 0.4867 | 0.85 | 0.9313 | 0.3117 | 0.66 | 0.77 | 0.3206 |
1306
+ | **dot_ndcg@10** | **0.6242** | **0.3196** | **0.6151** | **0.2571** | **0.6138** | **0.8368** | **0.3902** | **0.8078** | **0.8841** | **0.3133** | **0.3562** | **0.6798** | **0.5525** |
1307
+ | dot_mrr@10 | 0.5571 | 0.4414 | 0.5865 | 0.3586 | 0.7719 | 0.817 | 0.4439 | 0.9042 | 0.8806 | 0.5258 | 0.262 | 0.6625 | 0.8141 |
1308
+ | dot_map@100 | 0.5639 | 0.1357 | 0.5841 | 0.2046 | 0.4605 | 0.7994 | 0.3267 | 0.7447 | 0.8626 | 0.2402 | 0.2741 | 0.6533 | 0.4012 |
1309
+ | query_active_dims | 20.5 | 18.3 | 22.2 | 51.48 | 20.52 | 44.84 | 18.92 | 43.88 | 18.76 | 38.6 | 121.02 | 57.42 | 18.1224 |
1310
+ | query_sparsity_ratio | 0.9993 | 0.9994 | 0.9993 | 0.9983 | 0.9993 | 0.9985 | 0.9994 | 0.9986 | 0.9994 | 0.9987 | 0.996 | 0.9981 | 0.9994 |
1311
+ | corpus_active_dims | 81.8767 | 156.0484 | 103.7253 | 134.299 | 111.0784 | 154.0977 | 75.4999 | 120.7884 | 20.3819 | 120.2808 | 107.1684 | 158.0332 | 84.7328 |
1312
+ | corpus_sparsity_ratio | 0.9973 | 0.9949 | 0.9966 | 0.9956 | 0.9964 | 0.995 | 0.9975 | 0.996 | 0.9993 | 0.9961 | 0.9965 | 0.9948 | 0.9972 |
1313
+
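The sparsity rows follow directly from the active-dimension counts: sparsity_ratio = 1 − active_dims / 30522, where 30522 is the output dimensionality. A quick check against the NanoSCIDOCS query figures above:

```python
VOCAB_SIZE = 30522  # output dimensionality of this model

def sparsity_ratio(active_dims):
    """Fraction of the vocabulary-sized vector that is exactly zero."""
    return 1.0 - active_dims / VOCAB_SIZE

# NanoSCIDOCS queries average ~38.6 active dimensions
ratio = sparsity_ratio(38.599998474121094)
print(round(ratio, 6))  # → 0.998735
```

In other words, queries activate only a few dozen of the 30522 vocabulary dimensions, which is what makes inverted-index retrieval with this model efficient.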
1314
+ #### Sparse Nano BEIR
1315
+
1316
+ * Dataset: `NanoBEIR_mean`
1317
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
1318
+ ```json
1319
+ {
1320
+ "dataset_names": [
1321
+ "msmarco",
1322
+ "nfcorpus",
1323
+ "nq"
1324
+ ]
1325
+ }
1326
+ ```
1327
+
1328
+ | Metric | Value |
1329
+ |:----------------------|:-----------|
1330
+ | dot_accuracy@1 | 0.44 |
1331
+ | dot_accuracy@3 | 0.62 |
1332
+ | dot_accuracy@5 | 0.66 |
1333
+ | dot_accuracy@10 | 0.7467 |
1334
+ | dot_precision@1 | 0.44 |
1335
+ | dot_precision@3 | 0.2711 |
1336
+ | dot_precision@5 | 0.2067 |
1337
+ | dot_precision@10 | 0.1447 |
1338
+ | dot_recall@1 | 0.3078 |
1339
+ | dot_recall@3 | 0.4617 |
1340
+ | dot_recall@5 | 0.4975 |
1341
+ | dot_recall@10 | 0.5604 |
1342
+ | **dot_ndcg@10** | **0.5189** |
1343
+ | dot_mrr@10 | 0.5385 |
1344
+ | dot_map@100 | 0.4255 |
1345
+ | query_active_dims | 22.4 |
1346
+ | query_sparsity_ratio | 0.9993 |
1347
+ | corpus_active_dims | 112.0335 |
1348
+ | corpus_sparsity_ratio | 0.9963 |
1349
+
1350
+ #### Sparse Nano BEIR
1351
+
1352
+ * Dataset: `NanoBEIR_mean`
1353
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
1354
+ ```json
1355
+ {
1356
+ "dataset_names": [
1357
+ "climatefever",
1358
+ "dbpedia",
1359
+ "fever",
1360
+ "fiqa2018",
1361
+ "hotpotqa",
1362
+ "msmarco",
1363
+ "nfcorpus",
1364
+ "nq",
1365
+ "quoraretrieval",
1366
+ "scidocs",
1367
+ "arguana",
1368
+ "scifact",
1369
+ "touche2020"
1370
+ ]
1371
+ }
1372
+ ```
1373
+
1374
+ | Metric | Value |
1375
+ |:----------------------|:-----------|
1376
+ | dot_accuracy@1 | 0.5241 |
1377
+ | dot_accuracy@3 | 0.6799 |
1378
+ | dot_accuracy@5 | 0.7415 |
1379
+ | dot_accuracy@10 | 0.8169 |
1380
+ | dot_precision@1 | 0.5241 |
1381
+ | dot_precision@3 | 0.3215 |
1382
+ | dot_precision@5 | 0.2548 |
1383
+ | dot_precision@10 | 0.1787 |
1384
+ | dot_recall@1 | 0.3086 |
1385
+ | dot_recall@3 | 0.4441 |
1386
+ | dot_recall@5 | 0.5093 |
1387
+ | dot_recall@10 | 0.5878 |
1388
+ | **dot_ndcg@10** | **0.5577** |
1389
+ | dot_mrr@10 | 0.6174 |
1390
+ | dot_map@100 | 0.4808 |
1391
+ | query_active_dims | 38.074 |
1392
+ | query_sparsity_ratio | 0.9988 |
1393
+ | corpus_active_dims | 105.0515 |
1394
+ | corpus_sparsity_ratio | 0.9966 |
1395
+
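The `NanoBEIR_mean` values in this second table are arithmetic means over the 13 per-dataset scores. For instance, averaging the dot_accuracy@1 row of the per-dataset table reproduces the 0.5241 above:

```python
# dot_accuracy@1 per dataset, in the column order of the per-dataset table
acc_at_1 = [0.44, 0.36, 0.48, 0.24, 0.7, 0.74, 0.34, 0.88, 0.84,
            0.42, 0.1, 0.6, 0.673469387755102]

mean = sum(acc_at_1) / len(acc_at_1)
print(round(mean, 4))  # → 0.5241
```

The earlier, smaller `NanoBEIR_mean` table is the same kind of average, but over the three-dataset subset (msmarco, nfcorpus, nq) evaluated during training.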
1396
+ <!--
1397
+ ## Bias, Risks and Limitations
1398
+
1399
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
1400
+ -->
1401
+
1402
+ <!--
1403
+ ### Recommendations
1404
+
1405
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
1406
+ -->
1407
+
1408
+ ## Training Details
1409
+
1410
+ ### Training Dataset
1411
+
1412
+ #### msmarco
1413
+
1414
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
1415
+ * Size: 90,000 training samples
1416
+ * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
1417
+ * Approximate statistics based on the first 1000 samples:
1418
+ | | query | positive | negative |
1419
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
1420
+ | type | string | string | string |
1421
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.02 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 79.88 tokens</li><li>max: 203 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 77.8 tokens</li><li>max: 201 tokens</li></ul> |
1422
+ * Samples:
1423
+ | query | positive | negative |
1424
+ |:--------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1425
+ | <code>yosemite temperature in september</code> | <code>Here are the average temp in Yosemite Valley (where CV is located) by month: www.nps.gov/yose/planyourvisit/climate.htm. Also beginning of September is usually still quite warm. Nights can have a bit of a chill, but nothing a couple of blankets can't handle.</code> | <code>Guide to Switzerland weather in September. The average maximum daytime temperature in Switzerland in September is a comfortable 18°C (64°F). The average night-time temperature is usually a cool 9°C (48°F). There are usually 6 hours of bright sunshine each day, which represents 45% of the 13 hours of daylight.</code> |
1426
+ | <code>what is genus</code> | <code>Intermediate minor rankings are not shown. A genus (/ˈdʒiːnəs/, pl. genera) is a taxonomic rank used in the biological classification of living and fossil organisms in biology. In the hierarchy of biological classification, genus comes above species and below family. In binomial nomenclature, the genus name forms the first part of the binomial species name for each species within the genus. The composition of a genus is determined by a taxonomist.</code> | <code>The genus is the first part of a scientific name. Note that the genus is always capitalised. An example: Lemur catta is the scientific name of the Ringtailed lemur and Lemur … is the genus.Another example: Sphyrna zygaena is the scientific name of one species of Hammerhead shark and Sphyrna is the genus. name used all around the world to classify a living organism. It is composed of a genus and species name. A sceintific name can also be considered for non living things, the … se are usually called scientific jargon, or very simply 'proper names for the things around you'. 4 people found this useful.</code> |
1427
+ | <code>what did johannes kepler discover about the motion of the planets?</code> | <code>Johannes Kepler devised his three laws of motion from his observations of planets that are fundamental to our understanding of orbital motions.</code> | <code>Little Street, Johannes Vermeer, c. 1658. New stop on Delft tourist trail after Vermeer's Little Street identified. Few artists have left such a deep imprint on their birthplace as Johannes Vermeer on Delft. In the summer, tour parties weave through the Dutch town’s cobbled streets ticking off Vermeer landmarks.</code> |
1428
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
1429
+ ```json
1430
+ {
1431
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
1432
+ "lambda_corpus": 0.001,
1433
+ "lambda_query": 5e-05
1434
+ }
1435
+ ```
1436
+
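`SpladeLoss` wraps the ranking loss with FLOPS-style sparsity regularizers weighted by `lambda_query` and `lambda_corpus`. Assuming the standard FLOPS definition from the SPLADE papers (a sketch of the idea, not the exact library internals), the regularizer penalizes the squared mean absolute activation of each vocabulary dimension across the batch:

```python
def flops_regularizer(batch_embeddings):
    """FLOPS sparsity penalty: sum over vocabulary dimensions of the squared
    mean absolute activation across the batch. Dimensions that are rarely
    activated contribute almost nothing, pushing the model toward sparsity."""
    batch_size = len(batch_embeddings)
    vocab_size = len(batch_embeddings[0])
    total = 0.0
    for j in range(vocab_size):
        mean_abs = sum(abs(emb[j]) for emb in batch_embeddings) / batch_size
        total += mean_abs ** 2
    return total

# Toy batch of 2 embeddings over a 3-term vocabulary
batch = [
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 0.0],
]
# dim 0: mean 1.0 -> 1.0; dim 1: 0 -> 0; dim 2: mean 1.0 -> 1.0
print(flops_regularizer(batch))  # → 2.0
```

The full training objective is then roughly `ranking_loss + lambda_query * FLOPS(queries) + lambda_corpus * FLOPS(documents)`, with the small weights above (5e-05 and 0.001) trading a little retrieval quality for much sparser outputs.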
1437
+ ### Evaluation Dataset
1438
+
1439
+ #### msmarco
1440
+
1441
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
1442
+ * Size: 10,000 evaluation samples
1443
+ * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
1444
+ * Approximate statistics based on the first 1000 samples:
1445
+ | | query | positive | negative |
1446
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
1447
+ | type | string | string | string |
1448
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.16 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.89 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 76.95 tokens</li><li>max: 220 tokens</li></ul> |
1449
+ * Samples:
1450
+ | query | positive | negative |
1451
+ |:---------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1452
+ | <code>scarehouse cast</code> | <code>The Scarehouse. The Scarehouse is a 2014 Canadian horror film directed by Gavin Michael Booth. It stars Sarah Booth and Kimberly-Sue Murray as two women who seek revenge against their former sorority.</code> | <code>Nathalie Emmanuel joined the TV series as a recurring cast member in Season 3, and continued as a recurring cast member into Season 4. Emmanuel was later promoted to a starring cast member for seasons 5 and 6.</code> |
1453
+ | <code>population of bellemont arizona</code> | <code>The 2016 Bellemont (zip 86015), Arizona, population is 300. There are 55 people per square mile (population density). The median age is 29.9. The US median is 37.4. 38.19% of people in Bellemont (zip 86015), Arizona, are married.</code> | <code>• Arizona: A 2010 University of Arizona report estimates that 40% of the state's kissing bugs carry a parasite strain related to the Chagas disease but rarely transmit the disease to humans. The Arizona Department of Health Services reported one Chagas disease-related death in 2013, reports The Arizona Republic.</code> |
1454
+ | <code>does air transat check bag size</code> | <code>• Weight must be 10kg (22 lb) in Economy class and in Option Plus and 15 kg (33lb) in Club Class. Checked Baggage Air Transat allows for multiple pieces, as long as the combined weight does not exceed weight limitations. • Length + width + height must not exceed 158cm (62 in).</code> | <code>Bag-valve masks come in different sizes to fit infants, children, and adults. The face mask size may be independent of the bag size; for example, a single pediatric-sized bag might be used with different masks for multiple face sizes, or a pediatric mask might be used with an adult bag for patients with small faces.</code> |
1455
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
1456
+ ```json
1457
+ {
1458
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
1459
+ "lambda_corpus": 0.001,
1460
+ "lambda_query": 5e-05
1461
+ }
1462
+ ```
1463
+
1464
+ ### Training Hyperparameters
1465
+ #### Non-Default Hyperparameters
1466
+
1467
+ - `eval_strategy`: steps
1468
+ - `per_device_train_batch_size`: 16
1469
+ - `per_device_eval_batch_size`: 16
1470
+ - `learning_rate`: 2e-05
1471
+ - `num_train_epochs`: 1
1472
+ - `warmup_ratio`: 0.1
1473
+ - `bf16`: True
1474
+ - `load_best_model_at_end`: True
1475
+ - `batch_sampler`: no_duplicates
1476
+
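With 90,000 training samples, a per-device batch size of 16, and a single epoch, training runs for 5,625 optimizer steps, so `warmup_ratio: 0.1` corresponds to roughly the first 562 steps (consistent with the epoch fractions in the training logs below):

```python
import math

train_samples = 90_000
batch_size = 16
epochs = 1

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.1 * total_steps)
print(steps_per_epoch, warmup_steps)  # → 5625 562
```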
1477
+ #### All Hyperparameters
1478
+ <details><summary>Click to expand</summary>
1479
+
1480
+ - `overwrite_output_dir`: False
1481
+ - `do_predict`: False
1482
+ - `eval_strategy`: steps
1483
+ - `prediction_loss_only`: True
1484
+ - `per_device_train_batch_size`: 16
1485
+ - `per_device_eval_batch_size`: 16
1486
+ - `per_gpu_train_batch_size`: None
1487
+ - `per_gpu_eval_batch_size`: None
1488
+ - `gradient_accumulation_steps`: 1
1489
+ - `eval_accumulation_steps`: None
1490
+ - `torch_empty_cache_steps`: None
1491
+ - `learning_rate`: 2e-05
1492
+ - `weight_decay`: 0.0
1493
+ - `adam_beta1`: 0.9
1494
+ - `adam_beta2`: 0.999
1495
+ - `adam_epsilon`: 1e-08
1496
+ - `max_grad_norm`: 1.0
1497
+ - `num_train_epochs`: 1
1498
+ - `max_steps`: -1
1499
+ - `lr_scheduler_type`: linear
1500
+ - `lr_scheduler_kwargs`: {}
1501
+ - `warmup_ratio`: 0.1
1502
+ - `warmup_steps`: 0
1503
+ - `log_level`: passive
1504
+ - `log_level_replica`: warning
1505
+ - `log_on_each_node`: True
1506
+ - `logging_nan_inf_filter`: True
1507
+ - `save_safetensors`: True
1508
+ - `save_on_each_node`: False
1509
+ - `save_only_model`: False
1510
+ - `restore_callback_states_from_checkpoint`: False
1511
+ - `no_cuda`: False
1512
+ - `use_cpu`: False
1513
+ - `use_mps_device`: False
1514
+ - `seed`: 42
1515
+ - `data_seed`: None
1516
+ - `jit_mode_eval`: False
1517
+ - `use_ipex`: False
1518
+ - `bf16`: True
1519
+ - `fp16`: False
1520
+ - `fp16_opt_level`: O1
1521
+ - `half_precision_backend`: auto
1522
+ - `bf16_full_eval`: False
1523
+ - `fp16_full_eval`: False
1524
+ - `tf32`: None
1525
+ - `local_rank`: 0
1526
+ - `ddp_backend`: None
1527
+ - `tpu_num_cores`: None
1528
+ - `tpu_metrics_debug`: False
1529
+ - `debug`: []
1530
+ - `dataloader_drop_last`: False
1531
+ - `dataloader_num_workers`: 0
1532
+ - `dataloader_prefetch_factor`: None
1533
+ - `past_index`: -1
1534
+ - `disable_tqdm`: False
1535
+ - `remove_unused_columns`: True
1536
+ - `label_names`: None
1537
+ - `load_best_model_at_end`: True
1538
+ - `ignore_data_skip`: False
1539
+ - `fsdp`: []
1540
+ - `fsdp_min_num_params`: 0
1541
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
1542
+ - `tp_size`: 0
1543
+ - `fsdp_transformer_layer_cls_to_wrap`: None
1544
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
1545
+ - `deepspeed`: None
1546
+ - `label_smoothing_factor`: 0.0
1547
+ - `optim`: adamw_torch
1548
+ - `optim_args`: None
1549
+ - `adafactor`: False
1550
+ - `group_by_length`: False
1551
+ - `length_column_name`: length
1552
+ - `ddp_find_unused_parameters`: None
1553
+ - `ddp_bucket_cap_mb`: None
1554
+ - `ddp_broadcast_buffers`: False
1555
+ - `dataloader_pin_memory`: True
1556
+ - `dataloader_persistent_workers`: False
1557
+ - `skip_memory_metrics`: True
1558
+ - `use_legacy_prediction_loop`: False
1559
+ - `push_to_hub`: False
1560
+ - `resume_from_checkpoint`: None
1561
+ - `hub_model_id`: None
1562
+ - `hub_strategy`: every_save
1563
+ - `hub_private_repo`: None
1564
+ - `hub_always_push`: False
1565
+ - `gradient_checkpointing`: False
1566
+ - `gradient_checkpointing_kwargs`: None
1567
+ - `include_inputs_for_metrics`: False
1568
+ - `include_for_metrics`: []
1569
+ - `eval_do_concat_batches`: True
1570
+ - `fp16_backend`: auto
1571
+ - `push_to_hub_model_id`: None
1572
+ - `push_to_hub_organization`: None
1573
+ - `mp_parameters`:
1574
+ - `auto_find_batch_size`: False
1575
+ - `full_determinism`: False
1576
+ - `torchdynamo`: None
1577
+ - `ray_scope`: last
1578
+ - `ddp_timeout`: 1800
1579
+ - `torch_compile`: False
1580
+ - `torch_compile_backend`: None
1581
+ - `torch_compile_mode`: None
1582
+ - `include_tokens_per_second`: False
1583
+ - `include_num_input_tokens_seen`: False
1584
+ - `neftune_noise_alpha`: None
1585
+ - `optim_target_modules`: None
1586
+ - `batch_eval_metrics`: False
1587
+ - `eval_on_start`: False
1588
+ - `use_liger_kernel`: False
1589
+ - `eval_use_gather_object`: False
1590
+ - `average_tokens_across_devices`: False
1591
+ - `prompts`: None
1592
+ - `batch_sampler`: no_duplicates
1593
+ - `multi_dataset_batch_sampler`: proportional
1594
+ - `router_mapping`: {}
1595
+ - `learning_rate_mapping`: {}
1596
+
1597
+ </details>
1598
+
1599
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 | NanoClimateFEVER_dot_ndcg@10 | NanoDBPedia_dot_ndcg@10 | NanoFEVER_dot_ndcg@10 | NanoFiQA2018_dot_ndcg@10 | NanoHotpotQA_dot_ndcg@10 | NanoQuoraRetrieval_dot_ndcg@10 | NanoSCIDOCS_dot_ndcg@10 | NanoArguAna_dot_ndcg@10 | NanoSciFact_dot_ndcg@10 | NanoTouche2020_dot_ndcg@10 |
+ |:----------:|:--------:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|:----------------------------:|:-----------------------:|:---------------------:|:------------------------:|:------------------------:|:------------------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:--------------------------:|
+ | 0.0178 | 100 | 199.0423 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0356 | 200 | 11.3558 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0533 | 300 | 0.9845 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0711 | 400 | 0.4726 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0889 | 500 | 0.2639 | 0.2407 | 0.5514 | 0.3061 | 0.5649 | 0.4741 | - | - | - | - | - | - | - | - | - | - |
+ | 0.1067 | 600 | 0.2931 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1244 | 700 | 0.2301 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1422 | 800 | 0.2168 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.16 | 900 | 0.1741 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1778 | 1000 | 0.1852 | 0.1878 | 0.5868 | 0.2975 | 0.5648 | 0.4830 | - | - | - | - | - | - | - | - | - | - |
+ | 0.1956 | 1100 | 0.1684 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.2133 | 1200 | 0.1629 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.2311 | 1300 | 0.1736 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.2489 | 1400 | 0.1813 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.2667 | 1500 | 0.1826 | 0.1382 | 0.5941 | 0.3251 | 0.5911 | 0.5035 | - | - | - | - | - | - | - | - | - | - |
+ | 0.2844 | 1600 | 0.177 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.3022 | 1700 | 0.1568 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.32 | 1800 | 0.1707 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.3378 | 1900 | 0.1554 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.3556 | 2000 | 0.1643 | 0.1553 | 0.6157 | 0.2997 | 0.5807 | 0.4987 | - | - | - | - | - | - | - | - | - | - |
+ | 0.3733 | 2100 | 0.1564 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.3911 | 2200 | 0.1334 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.4089 | 2300 | 0.1349 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.4267 | 2400 | 0.1228 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | **0.4444** | **2500** | **0.1473** | **0.1239** | **0.6242** | **0.3196** | **0.6151** | **0.5196** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** |
+ | 0.4622 | 2600 | 0.1506 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.48 | 2700 | 0.1436 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.4978 | 2800 | 0.1471 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.5156 | 2900 | 0.1378 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.5333 | 3000 | 0.1248 | 0.1328 | 0.6077 | 0.3073 | 0.6022 | 0.5057 | - | - | - | - | - | - | - | - | - | - |
+ | 0.5511 | 3100 | 0.1672 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.5689 | 3200 | 0.1301 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.5867 | 3300 | 0.1325 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.6044 | 3400 | 0.1335 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.6222 | 3500 | 0.122 | 0.1163 | 0.6081 | 0.3302 | 0.6190 | 0.5191 | - | - | - | - | - | - | - | - | - | - |
+ | 0.64 | 3600 | 0.1369 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.6578 | 3700 | 0.1651 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.6756 | 3800 | 0.1243 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.6933 | 3900 | 0.1122 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.7111 | 4000 | 0.1308 | 0.1307 | 0.6013 | 0.3232 | 0.5981 | 0.5075 | - | - | - | - | - | - | - | - | - | - |
+ | 0.7289 | 4100 | 0.1708 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.7467 | 4200 | 0.1143 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.7644 | 4300 | 0.167 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.7822 | 4400 | 0.1119 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.8 | 4500 | 0.1128 | 0.1177 | 0.6082 | 0.3228 | 0.5866 | 0.5058 | - | - | - | - | - | - | - | - | - | - |
+ | 0.8178 | 4600 | 0.125 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.8356 | 4700 | 0.1252 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.8533 | 4800 | 0.1066 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.8711 | 4900 | 0.1196 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.8889 | 5000 | 0.1291 | 0.1120 | 0.6134 | 0.3230 | 0.6115 | 0.5160 | - | - | - | - | - | - | - | - | - | - |
+ | 0.9067 | 5100 | 0.1219 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.9244 | 5200 | 0.1492 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.9422 | 5300 | 0.1138 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.96 | 5400 | 0.1583 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.9778 | 5500 | 0.1516 | 0.1125 | 0.6224 | 0.3205 | 0.6137 | 0.5189 | - | - | - | - | - | - | - | - | - | - |
+ | 0.9956 | 5600 | 0.1227 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | -1 | -1 | - | - | 0.6242 | 0.3196 | 0.6151 | 0.5577 | 0.2571 | 0.6138 | 0.8368 | 0.3902 | 0.8078 | 0.8841 | 0.3133 | 0.3562 | 0.6798 | 0.5525 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 0.057 kWh
+ - **Carbon Emitted**: 0.021 kg of CO2
+ - **Hours Used**: 0.179 hours
+
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA H100 80GB HBM3
+ - **CPU Model**: AMD EPYC 7R13 Processor
+ - **RAM Size**: 248.00 GB
+
+ ### Framework Versions
+ - Python: 3.13.3
+ - Sentence Transformers: 4.2.0.dev0
+ - Transformers: 4.51.3
+ - PyTorch: 2.7.1+cu126
+ - Accelerate: 0.26.0
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### SpladeLoss
+ ```bibtex
+ @misc{formal2022distillationhardnegativesampling,
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
+ year={2022},
+ eprint={2205.04733},
+ archivePrefix={arXiv},
+ primaryClass={cs.IR},
+ url={https://arxiv.org/abs/2205.04733},
+ }
+ ```
+
+ #### SparseMultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
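
The loss cited above trains with in-batch negatives: every query is scored against every document in the batch, and only the aligned pair counts as the positive. A minimal NumPy sketch of that objective (illustrative only; `mnrl_loss` and the toy vectors below are not part of this repository):

```python
import numpy as np

def mnrl_loss(queries, docs, scale=1.0):
    """Multiple-negatives ranking loss with in-batch negatives.

    queries, docs: (batch, dim) arrays; pair i is (queries[i], docs[i]).
    Every other document in the batch serves as a negative for query i.
    """
    scores = scale * queries @ docs.T  # (batch, batch) dot-product scores
    # Row-wise log-softmax; the target class for row i is column i.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: matched pairs have higher dot products than mismatched ones.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.1], [0.1, 1.0]])
loss = mnrl_loss(q, d)
```

Raising `scale` sharpens the softmax, so well-separated positives drive the loss toward zero; the library exposes a comparable temperature on the real loss.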
+
+ #### FlopsLoss
+ ```bibtex
+ @article{paria2020minimizing,
+ title={Minimizing flops to learn efficient sparse representations},
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
+ journal={arXiv preprint arXiv:2004.05665},
+ year={2020}
+ }
+ ```
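
The FLOPS regularizer cited above approximates the cost of a sparse dot product: it takes each vocabulary dimension's mean absolute activation over the batch, squares it, and sums. A dimension only escapes the penalty when it is zero for (nearly) every example, which is what keeps the inverted index small. A hedged NumPy sketch (the function name and toy data are illustrative, not the library API):

```python
import numpy as np

def flops_loss(embeddings):
    """FLOPS regularizer (Paria et al., 2020): sum_j (mean_i |a_ij|)^2.

    embeddings: (batch, vocab) sparse activations.
    Squaring the per-dimension batch mean pushes whole dimensions
    toward zero rather than spreading small values everywhere.
    """
    mean_abs = np.abs(embeddings).mean(axis=0)  # per-dimension mean activation
    return float((mean_abs ** 2).sum())

dense = np.ones((4, 8))    # every dimension active for every example
sparse = np.zeros((4, 8))
sparse[:, 0] = 1.0         # a single active dimension -> much lower penalty
```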
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForMaskedLM"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SparseEncoder",
+   "__version__": {
+     "sentence_transformers": "4.2.0.dev0",
+     "transformers": "4.51.3",
+     "pytorch": "2.7.1+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "dot"
+ }
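
`similarity_fn_name` is `dot`, and for vocab-sized sparse embeddings a dot product only receives contributions from tokens active in *both* texts, which is why these scores can be served from a classic inverted index. A small illustrative sketch (the names and values below are made up):

```python
import numpy as np

def dot_similarity(query_vec, doc_vec):
    """Dot product between two vocab-sized sparse vectors.

    Only dimensions (tokens) that are non-zero in both vectors
    contribute to the score.
    """
    return float(np.dot(query_vec, doc_vec))

q = np.zeros(10); q[[1, 4]] = [0.5, 2.0]  # query activates tokens 1 and 4
d = np.zeros(10); d[[4, 7]] = [1.5, 3.0]  # document activates tokens 4 and 7
score = dot_similarity(q, d)              # only token 4 overlaps: 2.0 * 1.5
```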
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bc4cd95f2ea06397d14b8e8ea68a524c1a15febfc0bc82de99f06e19a64e02a
+ size 267954768
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_SpladePooling",
+     "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+   }
+ ]
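
`modules.json` chains an `MLMTransformer` into a `SpladePooling` head, and the pooling config shipped with this model (`max` strategy, `relu` activation) corresponds to the SPLADE formulation: saturate each token's MLM logits with log(1 + ReLU(x)), then max-pool over the sequence into a single vocab-sized sparse vector. A minimal NumPy sketch of that pooling step (shapes and names are assumptions, not the library's exact API):

```python
import numpy as np

def splade_pool(mlm_logits, attention_mask):
    """SPLADE pooling: log(1 + ReLU(logits)), max-pooled over tokens.

    mlm_logits:     (seq_len, vocab) MLM head outputs for one text
    attention_mask: (seq_len,) 1.0 for real tokens, 0.0 for padding
    Returns a (vocab,) non-negative sparse embedding.
    """
    sat = np.log1p(np.maximum(mlm_logits, 0.0))  # ReLU, then log saturation
    sat = sat * attention_mask[:, None]          # zero out padding positions
    return sat.max(axis=0)                       # max over the sequence

logits = np.array([[2.0, -1.0, 0.0],
                   [0.5,  3.0, -2.0]])
mask = np.array([1.0, 1.0])
emb = splade_pool(logits, mask)  # negative logits contribute nothing
```

The ReLU keeps the embedding non-negative and the log dampens very large logits, so a few strongly activated vocabulary terms dominate each text's representation.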
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff