michaeldinzinger committed on
Commit 3fb6444 · 1 Parent(s): 1db346a
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md CHANGED
@@ -1,3 +1,437 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:2560000
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: ما هي أفضل الفنادق في ايبوهبالقرب من Ipoh Parade Shopping Centre؟
+   sentences:
+   - Bei ORION gibt es eine Sale-Rubrik, in der alle reduzierten Artikel zu finden
+     sind. Wenn du also auf der Suche nach einem Schnäppchen bist, weißt du, an welcher
+     Stelle auf der Webseite du fündig wirst. Der Sale umfasst viele verschiedene Produke.
+     Von Toys bis hin zu Dessous und Drogerieartikel - es spielt keine Rolle, wonach
+     du suchst. Aufgrund der Produktvielfalt ist die Chance, dass du im Sale den passenden
+     Gegenstand findest, groß.
+   - عادة ما يكون لأصحاب النفوذ الجزئي ما بين 10000 و 100000 متابع.
+   - المسافرون الموثّقون إلى مدينة ايبوه الذين أقاموا قرب Ipoh Parade Shopping Centre
+     أعطوا أعلى التقييمات لـفندق فايل ، Zone Hotel (Ipoh) وGolden Roof Hotel Ampang
+     Ipoh.
+ - source_sentence: Habe ich Vorteile, wenn ich früh in das Projekt einsteige?
+   sentences:
+   - نعم لدينا خصومات مُتعددة على جميع أعمال السواتر والمظلات والجلسات ففي فصل الشتاء
+     والصيف هناك خصومات مُتعددة في الأعياد والمناسبات
+   - Der Vorteil einer frühen Mitgliedschaft besteht in der Möglichkeit der Mitgestaltung
+     des Projektes. Alle später Hinzukommenden müssen die bis dahin getroffenen Entscheidungen
+     akzeptieren. Zudem entscheidet unter anderem auch das Eintrittsdatum in die eG
+     und das Engagement während des Projektverlaufes über die spätere Reihenfolge der
+     Vergabe der Wohnungen.
+   - Средняя оценка Registered от клиентов – 4 на основе 227 оценок и отзывов. Заходите
+     на сайт и прочитайте реальные отзывы о Registered.
+ - source_sentence: В какое время доступны ваши технические услуги?
+   sentences:
+   - Наша команда технической поддержки обеспечивает круглосуточное обслуживание в
+     случае чрезвычайных ситуаций. Вы можете связаться с нами в любой день недели,
+     в любое время суток и получить поддержку для ваших холодильных систем. Услуги
+     по плановому техническому обслуживанию и ремонту предоставляются в обычное рабочее
+     время, а услуги предоставляются в экстренных случаях, в том числе в ночное время
+     и в выходные дни.
+   - این سوال کاملا به علاقه و مهارت شما بستگی دارد. اگر به درس شیمی علاقه زیادی دارید
+     این رشته بهترین انتخاب برای تحصیل در دانشگاه برای شما محسوب می‌شود.
+   - 'eo光は10Gを提供している光回線です。
+
+     提供エリアは、通常プランと変わらず関西地方と福井県です。しかし、一部の利用できないエリアもあるので契約前に確認しましょう。
+
+     関連記事
+
+     eo光の10Gプランの評判口コミ'
+ - source_sentence: Supertotobet redtiger oyun çeşitleri hangileri?
+   sentences:
+   - Supertotobet redtiger oyunları arasında gold star, golden tsar, golden lotus,
+     blood suckers ve redtiger slot gibi çeşitli oyunlar vardır. Bu oyun seçeneklerini
+     kullanabilmek için oyunlar hakkında bilgi sahibi olmalısınız.
+   - Время выполнения проекта зависит от его сложности и размера. Обычно, время выполнения
+     проекта составляет несколько месяцев.
+   - 'Wer kein Homeoffice während des Coronavirus anbieten kann, ist dazu verpflichtet,
+     Schutzmaßnahmen zu ergreifen.
+
+     Die Arbeitsschutzbehörden der Länder sind befugt, Corona-Schutzmaßnahmen in Betrieben
+     zu kontrollieren und Fehlverhalten zu bestrafen. Bei Verstößen sind Bußgelder
+     in Höhe von bis zu 30.000 Euro möglich. Wiederholen sich schwere Verstöße, droht
+     den Verantwortlichen sogar bis zu einem Jahr Freiheitsstrafe.
+
+     Arbeitgeber, die die Vorschriften missachten, könnten zudem dafür haften, wenn
+     Mitarbeiter durch eine Corona-Infektion gesundheitliche Schäden erleiden.'
+ - source_sentence: Muss der Deckel der TipBox beim Autoklavieren geöffnet werden?
+   sentences:
+   - ВВП (валовый внутренний продукт) - это общая стоимость всех товаров и услуг, произведенных
+     в стране за определенный период времени. Он является ключевым экономическим показателем,
+     который отражает общий уровень экономической активности и роста. Инвесторы следят
+     за ВВП, чтобы оценить состояние и перспективы экономики, потенциал для роста и
+     возможности для инвестиций
+   - برآمدگی های بیضه ممکن است نشان دهنده مشکلی در بیضه ها باشد. ممکن است به دلیل صدمه
+     ای به وجود آمده یا ممکن است یک مشکل پزشکی جدی باشد.
+   - Nein, das ist nicht notwendig. Die neue TipBox kann bei 121°C im geschlossenen
+     Zustand autoklaviert werden.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+ 
+ # SentenceTransformer
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on 2,560,000 sentence pairs. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base)
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
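The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the token embeddings, counting only non-padding positions. A minimal NumPy sketch of that computation (illustrative only; the real logic lives in `sentence_transformers.models.Pooling`):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy example: two real tokens, one padding token that must not contribute
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.]]
```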
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Muss der Deckel der TipBox beim Autoklavieren geöffnet werden?',
+     'Nein, das ist nicht notwendig. Die neue TipBox kann bei 121°C im geschlossenen Zustand autoklaviert werden.',
+     'برآمدگی های بیضه ممکن است نشان دهنده مشکلی در بیضه ها باشد. ممکن است به دلیل صدمه ای به وجود آمده یا ممکن است یک مشکل پزشکی جدی باشد.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
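Since this model's `similarity_fn_name` is `cosine`, `model.similarity` returns pairwise cosine similarities between the embedding rows. A minimal NumPy sketch of the equivalent computation (illustrative; in practice prefer `model.similarity` itself):

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # (len(a), len(b))

# Toy 2-dimensional "embeddings" in place of model.encode output
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)  # (3, 3)
print(round(float(sims[0, 2]), 4))  # 0.7071
```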
144
+
145
+ <!--
146
+ ### Direct Usage (Transformers)
147
+
148
+ <details><summary>Click to see the direct usage in Transformers</summary>
149
+
150
+ </details>
151
+ -->
152
+
153
+ <!--
154
+ ### Downstream Usage (Sentence Transformers)
155
+
156
+ You can finetune this model on your own dataset.
157
+
158
+ <details><summary>Click to expand</summary>
159
+
160
+ </details>
161
+ -->
162
+
163
+ <!--
164
+ ### Out-of-Scope Use
165
+
166
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
167
+ -->
168
+
169
+ <!--
170
+ ## Bias, Risks and Limitations
171
+
172
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
173
+ -->
174
+
175
+ <!--
176
+ ### Recommendations
177
+
178
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
179
+ -->
180
+
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 2,560,000 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0                                                                        | sentence_1                                                                          |
+   |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type    | string                                                                            | string                                                                              |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 15.02 tokens</li><li>max: 86 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 66.82 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>Hat myTime ein großes Produktsortiment?</code> | <code>Das Sortiment von myTime umfasst mehr als 13.000 Lebensmittel. Du findest alle Produkte, die du auch im Supermarkt findest, darunter Obst und Gemüse, trockene Lebensmittel wie Pasta und Reis, Backwaren, Snacks und Tiefkühlkost. Auch Getränke wie Kaffee, Alkohol und Soda findest du im Online-Supermarkt.</code> |
+   | <code>Gibt es eine Tigerspin App?</code> | <code>Tigerspin verzichtet auf eine mobile App. Wenn Sie ein paar Runden spielen möchten, öffnen Sie einfach die Webseite des Casinos und starten die Spiele im Browser.</code> |
+   | <code>Bietet ihr auch maschinelle Übersetzungen an? Wenn ja, wann eignet sich diese und wann nicht?</code> | <code>Maschinelle Übersetzungen sind ein spannendes Thema, auch aktuell bei techtrans. Unter maschineller Übersetzung (MÜ) versteht man die automatisierte Übertragung eines Ausgangstextes in die Zielsprache mittels einer sogenannten Übersetzungsengine. Eine solche Engine kann nach regelbasierten, statistischen oder neuronalen Prinzipien aufgebaut sein.<br>Obwohl es maschinelle Übersetzungsengines schon seit einigen Jahrzehnten gibt, ist erst mit der Einführung der neuronalen Engines (NMT) ca. ab dem Jahre 2015 die Output-Qualität gestiegen. Namhafte Engine Provider sind zum Beispiel Google, DeepL, Microsoft, Amazon AWS und SDL. So ist es kaum verwunderlich, dass diese Technologie zunehmend Einzug sowohl in unseren Alltag als auch in den Übersetzungsprozess findet.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
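MultipleNegativesRankingLoss uses in-batch negatives: for each `sentence_0`, its paired `sentence_1` is the positive and every other `sentence_1` in the batch is a negative, and the loss is softmax cross-entropy over the cosine-similarity matrix scaled by 20.0. A minimal NumPy sketch of that objective (illustrative only, not the library implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives loss: cross-entropy over scaled cosine similarities,
    where the correct 'class' for row i is column i (its paired positive)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # target is the diagonal

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
loss_matched = mnr_loss(anchors, anchors)        # positives identical to anchors
loss_shuffled = mnr_loss(anchors, anchors[::-1]) # positives deliberately mismatched
print(loss_matched < loss_shuffled)  # True: matched pairs give a lower loss
```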
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `per_device_train_batch_size`: 128
+ - `per_device_eval_batch_size`: 128
+ - `num_train_epochs`: 1
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 128
+ - `per_device_eval_batch_size`: 128
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch | Step  | Training Loss |
+ |:-----:|:-----:|:-------------:|
+ | 0.025 | 500   | 0.1999        |
+ | 0.05  | 1000  | 0.0279        |
+ | 0.075 | 1500  | 0.0234        |
+ | 0.1   | 2000  | 0.0203        |
+ | 0.125 | 2500  | 0.0179        |
+ | 0.15  | 3000  | 0.0171        |
+ | 0.175 | 3500  | 0.0153        |
+ | 0.2   | 4000  | 0.015         |
+ | 0.225 | 4500  | 0.0143        |
+ | 0.25  | 5000  | 0.014         |
+ | 0.275 | 5500  | 0.0128        |
+ | 0.3   | 6000  | 0.013         |
+ | 0.325 | 6500  | 0.0129        |
+ | 0.35  | 7000  | 0.0124        |
+ | 0.375 | 7500  | 0.012         |
+ | 0.4   | 8000  | 0.0121        |
+ | 0.425 | 8500  | 0.0115        |
+ | 0.45  | 9000  | 0.0113        |
+ | 0.475 | 9500  | 0.0106        |
+ | 0.5   | 10000 | 0.0107        |
+ | 0.525 | 10500 | 0.011         |
+ | 0.55  | 11000 | 0.0108        |
+ | 0.575 | 11500 | 0.0103        |
+ | 0.6   | 12000 | 0.0097        |
+ | 0.625 | 12500 | 0.01          |
+ | 0.65  | 13000 | 0.0104        |
+ | 0.675 | 13500 | 0.0096        |
+ | 0.7   | 14000 | 0.0096        |
+ | 0.725 | 14500 | 0.0097        |
+ | 0.75  | 15000 | 0.0097        |
+ | 0.775 | 15500 | 0.0089        |
+ | 0.8   | 16000 | 0.0089        |
+ | 0.825 | 16500 | 0.0091        |
+ | 0.85  | 17000 | 0.0085        |
+ | 0.875 | 17500 | 0.0084        |
+ | 0.9   | 18000 | 0.0089        |
+ | 0.925 | 18500 | 0.0087        |
+ | 0.95  | 19000 | 0.0087        |
+ | 0.975 | 19500 | 0.0088        |
+ | 1.0   | 20000 | 0.0089        |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.11.5
+ - Sentence Transformers: 3.4.0
+ - Transformers: 4.48.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "./output/train_bi-encoder-margin_mse_en-FacebookAI-xlm-roberta-base-batch_size_64-2025-01-30_18-47-30/checkpoint-235740",
+   "architectures": [
+     "XLMRobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.0",
+     "transformers": "4.48.0",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1696eeef2daa2f542901467abdab2ce775c8c163c267c48f9e366ff83081018b
+ size 1112197096
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:430ad61253c9a42295fc30b618c866f2c34df2cb0f18b5834d17f31efae85de0
+ size 2219789306
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46380ef77ef5493da665dcbbc26023bf3752c7f7f93d643a33b49f2ccd56d1ec
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59fbf91dc318e0a56917da48de0354ad45d2d0bfebb702542c08345aabda68fb
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
trainer_state.json ADDED
@@ -0,0 +1,313 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 1.0,
+   "eval_steps": 0,
+   "global_step": 20000,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.025,
+       "grad_norm": 1.52798593044281,
+       "learning_rate": 9.756378189094549e-06,
+       "loss": 0.1999,
+       "step": 500
+     },
+     {
+       "epoch": 0.05,
+       "grad_norm": 2.1572341918945312,
+       "learning_rate": 9.506253126563282e-06,
+       "loss": 0.0279,
+       "step": 1000
+     },
+     {
+       "epoch": 0.075,
+       "grad_norm": 1.0278618335723877,
+       "learning_rate": 9.256128064032017e-06,
+       "loss": 0.0234,
+       "step": 1500
+     },
+     {
+       "epoch": 0.1,
+       "grad_norm": 1.9378705024719238,
+       "learning_rate": 9.006003001500752e-06,
+       "loss": 0.0203,
+       "step": 2000
+     },
+     {
+       "epoch": 0.125,
+       "grad_norm": 1.0253329277038574,
+       "learning_rate": 8.755877938969486e-06,
+       "loss": 0.0179,
+       "step": 2500
+     },
+     {
+       "epoch": 0.15,
+       "grad_norm": 1.588853359222412,
+       "learning_rate": 8.50575287643822e-06,
+       "loss": 0.0171,
+       "step": 3000
+     },
+     {
+       "epoch": 0.175,
+       "grad_norm": 2.365065336227417,
+       "learning_rate": 8.255627813906954e-06,
+       "loss": 0.0153,
+       "step": 3500
+     },
+     {
+       "epoch": 0.2,
+       "grad_norm": 0.9309738874435425,
+       "learning_rate": 8.005502751375689e-06,
+       "loss": 0.015,
+       "step": 4000
+     },
+     {
+       "epoch": 0.225,
+       "grad_norm": 0.4729306697845459,
+       "learning_rate": 7.755377688844424e-06,
+       "loss": 0.0143,
+       "step": 4500
+     },
+     {
+       "epoch": 0.25,
+       "grad_norm": 1.5990219116210938,
+       "learning_rate": 7.505252626313157e-06,
+       "loss": 0.014,
+       "step": 5000
+     },
+     {
+       "epoch": 0.275,
+       "grad_norm": 0.32973700761795044,
+       "learning_rate": 7.255627813906953e-06,
+       "loss": 0.0128,
+       "step": 5500
+     },
+     {
+       "epoch": 0.3,
+       "grad_norm": 0.7276681661605835,
+       "learning_rate": 7.0055027513756875e-06,
+       "loss": 0.013,
+       "step": 6000
+     },
+     {
+       "epoch": 0.325,
+       "grad_norm": 1.137271761894226,
+       "learning_rate": 6.7553776888444225e-06,
+       "loss": 0.0129,
+       "step": 6500
+     },
+     {
+       "epoch": 0.35,
+       "grad_norm": 0.2594708800315857,
+       "learning_rate": 6.505252626313157e-06,
+       "loss": 0.0124,
+       "step": 7000
+     },
+     {
+       "epoch": 0.375,
+       "grad_norm": 0.2336331158876419,
+       "learning_rate": 6.255127563781892e-06,
+       "loss": 0.012,
+       "step": 7500
+     },
+     {
+       "epoch": 0.4,
+       "grad_norm": 0.19649972021579742,
+       "learning_rate": 6.005002501250627e-06,
+       "loss": 0.0121,
+       "step": 8000
+     },
+     {
+       "epoch": 0.425,
+       "grad_norm": 0.5051653981208801,
+       "learning_rate": 5.754877438719361e-06,
+       "loss": 0.0115,
+       "step": 8500
+     },
+     {
+       "epoch": 0.45,
+       "grad_norm": 0.3455007076263428,
+       "learning_rate": 5.504752376188095e-06,
+       "loss": 0.0113,
+       "step": 9000
+     },
+     {
+       "epoch": 0.475,
+       "grad_norm": 0.4421294331550598,
+       "learning_rate": 5.254627313656829e-06,
+       "loss": 0.0106,
+       "step": 9500
+     },
+     {
+       "epoch": 0.5,
+       "grad_norm": 1.2041834592819214,
+       "learning_rate": 5.004502251125563e-06,
+       "loss": 0.0107,
+       "step": 10000
+     },
+     {
+       "epoch": 0.525,
+       "grad_norm": 1.119016408920288,
+       "learning_rate": 4.754377188594297e-06,
+       "loss": 0.011,
+       "step": 10500
+     },
+     {
+       "epoch": 0.55,
+       "grad_norm": 0.9011972546577454,
+       "learning_rate": 4.5042521260630315e-06,
+       "loss": 0.0108,
+       "step": 11000
+     },
+     {
+       "epoch": 0.575,
+       "grad_norm": 0.4205041229724884,
+       "learning_rate": 4.2546273136568285e-06,
+       "loss": 0.0103,
+       "step": 11500
+     },
+     {
+       "epoch": 0.6,
+       "grad_norm": 0.22211983799934387,
+       "learning_rate": 4.0045022511255635e-06,
+       "loss": 0.0097,
+       "step": 12000
+     },
+     {
+       "epoch": 0.625,
+       "grad_norm": 0.26337993144989014,
+       "learning_rate": 3.7543771885942976e-06,
+       "loss": 0.01,
+       "step": 12500
+     },
+     {
+       "epoch": 0.65,
+       "grad_norm": 0.526462197303772,
+       "learning_rate": 3.5042521260630318e-06,
+       "loss": 0.0104,
+       "step": 13000
+     },
+     {
+       "epoch": 0.675,
+       "grad_norm": 0.12982240319252014,
+       "learning_rate": 3.2541270635317664e-06,
+       "loss": 0.0096,
+       "step": 13500
+     },
+     {
+       "epoch": 0.7,
+       "grad_norm": 0.8299207091331482,
+       "learning_rate": 3.0040020010005005e-06,
+       "loss": 0.0096,
+       "step": 14000
+     },
+     {
+       "epoch": 0.725,
+       "grad_norm": 0.18750137090682983,
+       "learning_rate": 2.753876938469235e-06,
+       "loss": 0.0097,
+       "step": 14500
+     },
+     {
+       "epoch": 0.75,
+       "grad_norm": 0.37416380643844604,
+       "learning_rate": 2.5037518759379692e-06,
+       "loss": 0.0097,
+       "step": 15000
+     },
+     {
+       "epoch": 0.775,
+       "grad_norm": 0.47138354182243347,
+       "learning_rate": 2.2536268134067034e-06,
+       "loss": 0.0089,
+       "step": 15500
+     },
+     {
+       "epoch": 0.8,
+       "grad_norm": 0.859859824180603,
+       "learning_rate": 2.003501750875438e-06,
+       "loss": 0.0089,
+       "step": 16000
+     },
+     {
+       "epoch": 0.825,
+       "grad_norm": Infinity,
+       "learning_rate": 1.7538769384692347e-06,
+       "loss": 0.0091,
+       "step": 16500
+     },
+     {
+       "epoch": 0.85,
+       "grad_norm": 0.4002706706523895,
+       "learning_rate": 1.503751875937969e-06,
+       "loss": 0.0085,
+       "step": 17000
+     },
+     {
+       "epoch": 0.875,
+       "grad_norm": 0.29968053102493286,
+       "learning_rate": 1.2536268134067034e-06,
+       "loss": 0.0084,
+       "step": 17500
+     },
+     {
+       "epoch": 0.9,
+       "grad_norm": 1.422002911567688,
+       "learning_rate": 1.0035017508754378e-06,
+       "loss": 0.0089,
+       "step": 18000
+     },
+     {
+       "epoch": 0.925,
+       "grad_norm": 0.272173672914505,
+       "learning_rate": 7.533766883441721e-07,
+       "loss": 0.0087,
+       "step": 18500
+     },
+     {
+       "epoch": 0.95,
+       "grad_norm": 0.2766880691051483,
+       "learning_rate": 5.03751875937969e-07,
+       "loss": 0.0087,
+       "step": 19000
+     },
+     {
+       "epoch": 0.975,
+       "grad_norm": 0.17756783962249756,
+       "learning_rate": 2.5362681340670335e-07,
+       "loss": 0.0088,
+       "step": 19500
+     },
+     {
+       "epoch": 1.0,
+       "grad_norm": 0.5026212334632874,
+       "learning_rate": 3.501750875437719e-09,
+       "loss": 0.0089,
+       "step": 20000
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 20000,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 1,
+   "save_steps": 5000,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 0.0,
+   "train_batch_size": 128,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f77d3a347201417be4c949c28d4ff7787867631b281f333a7d3780a833e85f3
+ size 5624