chinchilla04 committed on
Commit f36f848 · verified · 1 Parent(s): a41bcde

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 1024,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
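This pooling config enables only `pooling_mode_cls_token`. The model card lists a `Normalize()` module after pooling. The NumPy sketch below shows what those two steps compute. It assumes token embeddings of shape `(batch, seq_len, 1024)` with the `[CLS]` token at position 0, as in BERT tokenization. The helper `cls_pool_and_normalize` is hypothetical and is not part of the Sentence Transformers API.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: take the [CLS] vector (position 0) of each
    sequence and L2-normalize it.

    token_embeddings: array of shape (batch, seq_len, hidden_dim).
    Returns an array of shape (batch, hidden_dim) with unit-length rows.
    """
    # CLS pooling: keep only the first token's hidden state per sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize: divide each row by its Euclidean norm (guarding against zero).
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Random stand-in for transformer output: 2 sequences, 7 tokens, 1024 dims.
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 7, 1024))
sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_embeddings.shape)  # (2, 1024)
```

Every token except the first is ignored under this setting. The other `pooling_mode_*` flags would instead combine all token vectors.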
README.md ADDED
@@ -0,0 +1,530 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:15002
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-large-en-v1.5
+ widget:
+ - source_sentence: what kind of oil and how much do i need for my toyota tacoma truck
+ and how do i do it
+ sentences:
+ - Requests to change the system or application's language settings. Users may ask
+ to switch to a specific language, such as English, or adjust the language preferences
+ to enhance usability.
+ - Requests for step-by-step instructions or guidance on how to change the oil in
+ a car. Users seek detailed procedures, tools needed, and tips for performing this
+ maintenance task.
+ - Requests to make a reservation at a specific restaurant for a specified number
+ of people, time, and under a provided name. Users expect confirmation of the booking
+ details.
+ - source_sentence: please double check my reservations for six at mani
+ sentences:
+ - Requests to verify or confirm existing reservations, typically for dining or events.
+ Users provide details about the reservation and ask for confirmation that it is
+ correctly recorded.
+ - Requests for details about an insurance policy, including coverage, benefits,
+ and exclusions. Users may inquire about specific aspects like health benefits
+ or policy terms.
+ - Requests to create, manage, or customize timers for various tasks or activities.
+ Users can define the duration, purpose, or type of the timer and receive notifications
+ or alerts when the timer reaches its set time.
+ - source_sentence: what are some good ethiopian restaurants in queens
+ sentences:
+ - Requests for the meaning or definition of words. Users may inquire about the definitions
+ of uncommon, complex, or unfamiliar terms, aiming to gain a clear understanding
+ or contextual usage of the word in question.
+ - Requests to assist with paying bills, such as utilities, credit cards, or other
+ services. Users may specify the bill type, amount, and source account for the
+ payment.
+ - Requests for recommendations or suggestions for dining options. Users may ask
+ for specific cuisine types, locations, or general ideas on where to eat.
+ - source_sentence: are there any expected delays for flight dl123
+ sentences:
+ - Requests for travel time or distance to a specific location. Users typically seek
+ estimates based on current traffic, routes, or modes of transportation to determine
+ the time needed to reach their destination.
+ - Requests for information about flight details, such as boarding times, delays,
+ or schedules. Users typically inquire to ensure they are updated about their flight's
+ status.
+ - Requests for advice or strategies to improve credit scores. Users may seek a detailed
+ plan, tips, or insights into financial habits that can lead to a better credit
+ rating.
+ - source_sentence: how do i ask about the weather in chinese
+ sentences:
+ - Requests related to translating words, phrases, or sentences from one language
+ to another. The user may specify the source and target languages, and the goal
+ is to provide an accurate and context-appropriate translation.
+ - Requests for information about a vehicle's miles per gallon (MPG) rating, either
+ in specific conditions like city driving or as an overall performance metric.
+ Users may seek guidance on fuel efficiency for their car.
+ - Requests for information about a vehicle's miles per gallon (MPG) rating, either
+ in specific conditions like city driving or as an overall performance metric.
+ Users may seek guidance on fuel efficiency for their car.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on BAAI/bge-large-en-v1.5
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.9706666666666667
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.9886666666666667
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.992
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9956666666666667
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.9706666666666667
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.3295555555555556
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.19840000000000002
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09956666666666668
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.9706666666666667
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.9886666666666667
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.992
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9956666666666667
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.9841961906084298
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.9804173280423282
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.9806052445247627
+ name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on BAAI/bge-large-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) <!-- at revision d4aa6901d3a41ba39fb536a557fa166f842b0e09 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("chinchilla04/bge-finetuned-train")
+ # Run inference
+ sentences = [
+ 'how do i ask about the weather in chinese',
+ 'Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.',
+ "Requests for information about a vehicle's miles per gallon (MPG) rating, either in specific conditions like city driving or as an overall performance metric. Users may seek guidance on fuel efficiency for their car.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9707 |
+ | cosine_accuracy@3 | 0.9887 |
+ | cosine_accuracy@5 | 0.992 |
+ | cosine_accuracy@10 | 0.9957 |
+ | cosine_precision@1 | 0.9707 |
+ | cosine_precision@3 | 0.3296 |
+ | cosine_precision@5 | 0.1984 |
+ | cosine_precision@10 | 0.0996 |
+ | cosine_recall@1 | 0.9707 |
+ | cosine_recall@3 | 0.9887 |
+ | cosine_recall@5 | 0.992 |
+ | cosine_recall@10 | 0.9957 |
+ | **cosine_ndcg@10** | **0.9842** |
+ | cosine_mrr@10 | 0.9804 |
+ | cosine_map@100 | 0.9806 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 15,002 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | anchor | positive | negative |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 10.66 tokens</li><li>max: 28 tokens</li></ul> | <ul><li>min: 25 tokens</li><li>mean: 42.6 tokens</li><li>max: 58 tokens</li></ul> | <ul><li>min: 29 tokens</li><li>mean: 41.95 tokens</li><li>max: 58 tokens</li></ul> |
+ * Samples:
+ | anchor | positive | negative |
+ |:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>what expression would i use to say i love you if i were an italian</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> | <code>Requests involving financial operations, such as transferring money between bank accounts, credit cards, or other financial instruments. Users typically specify the amount, the source account, and the target account, ensuring that the transfer is executed correctly and securely.</code> |
+ | <code>can you tell me how to say 'i do not speak much spanish', in spanish</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> | <code>Requests involving financial operations, such as transferring money between bank accounts, credit cards, or other financial instruments. Users typically specify the amount, the source account, and the target account, ensuring that the transfer is executed correctly and securely.</code> |
+ | <code>what is the equivalent of, 'life is good' in french</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> | <code>Requests involving financial operations, such as transferring money between bank accounts, credit cards, or other financial instruments. Users typically specify the amount, the source account, and the target account, ensuring that the transfer is executed correctly and securely.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+ "scale": 20.0,
+ "similarity_fct": "cos_sim"
+ }
+ ```
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 3,000 evaluation samples
+ * Columns: <code>anchor</code> and <code>positive</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | anchor | positive |
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 3 tokens</li><li>mean: 11.06 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 26 tokens</li><li>mean: 36.16 tokens</li><li>max: 58 tokens</li></ul> |
+ * Samples:
+ | anchor | positive |
+ |:------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>in spanish, meet me tomorrow is said how</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> |
+ | <code>in french, how do i say, see you later</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> |
+ | <code>how do you say hello in japanese</code> | <code>Requests related to translating words, phrases, or sentences from one language to another. The user may specify the source and target languages, and the goal is to provide an accurate and context-appropriate translation.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+ "scale": 20.0,
+ "similarity_fct": "cos_sim"
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `learning_rate`: 1e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.2
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 1e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.2
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
453
+
454
+ ### Training Logs
455
+ | Epoch | Step | Training Loss | Validation Loss | cosine_ndcg@10 |
456
+ |:----------:|:--------:|:-------------:|:---------------:|:--------------:|
457
+ | None | 0 | - | 0.2730 | 0.9055 |
458
+ | 0.3198 | 150 | - | 0.0698 | 0.9633 |
459
+ | 0.6397 | 300 | - | 0.0642 | 0.9683 |
460
+ | 0.9595 | 450 | - | 0.0603 | 0.9763 |
461
+ | 1.0661 | 500 | 1.0338 | - | - |
462
+ | 1.2793 | 600 | - | 0.0612 | 0.9762 |
463
+ | 1.5991 | 750 | - | 0.0602 | 0.9802 |
464
+ | 1.9190 | 900 | - | 0.0571 | 0.9820 |
465
+ | 2.1322 | 1000 | 0.787 | - | - |
466
+ | 2.2388 | 1050 | - | 0.0585 | 0.9819 |
467
+ | **2.5586** | **1200** | **-** | **0.0565** | **0.9842** |
468
+ | 2.8785 | 1350 | - | 0.0578 | 0.9837 |
469
+ | 3.1983 | 1500 | 0.6768 | 0.0570 | 0.9844 |
470
+ | 3.5181 | 1650 | - | 0.0587 | 0.9837 |
471
+ | 3.8380 | 1800 | - | 0.0584 | 0.9837 |
472
+ | None | 0 | - | 0.0565 | 0.9842 |
473
+
474
+ * The bold row denotes the saved checkpoint.
475
+
476
+ ### Framework Versions
477
+ - Python: 3.10.14
478
+ - Sentence Transformers: 3.3.1
479
+ - Transformers: 4.46.3
480
+ - PyTorch: 2.4.0
481
+ - Accelerate: 1.1.1
482
+ - Datasets: 3.1.0
483
+ - Tokenizers: 0.20.3
484
+
485
+ ## Citation
486
+
487
+ ### BibTeX
488
+
489
+ #### Sentence Transformers
490
+ ```bibtex
491
+ @inproceedings{reimers-2019-sentence-bert,
492
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
493
+ author = "Reimers, Nils and Gurevych, Iryna",
494
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
495
+ month = "11",
496
+ year = "2019",
497
+ publisher = "Association for Computational Linguistics",
498
+ url = "https://arxiv.org/abs/1908.10084",
499
+ }
500
+ ```
501
+
502
+ #### MultipleNegativesRankingLoss
503
+ ```bibtex
504
+ @misc{henderson2017efficient,
505
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
506
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
507
+ year={2017},
508
+ eprint={1705.00652},
509
+ archivePrefix={arXiv},
510
+ primaryClass={cs.CL}
511
+ }
512
+ ```
513
+
514
+ <!--
515
+ ## Glossary
516
+
517
+ *Clearly define terms in order to be accessible across audiences.*
518
+ -->
519
+
520
+ <!--
521
+ ## Model Card Authors
522
+
523
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
524
+ -->
525
+
526
+ <!--
527
+ ## Model Card Contact
528
+
529
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
530
+ -->
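The README says training used `MultipleNegativesRankingLoss` with `scale: 20.0` and `cos_sim` on (anchor, positive, negative) triplets. The NumPy sketch below is a rough illustration of that objective, not the library's implementation:

- Each anchor is scored against every positive and every hard negative in the batch.
- The cosine similarities are multiplied by the scale.
- Cross-entropy is applied, with the matching positive as the target.

The function names and the toy one-hot inputs are hypothetical.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """In-batch softmax loss (sketch): anchor i should score positive i above
    every other positive and every hard negative in the batch."""
    candidates = np.concatenate([positives, negatives], axis=0)  # (2B, d)
    logits = scale * cosine_matrix(anchors, candidates)           # (B, 2B)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(anchors))  # positive i sits in column i
    return float(-log_probs[targets, targets].mean())

# Toy batch of 3 triplets built from orthogonal one-hot vectors.
eye = np.eye(6)
anchors, negatives = eye[:3], eye[3:]
matched = mnrl_loss(anchors, anchors.copy(), negatives)      # positives line up
shuffled = mnrl_loss(anchors, anchors[[1, 2, 0]], negatives)  # positives misaligned
print(matched, shuffled)
```

With aligned positives the loss is essentially zero. When every anchor's true positive is orthogonal to it, the loss is close to the scale, 20. A larger `scale` sharpens the softmax over candidates.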
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "_name_or_path": "BAAI/bge-large-en-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.46.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.3.1",
+ "transformers": "4.46.3",
+ "pytorch": "2.4.0"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
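`similarity_fn_name` is `cosine`. Because the module pipeline ends in `Normalize()`, the output vectors should already have unit length, and cosine similarity then reduces to a plain dot product. The small NumPy check below demonstrates that identity, using random unit vectors as stand-ins for real embeddings.

```python
import numpy as np

def cosine_similarity_matrix(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Full cosine similarity: dot products divided by the product of norms."""
    x_norms = np.linalg.norm(x, axis=1, keepdims=True)
    y_norms = np.linalg.norm(y, axis=1, keepdims=True)
    return (x @ y.T) / (x_norms * y_norms.T)

rng = np.random.default_rng(42)
emb = rng.normal(size=(3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # emulate the Normalize() module

cos = cosine_similarity_matrix(emb, emb)
dot = emb @ emb.T
print(np.allclose(cos, dot))  # True
```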
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65335cbf8cf15484594b12feb86f8f2ac3ab1078c341c8d9e161c84627ac138a
+ size 1340612432
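The LFS pointer records `size 1340612432` bytes for `model.safetensors`. As a rough sanity check, the sketch below counts the parameters of a standard BERT encoder with the dimensions in `config.json`:

- 24 layers, hidden size 1024, intermediate size 4096
- vocabulary of 30522, 512 positions, 2 token types

It then multiplies by 4 bytes for `float32`. The sketch assumes the usual BERT layout, including the pooler dense layer. That the leftover bytes are the safetensors JSON header is an inference, not something stated in the commit.

```python
def bert_param_count(vocab=30522, hidden=1024, layers=24, intermediate=4096,
                     max_pos=512, type_vocab=2, include_pooler=True):
    """Count weights + biases of a standard BERT encoder (sketch)."""
    def dense(n_in, n_out):
        return n_in * n_out + n_out              # weight matrix + bias
    layer_norm = 2 * hidden                      # gamma + beta

    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    per_layer = (
        3 * dense(hidden, hidden)                # query, key, value projections
        + dense(hidden, hidden) + layer_norm     # attention output + LayerNorm
        + dense(hidden, intermediate)            # feed-forward up-projection
        + dense(intermediate, hidden) + layer_norm  # down-projection + LayerNorm
    )
    pooler = dense(hidden, hidden) if include_pooler else 0
    return embeddings + layers * per_layer + pooler

params = bert_param_count()
tensor_bytes = params * 4          # float32 = 4 bytes per value
file_size = 1_340_612_432          # from the LFS pointer above
print(params)                      # 335141888
print(file_size - tensor_bytes)    # 44880
```

Roughly 335M parameters account for all but about 45 KB of the file. That is consistent with a float32 BERT-large checkpoint, as `"torch_dtype": "float32"` in `config.json` suggests.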
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
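`modules.json` lists the modules that Sentence Transformers chains together, in `idx` order. Each `path` names the subfolder holding that module's files. An empty path refers to the repository root, where `config.json` and the tokenizer files sit. The standard-library sketch below reads this listing and prints the resulting pipeline. It only inspects the JSON and does not load the model.

```python
import json

# Copy of the modules.json content added in this commit.
MODULES_JSON = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]"""

modules = sorted(json.loads(MODULES_JSON), key=lambda m: m["idx"])

pipeline = []
for m in modules:
    location = m["path"] or "<repo root>"        # empty path = repository root
    class_name = m["type"].rsplit(".", 1)[-1]    # e.g. "Pooling"
    pipeline.append(f"{m['idx']}: {class_name} <- {location}")

print("\n".join(pipeline))
```

This matches the `Transformer -> Pooling -> Normalize` chain shown under "Full Model Architecture" in the README.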
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
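`tokenizer_config.json` maps the special tokens to fixed IDs and caps inputs at `model_max_length` 512:

- 0: `[PAD]`
- 100: `[UNK]`
- 101: `[CLS]`
- 102: `[SEP]`
- 103: `[MASK]`

The sketch below shows how a single BERT-style input is typically framed with `[CLS]`/`[SEP]`, truncated, right-padded with `[PAD]`, and given an attention mask. It uses hypothetical word-piece IDs, and the helper `frame_and_pad` is illustrative only; real inputs should go through the tokenizer itself. Since `[CLS]` always lands at position 0, it is the token that the CLS pooling in `1_Pooling/config.json` reads.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0   # from added_tokens_decoder above
MODEL_MAX_LENGTH = 512                 # model_max_length above

def frame_and_pad(piece_ids, pad_to, max_length=MODEL_MAX_LENGTH):
    """Hypothetical helper: wrap word-piece IDs as [CLS] ... [SEP],
    truncate to fit, then right-pad with [PAD] up to pad_to."""
    pad_to = min(pad_to, max_length)
    body = list(piece_ids[: pad_to - 2])      # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)     # real tokens are attended to
    padding = pad_to - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding           # padded positions are masked out
    return input_ids, attention_mask

# Hypothetical word-piece IDs standing in for a tokenized sentence.
ids, mask = frame_and_pad([1111, 2222, 3333, 4444], pad_to=8)
print(ids)   # [101, 1111, 2222, 3333, 4444, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```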
vocab.txt ADDED
The diff for this file is too large to render.