Commit 4ca0104 (verified) by marroyo777 · Parent: 5cd31b8

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
```json
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
```
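
These flags select CLS-token pooling: the `[CLS]` token embedding is used as the sentence vector, with mean, max, weighted-mean, and last-token pooling all disabled. A minimal sketch of the equivalent module construction:

```python
# Hedged sketch: the config above amounts to CLS-token pooling.
from sentence_transformers.models import Pooling

pooling = Pooling(word_embedding_dimension=384, pooling_mode="cls")
print(pooling.get_pooling_mode_str())  # "cls"
```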
README.md ADDED
---
base_model: BAAI/bge-small-en-v1.5
library_name: sentence-transformers
metrics:
- cosine_accuracy
- dot_accuracy
- manhattan_accuracy
- euclidean_accuracy
- max_accuracy
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:60341
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: What is the focus of the research conducted by the MHCI x 99P Labs
    Capstone Team?
  sentences:
  - To determine the destination of a given car based on an initial start position
    in time, we developed a Markov Model. We then creatively combined DBScan, K-NN,
    and XGboost algorithms to achieve accurate dwell time forecasts.
  - Transportation networks touch all three pillars of sustainability. They shape
    our daily lives by connecting us to work, retail, and recreation; however, a system
    that does not connect everyone equitably reproduces social disparities.
  - 'Two weeks of digging deep into exploratory, generative research

    Written by the MHCI x 99P Labs Capstone TeamEdited by 99P Labs

    The MHCI x 99P Labs Capstone Team is part of the Master of Human-Computer Interaction
    (MHCI) program at Carnegie Mellon University.'
- source_sentence: What limits are being considered for data quality checks?
  sentences:
  - Unlike many other Agile teams, we don't do a Retro every sprint, mostly because
    we do one-week sprints.
  - Our team has been exploring implementing data quality checks into our data platform.
    We've been trying to establish our goals, limits, and expectations, some of which
    were discussed in Part 1 of our Data Quality blog posts.
  - Literature and Topical ReviewEach team member performed a literature review on
    telematics research, identifying its applications, methodologies, and critical
    insights.
- source_sentence: What are the potential consequences of not researching before coding?
  sentences:
  - This indicates a degree of variance in the model's accuracy across different times
    and conditions.
  - In order to objectively test ourselves on the knowledge we've gained, we decide
    to take a quiz. The quiz contains 50 images of either dogs or cats and we have
    to determine which animal the image most closely resembles.
  - To reiterate, before even writing any code, it's important to do proper research
    into your team's documentation and online resources. A lot of time can be saved
    by reusing code that can adapt to your use case instead of starting from scratch
    every time.
- source_sentence: What might be the implications of having a performance of 3%?
  sentences:
  - Then, I will highlight the top three winning projects from each track.
  - Channels can be used only by organizations that are invited to the channel and
    are invisible to other members of the network. Each channel has a separate blockchain
    ledger.
  - 3%, only slightly better than the worst-performing model, K Nearest Neighbors.
- source_sentence: In what context is traffic flow theory typically discussed?
  sentences:
  - As a result, I was familiar with many terms discussed conceptually but I discovered
    some of the more official terminology used when discussing traffic flow theory
    and network control.
  - We called it plus-deltas (+/Δ). Seeing the output and outcomes we accomplished
    in each session was encouraging and allowed us to acknowledge things we did that
    made us successful so we could carry it on to the next session.
  - There are different types of projects within C.
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en-v1.5
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: 99GPT Finetuning Embedding test 01
      type: 99GPT-Finetuning-Embedding-test-01
    metrics:
    - type: cosine_accuracy
      value: 0.9987405541561712
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.0011931592204693093
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.9987405541561712
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.9987405541561712
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9987405541561712
      name: Max Accuracy
---

# SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
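
For intuition, the three modules above are roughly equivalent to the following sketch in plain `transformers` (the sample sentence is illustrative):

```python
# Hedged sketch of the pipeline: BERT encoder -> CLS pooling -> L2 normalize.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("marroyo777/bge-99GPT-v1")
model = AutoModel.from_pretrained("marroyo777/bge-99GPT-v1")

batch = tokenizer(["An example sentence."], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (1, seq_len, 384)
cls_embedding = token_embeddings[:, 0]         # module (1): CLS-token pooling
embedding = F.normalize(cls_embedding, dim=1)  # module (2): Normalize()
```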

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("marroyo777/bge-99GPT-v1")
# Run inference
sentences = [
    'In what context is traffic flow theory typically discussed?',
    'As a result, I was familiar with many terms discussed conceptually but I discovered some of the more official terminology used when discussing traffic flow theory and network control.',
    'There are different types of projects within C.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
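
The same embeddings also support lightweight semantic search. A minimal sketch, with a query and corpus invented for illustration:

```python
# Hedged sketch: rank a tiny corpus against a query with this model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("marroyo777/bge-99GPT-v1")
query_embedding = model.encode(["How were dwell times forecast?"])
corpus_embeddings = model.encode([
    "We combined DBScan, K-NN, and XGboost to forecast dwell times.",
    "Each channel has a separate blockchain ledger.",
])
scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 2)
print(scores.argmax().item())  # expected: 0, the dwell-time sentence
```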

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Triplet
* Dataset: `99GPT-Finetuning-Embedding-test-01`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy    | 0.9987     |
| dot_accuracy       | 0.0012     |
| manhattan_accuracy | 0.9987     |
| euclidean_accuracy | 0.9987     |
| **max_accuracy**   | **0.9987** |
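
These numbers should be reproducible with the evaluator named above. A hedged sketch, with placeholder triplets standing in for the actual held-out split:

```python
# Hedged sketch: re-run TripletEvaluator on example (anchor, positive, negative) rows.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("marroyo777/bge-99GPT-v1")
evaluator = TripletEvaluator(
    anchors=["Who is being invited to join the initiative?"],
    positives=["We are inviting the research community to join us."],
    negatives=["Burning it destroys the oil."],
    name="99GPT-Finetuning-Embedding-test-01",
)
print(evaluator(model))  # accuracy per distance function (cosine, dot, ...)
```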

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 60,341 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 7 tokens</li><li>mean: 13.77 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 40.26 tokens</li><li>max: 123 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 39.24 tokens</li><li>max: 139 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>Who is being invited to join the initiative?</code> | <code>Our belief is that the research community will be able to gain access to diverse and real-time data with minimal friction, build exciting innovations and make an impact to Data and AI technologies as well. This is just the first release and we are inviting the research community to join us to build exciting data-driven mobility & energy solutions together.</code> | <code>Burning it destroys the oil. Once you burn the oil, that particular oil ceases to exist.</code> |
  | <code>What is the main focus of the research conducted for Orbit?</code> | <code>Orbit holds the culmination of almost a year of research with participants from a wide variety of backgrounds, needs, and jobs to be done.</code> | <code>So how do you win a hackathon mobility challenge? The SmartRoute team showed two of them.</code> |
  | <code>What role do LLMs play in HRI's strategy?</code> | <code>We are excited about the potential of JournAI to transform mobility. By harnessing the power of LLMs and other AI technologies, HRI is driving towards a more connected, efficient, and sustainable future.</code> | <code>This simplified the process for users, who only had to pull and run the docker image to spawn a Jupyterlab app on their machine, open it in their browser, and create a new Pyspark notebook that automatically connected to our spark cluster. Our new workflow allows data science teams to configure their spark jobs and compute resources with options to request memory and CPU from the cluster and customize spark settings.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
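
The training script itself is not included in this card, but a fine-tuning run with this loss would look roughly like the sketch below; the tiny in-memory dataset is a stand-in for the real 60,341 triplets:

```python
# Hedged sketch: MultipleNegativesRankingLoss fine-tuning from the base model.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    losses,
)

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
train_dataset = Dataset.from_dict({
    "anchor": ["Who is being invited to join the initiative?"],
    "positive": ["We are inviting the research community to join us."],
    "negative": ["Burning it destroys the oil."],
})
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)  # cos_sim is the default
trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```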

### Evaluation Dataset

#### Unnamed Dataset

* Size: 15,086 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 6 tokens</li><li>mean: 13.73 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 39.51 tokens</li><li>max: 131 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 36.9 tokens</li><li>max: 153 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>What does the text suggest about the balance between creating tools and their practical application?</code> | <code>From technology to healthcare, these examples underline the importance of the interplay between theory and practice, between creating advanced tools and applying them effectively.</code> | <code>We found success when leaving the later panels empty as opposed to earlier ones. If we established a clear context and pain point for participants, they were often able to fill in a solution and resolution themselves.</code> |
  | <code>Who are the personas mentioned in the text?</code> | <code>Our derived data sets are created based on personas that we have identified and their data access needs.</code> | <code>However there still exists a need to connect the map matched nodes that are outputted from the libraries to specific data points from the V2X data, in order to get the rest of the V2X features in a specific time frame.</code> |
  | <code>Is this the first or second hackathon mentioned?</code> | <code>Up next is the first of two hackathons we participated in at Ohio State University.</code> | <code>The team did a great job by targeting a pervasive issue in such an intuitive way.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch  | Step  | Training Loss | loss   | 99GPT-Finetuning-Embedding-test-01_max_accuracy |
|:------:|:-----:|:-------------:|:------:|:-----------------------------------------------:|
| 0.0265 | 100   | 0.7653        | 0.4309 | -      |
| 0.0530 | 200   | 0.4795        | 0.2525 | -      |
| 0.0795 | 300   | 0.3416        | 0.1996 | -      |
| 0.1060 | 400   | 0.2713        | 0.1699 | -      |
| 0.1326 | 500   | 0.2271        | 0.1558 | -      |
| 0.1591 | 600   | 0.2427        | 0.1510 | -      |
| 0.1856 | 700   | 0.2188        | 0.1414 | -      |
| 0.2121 | 800   | 0.1936        | 0.1350 | -      |
| 0.2386 | 900   | 0.2174        | 0.1370 | -      |
| 0.2651 | 1000  | 0.2104        | 0.1265 | -      |
| 0.2916 | 1100  | 0.2142        | 0.1324 | -      |
| 0.3181 | 1200  | 0.2088        | 0.1297 | -      |
| 0.3446 | 1300  | 0.1865        | 0.1240 | -      |
| 0.3712 | 1400  | 0.177         | 0.1221 | -      |
| 0.3977 | 1500  | 0.1735        | 0.1296 | -      |
| 0.4242 | 1600  | 0.1746        | 0.1188 | -      |
| 0.4507 | 1700  | 0.1639        | 0.1178 | -      |
| 0.4772 | 1800  | 0.1958        | 0.1105 | -      |
| 0.5037 | 1900  | 0.1874        | 0.1152 | -      |
| 0.5302 | 2000  | 0.1676        | 0.1143 | -      |
| 0.5567 | 2100  | 0.1671        | 0.1067 | -      |
| 0.5832 | 2200  | 0.142         | 0.1154 | -      |
| 0.6098 | 2300  | 0.1668        | 0.1150 | -      |
| 0.6363 | 2400  | 0.1605        | 0.1091 | -      |
| 0.6628 | 2500  | 0.1475        | 0.1096 | -      |
| 0.6893 | 2600  | 0.1668        | 0.1066 | -      |
| 0.7158 | 2700  | 0.166         | 0.1067 | -      |
| 0.7423 | 2800  | 0.1611        | 0.0999 | -      |
| 0.7688 | 2900  | 0.1747        | 0.1001 | -      |
| 0.7953 | 3000  | 0.1436        | 0.1065 | -      |
| 0.8218 | 3100  | 0.1579        | 0.0992 | -      |
| 0.8484 | 3200  | 0.1718        | 0.1006 | -      |
| 0.8749 | 3300  | 0.1567        | 0.0995 | -      |
| 0.9014 | 3400  | 0.1634        | 0.0954 | -      |
| 0.9279 | 3500  | 0.1441        | 0.0956 | -      |
| 0.9544 | 3600  | 0.1433        | 0.0991 | -      |
| 0.9809 | 3700  | 0.1562        | 0.0931 | -      |
| 1.0074 | 3800  | 0.1421        | 0.0931 | -      |
| 1.0339 | 3900  | 0.1424        | 0.0956 | -      |
| 1.0604 | 4000  | 0.128         | 0.0900 | -      |
| 1.0870 | 4100  | 0.1265        | 0.0921 | -      |
| 1.1135 | 4200  | 0.1062        | 0.0944 | -      |
| 1.1400 | 4300  | 0.1221        | 0.0900 | -      |
| 1.1665 | 4400  | 0.1091        | 0.0944 | -      |
| 1.1930 | 4500  | 0.091         | 0.0913 | -      |
| 1.2195 | 4600  | 0.0823        | 0.0935 | -      |
| 1.2460 | 4700  | 0.0946        | 0.0949 | -      |
| 1.2725 | 4800  | 0.0803        | 0.0890 | -      |
| 1.2990 | 4900  | 0.0796        | 0.0885 | -      |
| 1.3256 | 5000  | 0.0699        | 0.0921 | -      |
| 1.3521 | 5100  | 0.073         | 0.0909 | -      |
| 1.3786 | 5200  | 0.0608        | 0.0934 | -      |
| 1.4051 | 5300  | 0.07          | 0.0941 | -      |
| 1.4316 | 5400  | 0.0732        | 0.0896 | -      |
| 1.4581 | 5500  | 0.0639        | 0.0910 | -      |
| 1.4846 | 5600  | 0.0722        | 0.0874 | -      |
| 1.5111 | 5700  | 0.0635        | 0.0925 | -      |
| 1.5376 | 5800  | 0.0631        | 0.0887 | -      |
| 1.5642 | 5900  | 0.0589        | 0.0896 | -      |
| 1.5907 | 6000  | 0.0636        | 0.0925 | -      |
| 1.6172 | 6100  | 0.0702        | 0.0938 | -      |
| 1.6437 | 6200  | 0.0572        | 0.0921 | -      |
| 1.6702 | 6300  | 0.0516        | 0.0946 | -      |
| 1.6967 | 6400  | 0.0695        | 0.0902 | -      |
| 1.7232 | 6500  | 0.0632        | 0.0917 | -      |
| 1.7497 | 6600  | 0.0697        | 0.0832 | -      |
| 1.7762 | 6700  | 0.0747        | 0.0853 | -      |
| 1.8028 | 6800  | 0.0615        | 0.0892 | -      |
| 1.8293 | 6900  | 0.0747        | 0.0855 | -      |
| 1.8558 | 7000  | 0.0668        | 0.0848 | -      |
| 1.8823 | 7100  | 0.0747        | 0.0853 | -      |
| 1.9088 | 7200  | 0.0774        | 0.0847 | -      |
| 1.9353 | 7300  | 0.0546        | 0.0874 | -      |
| 1.9618 | 7400  | 0.0708        | 0.0879 | -      |
| 1.9883 | 7500  | 0.0632        | 0.0863 | -      |
| 2.0148 | 7600  | 0.0601        | 0.0873 | -      |
| 2.0414 | 7700  | 0.063         | 0.0870 | -      |
| 2.0679 | 7800  | 0.0646        | 0.0819 | -      |
| 2.0944 | 7900  | 0.0557        | 0.0825 | -      |
| 2.1209 | 8000  | 0.0444        | 0.0841 | -      |
| 2.1474 | 8100  | 0.049         | 0.0825 | -      |
| 2.1739 | 8200  | 0.0441        | 0.0845 | -      |
| 2.2004 | 8300  | 0.0451        | 0.0844 | -      |
| 2.2269 | 8400  | 0.0346        | 0.0851 | -      |
| 2.2534 | 8500  | 0.0398        | 0.0847 | -      |
| 2.2800 | 8600  | 0.033         | 0.0855 | -      |
| 2.3065 | 8700  | 0.0355        | 0.0851 | -      |
| 2.3330 | 8800  | 0.0313        | 0.0867 | -      |
| 2.3595 | 8900  | 0.0358        | 0.0870 | -      |
| 2.3860 | 9000  | 0.0251        | 0.0867 | -      |
| 2.4125 | 9100  | 0.0395        | 0.0854 | -      |
| 2.4390 | 9200  | 0.0322        | 0.0838 | -      |
| 2.4655 | 9300  | 0.0355        | 0.0847 | -      |
| 2.4920 | 9400  | 0.034         | 0.0834 | -      |
| 2.5186 | 9500  | 0.0345        | 0.0862 | -      |
| 2.5451 | 9600  | 0.0272        | 0.0830 | -      |
| 2.5716 | 9700  | 0.0275        | 0.0831 | -      |
| 2.5981 | 9800  | 0.0345        | 0.0849 | -      |
| 2.6246 | 9900  | 0.0289        | 0.0849 | -      |
| 2.6511 | 10000 | 0.0282        | 0.0860 | -      |
| 2.6776 | 10100 | 0.0279        | 0.0885 | -      |
| 2.7041 | 10200 | 0.0344        | 0.0865 | -      |
| 2.7306 | 10300 | 0.0326        | 0.0863 | -      |
| 2.7572 | 10400 | 0.0383        | 0.0840 | -      |
| 2.7837 | 10500 | 0.0338        | 0.0833 | -      |
| 2.8102 | 10600 | 0.0298        | 0.0836 | -      |
| 2.8367 | 10700 | 0.0402        | 0.0825 | -      |
| 2.8632 | 10800 | 0.0361        | 0.0822 | -      |
| 2.8897 | 10900 | 0.0388        | 0.0818 | -      |
| 2.9162 | 11000 | 0.0347        | 0.0821 | -      |
| 2.9427 | 11100 | 0.0341        | 0.0826 | -      |
| 2.9692 | 11200 | 0.0373        | 0.0825 | -      |
| 2.9958 | 11300 | 0.0354        | 0.0824 | -      |
| 3.0    | 11316 | -             | -      | 0.9987 |

</details>

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
```json
{
  "_name_or_path": "BAAI/bge-small-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
config_sentence_transformers.json ADDED
```json
{
  "__version__": {
    "sentence_transformers": "3.1.1",
    "transformers": "4.44.2",
    "pytorch": "2.4.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
```
model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:ed9d49aac920e2fde08358340498be2341c504c2e22a3471fb71b49d68f50c78
size 133462128
```
modules.json ADDED
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```
sentence_bert_config.json ADDED
```json
{
  "max_seq_length": 512,
  "do_lower_case": true
}
```
special_tokens_map.json ADDED
```json
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
vocab.txt ADDED
The diff for this file is too large to render. See raw diff