Training in progress, epoch 1, checkpoint
Files changed:
- last-checkpoint/README.md (+174 -98)
- last-checkpoint/optimizer.pt (+1 -1)
- last-checkpoint/pytorch_model.bin (+1 -1)
- last-checkpoint/rng_state.pth (+1 -1)
- last-checkpoint/scheduler.pt (+1 -1)
- last-checkpoint/trainer_state.json (+212 -243)
- last-checkpoint/training_args.bin (+1 -1)
last-checkpoint/README.md
CHANGED
@@ -7,11 +7,12 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
 - loss:GISTEmbedLoss
 - loss:CoSENTLoss
 - loss:OnlineContrastiveLoss
 - loss:MultipleNegativesSymmetricRankingLoss
 base_model: microsoft/deberta-v3-small
 datasets:
 - sentence-transformers/all-nli
@@ -24,11 +25,21 @@ datasets:
 - allenai/sciq
 - allenai/qasc
 - allenai/openbookqa
-- sentence-transformers/msmarco-msmarco-distilbert-base-v3
 - sentence-transformers/natural-questions
 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
 widget:
 - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
     a microphone and a stringed instrument.
@@ -70,11 +81,51 @@ widget:
     on account of his participation in same-sex union ceremonies.
   - Tesla was the fourth of five children.
 pipeline_tag: sentence-similarity
 ---
 
 # SentenceTransformer based on microsoft/deberta-v3-small
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa),
 
 ## Model Details
 
@@ -96,7 +147,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
 - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
 - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
 - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
--
 - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
 - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
 - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
@@ -175,6 +226,27 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 
 <!--
 ## Bias, Risks and Limitations
 
@@ -194,7 +266,7 @@ You can finetune this model on your own dataset.
 #### nli-pairs
 
 * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -210,7 +282,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -243,23 +315,23 @@ You can finetune this model on your own dataset.
 #### vitaminc-pairs
 
 * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
-* Size:
 * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | label | sentence1 | sentence2 |
   |:--------|:---|:---|:---|
   | type | int | string | string |
-  | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean:
 * Samples:
-  | label | sentence1
-
-  | <code>1</code> | <code>
-  | <code>1</code> | <code>
-  | <code>1</code> | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -268,41 +340,41 @@ You can finetune this model on your own dataset.
 #### qnli-contrastive
 
 * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
-* Size:
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 | label |
   |:--------|:---|:---|:---|
   | type | string | string | int |
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>What was
 * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
 
 #### scitail-pairs-qa
 
 * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
-* Size:
 * Columns: <code>sentence2</code> and <code>sentence1</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence2 | sentence1 |
   |:--------|:---|:---|
   | type | string | string |
-  | details | <ul><li>min: 7 tokens</li><li>mean: 15.
 * Samples:
-  | sentence2
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -311,23 +383,23 @@ You can finetune this model on your own dataset.
 #### scitail-pairs-pos
 
 * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
   |:--------|:---|:---|
   | type | string | string |
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -336,19 +408,19 @@ You can finetune this model on your own dataset.
 #### xsum-pairs
 
 * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
-  | | sentence1 | sentence2
-
-  | type | string | string
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
@@ -360,7 +432,7 @@ You can finetune this model on your own dataset.
 #### compression-pairs
 
 * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -384,7 +456,7 @@ You can finetune this model on your own dataset.
 #### sciq_pairs
 
 * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -400,7 +472,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -425,7 +497,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -450,7 +522,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -458,33 +530,26 @@ You can finetune this model on your own dataset.
 
 #### msmarco_pairs
 
-* Dataset:
-* Size:
-* Columns: <code>
 * Approximate statistics based on the first 1000 samples:
-  | |
-
-  | type | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> |
 * Samples:
-  | |
-
-  | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> |
-  | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> |
-  | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> |
-* Loss: [<code>
-  ```json
-  {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
-    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-    (2): Normalize()
-  ), 'temperature': 0.05}
-  ```
 
 #### nq_pairs
 
 * Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -500,7 +565,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -509,7 +574,7 @@ You can finetune this model on your own dataset.
 #### trivia_pairs
 
 * Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -525,7 +590,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -534,7 +599,7 @@ You can finetune this model on your own dataset.
 #### quora_pairs
 
 * Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -558,7 +623,7 @@ You can finetune this model on your own dataset.
 #### gooaq_pairs
 
 * Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -574,7 +639,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -601,7 +666,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -626,7 +691,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -656,11 +721,11 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 28
 - `per_device_eval_batch_size`: 16
-- `learning_rate`:
 - `weight_decay`: 1e-10
 - `num_train_epochs`: 2
 - `lr_scheduler_type`: cosine
-- `warmup_ratio`: 0.
 - `save_safetensors`: False
 - `fp16`: True
 - `push_to_hub`: True
@@ -681,7 +746,7 @@ You can finetune this model on your own dataset.
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
-- `learning_rate`:
 - `weight_decay`: 1e-10
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
@@ -691,7 +756,7 @@ You can finetune this model on your own dataset.
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
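Taken together, a minimal sketch of how these arguments map onto the sentence-transformers v3 Trainer API (values copied from the list above; `learning_rate` and `warmup_ratio` are truncated in this diff, so the values below are placeholders, not the ones actually used):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="last-checkpoint",   # placeholder output path
    eval_strategy="steps",
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,             # placeholder: value is truncated in the diff above
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # placeholder: value is truncated in the diff above
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
)
```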
@@ -783,19 +848,18 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 1.0 | 4710 | 1.0356 | 0.5449 | 0.1294 | 0.6489 |
 
 
 ### Framework Versions
@@ -847,6 +911,18 @@ You can finetune this model on your own dataset.
 }
 ```
 
 <!--
 ## Glossary
 

last-checkpoint/README.md
UPDATED
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
+- dataset_size:526885
 - loss:GISTEmbedLoss
 - loss:CoSENTLoss
 - loss:OnlineContrastiveLoss
 - loss:MultipleNegativesSymmetricRankingLoss
+- loss:MarginMSELoss
 base_model: microsoft/deberta-v3-small
 datasets:
 - sentence-transformers/all-nli
 - allenai/sciq
 - allenai/qasc
 - allenai/openbookqa
 - sentence-transformers/natural-questions
 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
+metrics:
+- pearson_cosine
+- spearman_cosine
+- pearson_manhattan
+- spearman_manhattan
+- pearson_euclidean
+- spearman_euclidean
+- pearson_dot
+- spearman_dot
+- pearson_max
+- spearman_max
 widget:
 - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
     a microphone and a stringed instrument.
     on account of his participation in same-sex union ceremonies.
   - Tesla was the fourth of five children.
 pipeline_tag: sentence-similarity
+model-index:
+- name: SentenceTransformer based on microsoft/deberta-v3-small
+  results:
+  - task:
+      type: semantic-similarity
+      name: Semantic Similarity
+    dataset:
+      name: sts test
+      type: sts-test
+    metrics:
+    - type: pearson_cosine
+      value: 0.2520910673470529
+      name: Pearson Cosine
+    - type: spearman_cosine
+      value: 0.2588662067006675
+      name: Spearman Cosine
+    - type: pearson_manhattan
+      value: 0.30439718484055006
+      name: Pearson Manhattan
+    - type: spearman_manhattan
+      value: 0.3013780326567434
+      name: Spearman Manhattan
+    - type: pearson_euclidean
+      value: 0.25977707672353506
+      name: Pearson Euclidean
+    - type: spearman_euclidean
+      value: 0.26078444276128726
+      name: Spearman Euclidean
+    - type: pearson_dot
+      value: 0.08121075567918108
+      name: Pearson Dot
+    - type: spearman_dot
+      value: 0.0753891417253212
+      name: Spearman Dot
+    - type: pearson_max
+      value: 0.30439718484055006
+      name: Pearson Max
+    - type: spearman_max
+      value: 0.3013780326567434
+      name: Spearman Max
 ---
 
 # SentenceTransformer based on microsoft/deberta-v3-small
 
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa), msmarco_pairs, [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions), [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) and [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
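The card omits the usual usage snippet at this point; a minimal sketch (assuming sentence-transformers v3+ is installed; the model path is a placeholder for this checkpoint directory or the pushed Hub repo):

```python
from sentence_transformers import SentenceTransformer

# Placeholder path: point at the saved checkpoint directory or the Hub repo id.
model = SentenceTransformer("last-checkpoint")

sentences = [
    "A man in a Santa Claus costume is sitting on a wooden chair.",
    "Tesla was the fourth of five children.",
]
embeddings = model.encode(sentences)                      # shape: (2, 768)
similarities = model.similarity(embeddings, embeddings)   # cosine similarity matrix
print(similarities)
```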
 
 ## Model Details
 
 - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
 - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
 - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
+- msmarco_pairs
 - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
 - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
 - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 
+## Evaluation
+
+### Metrics
+
+#### Semantic Similarity
+* Dataset: `sts-test`
+* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric              | Value      |
+|:--------------------|:-----------|
+| pearson_cosine      | 0.2521     |
+| **spearman_cosine** | **0.2589** |
+| pearson_manhattan   | 0.3044     |
+| spearman_manhattan  | 0.3014     |
+| pearson_euclidean   | 0.2598     |
+| spearman_euclidean  | 0.2608     |
+| pearson_dot         | 0.0812     |
+| spearman_dot        | 0.0754     |
+| pearson_max         | 0.3044     |
+| spearman_max        | 0.3014     |
+
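For reproducibility, a sketch of running the evaluator named above (assuming sentence-transformers v3+; the sentence pairs and gold scores here are illustrative stand-ins for the real STS test split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("last-checkpoint")  # placeholder path

# Illustrative pairs; the table above was computed on the full STS test split.
sentences1 = ["A man is playing a guitar.", "A plane is taking off."]
sentences2 = ["A person plays an instrument.", "A bird is flying."]
gold_scores = [0.9, 0.1]  # normalized similarity labels in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(
    sentences1, sentences2, gold_scores, name="sts-test"
)
results = evaluator(model)  # dict of pearson/spearman metrics in v3
print(results)
```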
 <!--
 ## Bias, Risks and Limitations
 
 #### nli-pairs
 
 * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
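A sketch of constructing this loss (assuming sentence-transformers v3+; the dump above only reveals the guide's architecture, a 384-dim CLS-pooled, normalized BERT encoder, so the guide name below is an assumption for illustration):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Assumed guide model; any encoder matching the dumped architecture would fit.
guide = SentenceTransformer("BAAI/bge-small-en-v1.5")

# The guide filters out in-batch "negatives" it scores as too similar to the anchor.
loss = GISTEmbedLoss(model, guide, temperature=0.05)
```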
 #### vitaminc-pairs
 
 * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
+* Size: 24,996 training samples
 * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | label | sentence1 | sentence2 |
   |:--------|:---|:---|:---|
   | type | int | string | string |
+  | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 17.18 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 37.57 tokens</li><li>max: 240 tokens</li></ul> |
 * Samples:
+  | label | sentence1 | sentence2 |
+  |:---|:---|:---|
+  | <code>1</code> | <code>Based on 93 reviews , the film has a 95 % approval rating</code> | <code>On review aggregation website Rotten Tomatoes , the film has an approval rating of 95 % , based on 93 reviews , with an average rating of 7.9/10 .</code> |
+  | <code>1</code> | <code>Bianca 's ex-husband is Gavin Ellis Ricky Butcher .</code> | <code>Whitney runs away and Bianca 's ex-husband Gavin Ellis Ricky Butcher ( Sid Owen ) finds her drunk .</code> |
+  | <code>1</code> | <code>Critics gave Jagga Jasoo ( film ) positive reviews .</code> | <code>The film received positive to reviews from the critics.</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### qnli-contrastive
 
 * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 | label |
   |:--------|:---|:---|:---|
   | type | string | string | int |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 13.99 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 35.78 tokens</li><li>max: 151 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
 * Samples:
+  | sentence1 | sentence2 | label |
+  |:---|:---|:---|
+  | <code>How big is Midtown's population?</code> | <code>The Eastern Market farmer's distribution center is the largest open-air flowerbed market in the United States and has more than 150 foods and specialty businesses.</code> | <code>0</code> |
+  | <code>How many immigrants lived in these tent cities?</code> | <code>During this period, food, clothes and furniture had to be rationed in what became known as the Austerity Period.</code> | <code>0</code> |
+  | <code>What Iranian film festival was created in 1973?</code> | <code>Attempts to organize a film festival that had begun in 1954 within the framework of the Golrizan Festival, bore fruits in the form of the Sepas Festival in 1969.</code> | <code>0</code> |
 * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
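A sketch of this loss on labeled pairs like the ones above (assuming sentence-transformers v3+; label 0 marks a non-answering pair in this setup):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Selects only hard positives and hard negatives within each batch
# before applying the contrastive margin.
loss = OnlineContrastiveLoss(model)
```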
 
 #### scitail-pairs-qa
 
 * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
+* Size: 14,987 training samples
 * Columns: <code>sentence2</code> and <code>sentence1</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence2 | sentence1 |
   |:--------|:---|:---|
   | type | string | string |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 15.97 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.01 tokens</li><li>max: 33 tokens</li></ul> |
 * Samples:
+  | sentence2 | sentence1 |
+  |:---|:---|
+  | <code>The abundance of water makes the earth habitable for humans.</code> | <code>What makes the earth habitable for humans?</code> |
+  | <code>Individual is the term for an organism, or single living thing.</code> | <code>What is the term for an organism, or single living thing?</code> |
+  | <code>Ultrasound, a diagnostic technology, uses high-frequency vibrations transmitted into any tissue in contact with the transducer.</code> | <code>What diagnostic technology uses high-frequency vibrations transmitted into any tissue in contact with the transducer?</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### scitail-pairs-pos
 
 * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
+* Size: 8,600 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
   |:--------|:---|:---|
   | type | string | string |
+  | details | <ul><li>min: 6 tokens</li><li>mean: 23.86 tokens</li><li>max: 59 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.69 tokens</li><li>max: 41 tokens</li></ul> |
 * Samples:
+  | sentence1 | sentence2 |
+  |:---|:---|
+  | <code>Frost (also called white or hoarfrost) occurs when air temperatures dip below 32F and ice crystals form on the plant leaves, injuring and sometimes killing tender plants.</code> | <code>The ice crystals that form on the ground are called frost.</code> |
+  | <code>They are considered micronutrients because the body needs them in relatively small amounts compared with nutrients such as carbohydrates, proteins, fats and water.</code> | <code>Micronutrients is the term for nutrients the body needs in relatively small amounts, including vitamins and minerals.</code> |
+  | <code>However cell division goes through a sixth phase called cytokinesis, which is the division of the cytoplasm and the formation of two new daughter cells.</code> | <code>Cytokinesis divides the cytoplasm into two distinctive cells.</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### xsum-pairs
 
 * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
+  | | sentence1 | sentence2 |
+  |:--------|:---|:---|
+  | type | string | string |
+  | details | <ul><li>min: 40 tokens</li><li>mean: 337.79 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 26.93 tokens</li><li>max: 75 tokens</li></ul> |
 * Samples:
+  | sentence1 | sentence2 |
+  |:---|:---|
+  | <code>A Haystack in the Evening Sun had not previously been authenticated because the work is largely unknown and the artist's signature is covered by paint.<br>However researchers at the University of Jyvaskyla in Finland uncovered the signature using a hyperspectral camera.<br>It also revealed the date of the work's creation - 1891.<br>The special camera used by researchers studied the painting's elemental composition by measuring X-ray fluorescence.<br>That allowed them to "see" below the surface, and analyse the materials used to create the work.<br>"The camera is principally operating as a scanner, which scans one line at a time," researcher Ilkka Polonen said.<br>"When the camera is moved using the scanner, an image of the whole picture can be obtained."<br>An analysis of the pigments and canvas fibres also confirmed the painting was by the Impressionist.<br>The artwork is currently owned by Finland's Serlachius Fine Arts Foundation, which acquired it in the 1950s through a London art broker.<br>The institution said the authentication means the artwork is the first Monet painting to be held in a Finnish public collection.</code> | <code>An oil painting thought to have been created by French Impressionist Claude Monet has been proven to be genuine through scientific testing.</code> |
+  | <code>Passengers on a British Airways flight from Prague and an Icelandair plane told of their relief after landing safely at Heathrow following the strikes on Wednesday.<br>One described "a white flash" while others said they felt a "crack" and "bang" as bolts hit the aircraft.<br>BA said planes were built to cope with lightning strikes and their jet would be inspected before resuming service.<br>Liz Dobson, a charity worker, told the Evening Standard: "It came out of the blue. There was a really loud bang and a white flash. Not really what you want on a plane.<br>"The lightning hit the wing."<br>Catherine Mayer, who is co-founder of the Women's Equality Party, was returning from Iceland.<br>She tweeted: "The plane got hit by lightning. Big flash and bang. #blimey."<br>She told the BBC how passengers sitting next to her looked distressed and frightened.<br>Icelandair confirmed that flight FI454 had been struck.<br>"The aircraft was of course inspected after landing for safety reasons, and as the lightning did not cause damage, the aircraft was returned to service later last night," said a spokesperson for the airline.<br>A spokesman for BA said: "Lightning strikes are fairly common and aircraft are designed to cope with them."<br>On average, commercial planes are struck by lightning about once a year according to Cardiff University's "lightning lab" in the UK, a recently established laboratory where Airbus conducts lightning tests.</code> | <code>Two planes have been struck by lightning over west London.</code> |
+  | <code>Arthur Mellar, 47, died after being seriously injured at Burghley House, on the Lincolnshire-Cambridgeshire border, on 12 July 2014.<br>Peterborough Crown Court heard the lift fell onto Mr Mellar as he tried to free a jammed item of luggage.<br>Burghley House Preservation Trust previously admitted it failed to ensure the welfare of an employee.<br>More on this and other local stories from across Lincolnshire<br>Mr Mellar got caught between the lift cage and the banister of the lift housing as he attempted to dislodge the baggage, the court heard.<br>The Health and Safety Executive, which brought the prosecution against the trust, said it was a "completely avoidable incident".<br>There were no safety measures in place to prevent it and the lift had not been inspected by an engineer since it was installed in the late 1950s, the court heard.<br>The court was also told the trust did not conduct a safety risk assessment on the lift, which was used to transport guests' luggage from different levels of the house.<br>Mr Mellar, from Barnsley, South Yorkshire, had worked at the 16th Century Burghley House for nine years.<br>Judge Sean Enright fined the trust £266,000, along with costs of nearly £17,000.<br>David Pennell, estates director at Burghley House, said: "Health and safety matters have always been paramount across all activities at Burghley and what happened to Arthur Mellar in July 2014 was a dreadful and tragic accident."<br>"Our thoughts are with Gerwin and Arthur's family at this time," he added.<br>The mansion has been used for locations in the films Pride and Prejudice and The Da Vinci Code.</code> | <code>The owners of Tudor stately home have been fined £266,000 after a butler was crushed to death by a faulty lift.</code> |
 * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
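A sketch of this loss for the (article, summary) pairs above (assuming sentence-transformers v3+; the JSON parameters are cut off in the diff, so library defaults are shown):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Ranks in both directions: each article against all in-batch summaries,
# and each summary against all in-batch articles.
loss = MultipleNegativesSymmetricRankingLoss(model)
```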
 #### compression-pairs
 
 * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 #### sciq_pairs
 
 * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
+* Size: 11,679 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
530 |
|
531 |
#### msmarco_pairs
|
532 |
|
533 |
+
* Dataset: msmarco_pairs
|
534 |
+
* Size: 50,000 training samples
|
535 |
+
* Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
|
536 |
* Approximate statistics based on the first 1000 samples:
|
537 |
+
| | query | positive | negative | label |
|
538 |
+
|:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------|
|
539 |
+
| type | string | string | string | float |
|
540 |
+
| details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 72.59 tokens</li><li>max: 216 tokens</li></ul> | <ul><li>min: -0.5</li><li>mean: 0.04</li><li>max: 0.6</li></ul> |
|
541 |
* Samples:
|
542 |
+
| query | positive | negative | label |
|
543 |
+
  |:---------|:---------|:---------|:---------|
  | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> | <code>The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.</code> | <code>0.12154221534729004</code> |
  | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> | <code>Fibrinolytic drug. Fibrinolytic drug, also called thrombolytic drug, any agent that is capable of stimulating the dissolution of a blood clot (thrombus). Fibrinolytic drugs work by activating the so-called fibrinolytic pathway.</code> | <code>-0.05174225568771362</code> |
  | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> | <code>Your blood test results should be written in your maternity notes. Your platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range.If your platelet count is low, the blood test should be done again.This will keep track of whether or not your count is dropping.our platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range. If your platelet count is low, the blood test should be done again. This will keep track of whether or not your count is dropping.</code> | <code>-0.037523627281188965</code> |
* Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#marginmseloss)
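
For rows like the three above, MarginMSELoss regresses the student's score margin between the two passages onto the teacher's margin in the fourth column. A minimal sketch with the classic `model.fit` API; the sample and batch size below are illustrative, not this card's actual training script:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

student = SentenceTransformer("microsoft/deberta-v3-small")  # base model of this card

# Each sample: (query, passage_a, passage_b) plus the teacher margin
# score(query, passage_a) - score(query, passage_b), as in the table above.
train_examples = [
    InputExample(
        texts=["what is normal plat count",
               "A normal platelet count is between 150,000 and 450,000 ...",
               "Your blood test results should be written in your maternity notes ..."],
        label=-0.037523627281188965,
    ),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=28)
loss = losses.MarginMSELoss(student)

student.fit(train_objectives=[(loader, loss)], epochs=1)
```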
#### nq_pairs

* Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```
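
The `guide` entry printed above is only the module structure of the guide encoder (a 384-dimensional BERT model with CLS pooling and a normalization layer); the card does not name the checkpoint. A minimal sketch of wiring up GISTEmbedLoss, with `BAAI/bge-small-en-v1.5` as an assumed stand-in that matches that printed architecture:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/deberta-v3-small")
guide = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumption: any 384-dim CLS-pooled encoder fits the printout

# GISTEmbedLoss ranks in-batch negatives, but uses the guide model's
# similarities to mask out pairs the guide considers false negatives.
pairs = [InputExample(texts=["who wrote the declaration of independence",
                             "Thomas Jefferson drafted the Declaration of Independence in 1776."])]
loader = DataLoader(pairs, shuffle=True, batch_size=28)
loss = losses.GISTEmbedLoss(model, guide, temperature=0.05)

model.fit(train_objectives=[(loader, loss)], epochs=1)
```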
#### trivia_pairs

* Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```
#### quora_pairs

* Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
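
To eyeball any of these pair datasets before training, they load directly from the Hub. The `pair` subset name below follows the dataset's published configurations and is an assumption, not something this card records:

```python
from datasets import load_dataset

# One duplicate-question pair per row, e.g. {'anchor': '...', 'positive': '...'}
quora = load_dataset("sentence-transformers/quora-duplicates", "pair", split="train")
print(quora[0])
```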
#### gooaq_pairs

* Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

- `eval_strategy`: steps
- `per_device_train_batch_size`: 28
- `per_device_eval_batch_size`: 16
- `learning_rate`: 1e-05
- `weight_decay`: 1e-10
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.5
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 1e-10
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.5
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning

</details>
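
The non-default values above translate one-to-one into the v3-style training arguments. A sketch of reproducing them; the output directory is illustrative, and `eval_strategy` assumes a transformers version (>= 4.41) where the argument carries that name:

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="deberta-v3-small-checkpoints",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.5,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
)
```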

### Training Logs
| Epoch  | Step  | Training Loss | nli-pairs loss | scitail-pairs-pos loss | qnli-contrastive loss | sts-test_spearman_cosine |
|:------:|:-----:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------------------:|
| 0      | 0     | -             | 4.2656         | 3.4484                 | 4.1500                | 0.2589                   |
| 0.1000 | 1883  | 3.6326        | 2.6953         | 2.1726                 | 2.7029                | -                        |
| 0.2001 | 3766  | 1.7665        | 1.2885         | 0.9638                 | 1.7135                | -                        |
| 0.3001 | 5649  | 1.1522        | 0.9094         | 0.7571                 | 0.9165                | -                        |
| 0.4001 | 7532  | 0.9533        | 0.7290         | 0.6498                 | 0.4304                | -                        |
| 0.5002 | 9415  | 0.8013        | 0.6432         | 0.6007                 | 0.2591                | -                        |
| 0.6002 | 11298 | 0.6568        | 0.5626         | 0.5481                 | 0.1365                | -                        |
| 0.7002 | 13181 | 0.6095        | 0.5226         | 0.5109                 | 0.1643                | -                        |
| 0.8003 | 15064 | 0.5694        | 0.4921         | 0.5194                 | 0.0517                | -                        |
| 0.9003 | 16947 | 0.5375        | 0.5061         | 0.5643                 | 0.0462                | -                        |
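
One detail worth noting in these logs: with `warmup_ratio: 0.5` over 37,648 total steps, warmup covers 18,824 steps, so the learning rate is still climbing roughly linearly through this entire first epoch and the cosine decay only begins in epoch 2. A quick sketch to check this against the logged values (the exact scheduler internals are the stock transformers implementation, used here as an assumption):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy single-parameter optimizer just to drive the schedule.
opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-5)
sched = get_cosine_schedule_with_warmup(opt, num_warmup_steps=18824,
                                        num_training_steps=37648)

for _ in range(1883):  # advance to the first logged step
    opt.step()
    sched.step()
print(opt.param_groups[0]["lr"])  # ~1.0e-06, in line with the logged 9.9766e-07
```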
### Framework Versions

#### MarginMSELoss
```bibtex
@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
```

<!--
## Glossary
-->

last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9647fed89037ba3e3282c4e91d6cc40e3b6ede7cca94a3f8c8b22b2aec5e1b70
 size 1130520122
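
The binary files in this commit are Git LFS pointers: the diff records only a new `oid` (the SHA-256 of the blob) and its size. A small sketch for checking a downloaded file against its pointer, such as the optimizer state above:

```python
import hashlib
import os

def lfs_pointer(path: str) -> str:
    """Recompute the Git LFS pointer text (oid + size) for a local file."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            sha.update(chunk)
    return ("version https://git-lfs.github.com/spec/v1\n"
            f"oid sha256:{sha.hexdigest()}\n"
            f"size {os.path.getsize(path)}\n")

print(lfs_pointer("last-checkpoint/optimizer.pt"))  # should match the pointer above
```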

last-checkpoint/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ea28818c6e626e44d794c42590ed98ccd08652e0026f3086a02b5ead369e633d
 size 565251810

last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:141ecdefc1c939079bd9377367b5723d56e31424215532c67fb39a68efcee019
 size 14180

last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:295caad4fbc2e25c07e26ab55cba43a9ec3977746a577c96911a58bfcbdf8ed4
 size 1064

last-checkpoint/trainer_state.json CHANGED
@@ -2,328 +2,297 @@
   "best_metric": null,
   "best_model_checkpoint": null,
   "epoch": 1.0,
-  "eval_steps": 
-  "global_step": 
+  "eval_steps": 1883,
+  "global_step": 18824,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "grad_norm": 39.029380798339844,
+      "learning_rate": 9.976625584360391e-07,
+      "loss": 3.6326,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_nli-pairs_loss": 2.6952593326568604,
+      "eval_nli-pairs_runtime": 25.731,
+      "eval_nli-pairs_samples_per_second": 264.584,
+      "eval_nli-pairs_steps_per_second": 16.556,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 2.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_scitail-pairs-pos_loss": 2.172569990158081,
+      "eval_scitail-pairs-pos_runtime": 6.2772,
+      "eval_scitail-pairs-pos_samples_per_second": 207.736,
+      "eval_scitail-pairs-pos_steps_per_second": 13.063,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_qnli-contrastive_loss": 2.702913999557495,
+      "eval_qnli-contrastive_runtime": 16.475,
+      "eval_qnli-contrastive_samples_per_second": 331.593,
+      "eval_qnli-contrastive_steps_per_second": 20.759,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "grad_norm": 25.459535598754883,
+      "learning_rate": 1.9974500637484067e-06,
+      "loss": 1.7665,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 1.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_nli-pairs_loss": 1.2885302305221558,
+      "eval_nli-pairs_runtime": 25.4564,
+      "eval_nli-pairs_samples_per_second": 267.438,
+      "eval_nli-pairs_steps_per_second": 16.734,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_scitail-pairs-pos_loss": 0.9637606143951416,
+      "eval_scitail-pairs-pos_runtime": 6.1565,
+      "eval_scitail-pairs-pos_samples_per_second": 211.809,
+      "eval_scitail-pairs-pos_steps_per_second": 13.319,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_qnli-contrastive_loss": 1.713547945022583,
+      "eval_qnli-contrastive_runtime": 16.4307,
+      "eval_qnli-contrastive_samples_per_second": 332.487,
+      "eval_qnli-contrastive_steps_per_second": 20.815,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 1.
-      "step": 
+      "epoch": 0.3000956226094348,
+      "grad_norm": 0.8201059103012085,
+      "learning_rate": 2.9977688057798558e-06,
+      "loss": 1.1522,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_nli-pairs_loss": 0.9093547463417053,
+      "eval_nli-pairs_runtime": 25.1271,
+      "eval_nli-pairs_samples_per_second": 270.943,
+      "eval_nli-pairs_steps_per_second": 16.954,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 5.
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_scitail-pairs-pos_loss": 0.7571232914924622,
+      "eval_scitail-pairs-pos_runtime": 5.9021,
+      "eval_scitail-pairs-pos_samples_per_second": 220.937,
+      "eval_scitail-pairs-pos_steps_per_second": 13.893,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 21.
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_qnli-contrastive_loss": 0.91651451587677,
+      "eval_qnli-contrastive_runtime": 16.2309,
+      "eval_qnli-contrastive_samples_per_second": 336.579,
+      "eval_qnli-contrastive_steps_per_second": 21.071,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "grad_norm": 12.970890045166016,
+      "learning_rate": 3.9975563110922225e-06,
+      "loss": 0.9533,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_nli-pairs_loss": 0.7290090322494507,
+      "eval_nli-pairs_runtime": 25.3154,
+      "eval_nli-pairs_samples_per_second": 268.928,
+      "eval_nli-pairs_steps_per_second": 16.828,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_scitail-pairs-pos_loss": 0.6498324275016785,
+      "eval_scitail-pairs-pos_runtime": 6.0764,
+      "eval_scitail-pairs-pos_samples_per_second": 214.6,
+      "eval_scitail-pairs-pos_steps_per_second": 13.495,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_qnli-contrastive_loss": 0.4303818643093109,
+      "eval_qnli-contrastive_runtime": 16.4463,
+      "eval_qnli-contrastive_samples_per_second": 332.172,
+      "eval_qnli-contrastive_steps_per_second": 20.795,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "grad_norm": 10.865135192871094,
+      "learning_rate": 4.9973438164045905e-06,
+      "loss": 0.8013,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_nli-pairs_loss": 0.6431913375854492,
+      "eval_nli-pairs_runtime": 25.4337,
+      "eval_nli-pairs_samples_per_second": 267.676,
+      "eval_nli-pairs_steps_per_second": 16.749,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_scitail-pairs-pos_loss": 0.6006649732589722,
+      "eval_scitail-pairs-pos_runtime": 6.199,
+      "eval_scitail-pairs-pos_samples_per_second": 210.355,
+      "eval_scitail-pairs-pos_steps_per_second": 13.228,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_qnli-contrastive_loss": 0.25907495617866516,
+      "eval_qnli-contrastive_runtime": 16.4896,
+      "eval_qnli-contrastive_samples_per_second": 331.299,
+      "eval_qnli-contrastive_steps_per_second": 20.74,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "grad_norm": 2.3549954891204834,
+      "learning_rate": 5.997662558436039e-06,
+      "loss": 0.6568,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_nli-pairs_loss": 0.5626155734062195,
+      "eval_nli-pairs_runtime": 25.1226,
+      "eval_nli-pairs_samples_per_second": 270.991,
+      "eval_nli-pairs_steps_per_second": 16.957,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_scitail-pairs-pos_loss": 0.5481033325195312,
+      "eval_scitail-pairs-pos_runtime": 6.0513,
+      "eval_scitail-pairs-pos_samples_per_second": 215.492,
+      "eval_scitail-pairs-pos_steps_per_second": 13.551,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_qnli-contrastive_loss": 0.13647136092185974,
+      "eval_qnli-contrastive_runtime": 16.3856,
+      "eval_qnli-contrastive_samples_per_second": 333.402,
+      "eval_qnli-contrastive_steps_per_second": 20.872,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "grad_norm": 10.994942665100098,
+      "learning_rate": 6.997450063748406e-06,
+      "loss": 0.6095,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_nli-pairs_loss": 0.5226004719734192,
+      "eval_nli-pairs_runtime": 25.203,
+      "eval_nli-pairs_samples_per_second": 270.127,
+      "eval_nli-pairs_steps_per_second": 16.903,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_scitail-pairs-pos_loss": 0.5108869075775146,
+      "eval_scitail-pairs-pos_runtime": 6.1126,
+      "eval_scitail-pairs-pos_samples_per_second": 213.331,
+      "eval_scitail-pairs-pos_steps_per_second": 13.415,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_qnli-contrastive_loss": 0.16431590914726257,
+      "eval_qnli-contrastive_runtime": 16.4372,
+      "eval_qnli-contrastive_samples_per_second": 332.355,
+      "eval_qnli-contrastive_steps_per_second": 20.806,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "grad_norm": 8.826902389526367,
+      "learning_rate": 7.997768805779857e-06,
+      "loss": 0.5694,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_nli-pairs_loss": 0.49213743209838867,
+      "eval_nli-pairs_runtime": 25.0892,
+      "eval_nli-pairs_samples_per_second": 271.352,
+      "eval_nli-pairs_steps_per_second": 16.979,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_scitail-pairs-pos_loss": 0.5194270610809326,
+      "eval_scitail-pairs-pos_runtime": 6.261,
+      "eval_scitail-pairs-pos_samples_per_second": 208.273,
+      "eval_scitail-pairs-pos_steps_per_second": 13.097,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_qnli-contrastive_loss": 0.05173656344413757,
+      "eval_qnli-contrastive_runtime": 16.3578,
+      "eval_qnli-contrastive_samples_per_second": 333.97,
+      "eval_qnli-contrastive_steps_per_second": 20.908,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "grad_norm": 0.4369502067565918,
+      "learning_rate": 8.997556311092223e-06,
+      "loss": 0.5375,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "eval_nli-pairs_loss": 0.5060996413230896,
+      "eval_nli-pairs_runtime": 25.3561,
+      "eval_nli-pairs_samples_per_second": 268.496,
+      "eval_nli-pairs_steps_per_second": 16.801,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "eval_scitail-pairs-pos_loss": 0.5642966628074646,
+      "eval_scitail-pairs-pos_runtime": 6.1557,
+      "eval_scitail-pairs-pos_samples_per_second": 211.837,
+      "eval_scitail-pairs-pos_steps_per_second": 13.321,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
-    },
-    {
-      "epoch": 1.0,
-      "grad_norm": 20.227073669433594,
-      "learning_rate": 1.701008869684049e-05,
-      "loss": 1.0356,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_nli-pairs_loss": 0.6488831043243408,
-      "eval_nli-pairs_runtime": 23.1759,
-      "eval_nli-pairs_samples_per_second": 293.753,
-      "eval_nli-pairs_steps_per_second": 18.381,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_scitail-pairs-pos_loss": 0.5449082255363464,
-      "eval_scitail-pairs-pos_runtime": 5.3602,
-      "eval_scitail-pairs-pos_samples_per_second": 243.276,
-      "eval_scitail-pairs-pos_steps_per_second": 15.298,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_qnli-contrastive_loss": 0.1294127106666565,
-      "eval_qnli-contrastive_runtime": 15.5044,
-      "eval_qnli-contrastive_samples_per_second": 352.352,
-      "eval_qnli-contrastive_steps_per_second": 22.058,
-      "step": 4710
-    }
+      "epoch": 0.9002868678283042,
+      "eval_qnli-contrastive_loss": 0.046243228018283844,
+      "eval_qnli-contrastive_runtime": 16.4399,
+      "eval_qnli-contrastive_samples_per_second": 332.302,
+      "eval_qnli-contrastive_steps_per_second": 20.803,
+      "step": 16947
+    }
   ],
-  "logging_steps": 
-  "max_steps": 
+  "logging_steps": 1883,
+  "max_steps": 37648,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 2,
-  "save_steps": 
+  "save_steps": 18824,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
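
The Training Logs table earlier in this card is rendered straight from the `log_history` array in this file; pulling the numbers back out is a few lines:

```python
import json

with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# Print the step and nli-pairs eval loss for every evaluation entry.
for entry in state["log_history"]:
    if "eval_nli-pairs_loss" in entry:
        print(entry["step"], round(entry["eval_nli-pairs_loss"], 4))
```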

last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:59541de2a5be81ee914802456d2cdf4f51877f8f7384f609c9fad68c9ba147bc
 size 5624
|