Training in progress, epoch 1, checkpoint
Files changed:
- last-checkpoint/README.md (+174 -98)
- last-checkpoint/optimizer.pt (+1 -1)
- last-checkpoint/pytorch_model.bin (+1 -1)
- last-checkpoint/rng_state.pth (+1 -1)
- last-checkpoint/scheduler.pt (+1 -1)
- last-checkpoint/trainer_state.json (+212 -243)
- last-checkpoint/training_args.bin (+1 -1)
last-checkpoint/README.md
CHANGED
@@ -7,11 +7,12 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
 - loss:GISTEmbedLoss
 - loss:CoSENTLoss
 - loss:OnlineContrastiveLoss
 - loss:MultipleNegativesSymmetricRankingLoss
 base_model: microsoft/deberta-v3-small
 datasets:
 - sentence-transformers/all-nli
@@ -24,11 +25,21 @@ datasets:
 - allenai/sciq
 - allenai/qasc
 - allenai/openbookqa
-- sentence-transformers/msmarco-msmarco-distilbert-base-v3
 - sentence-transformers/natural-questions
 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
 widget:
 - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
     a microphone and a stringed instrument.
@@ -70,11 +81,51 @@ widget:
     on account of his participation in same-sex union ceremonies.
   - Tesla was the fourth of five children.
 pipeline_tag: sentence-similarity
 ---
 
 # SentenceTransformer based on microsoft/deberta-v3-small
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa),
 
 ## Model Details
 
@@ -96,7 +147,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
 - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
 - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
 - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
--
 - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
 - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
 - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
@@ -175,6 +226,27 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 
 <!--
 ## Bias, Risks and Limitations
 
@@ -194,7 +266,7 @@ You can finetune this model on your own dataset.
 #### nli-pairs
 
 * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -210,7 +282,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -243,23 +315,23 @@ You can finetune this model on your own dataset.
 #### vitaminc-pairs
 
 * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
-* Size:
 * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | label | sentence1 | sentence2 |
   |:--------|:---|:---|:---|
   | type | int | string | string |
-  | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean:
 * Samples:
-  | label | sentence1
-
-  | <code>1</code> | <code>
-  | <code>1</code> | <code>
-  | <code>1</code> | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -268,41 +340,41 @@ You can finetune this model on your own dataset.
 #### qnli-contrastive
 
 * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
-* Size:
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 | label |
   |:--------|:---|:---|:---|
   | type | string | string | int |
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>What was
 * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
 
 #### scitail-pairs-qa
 
 * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
-* Size:
 * Columns: <code>sentence2</code> and <code>sentence1</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence2 | sentence1 |
   |:--------|:---|:---|
   | type | string | string |
-  | details | <ul><li>min: 7 tokens</li><li>mean: 15.
 * Samples:
-  | sentence2
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -311,23 +383,23 @@ You can finetune this model on your own dataset.
 #### scitail-pairs-pos
 
 * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
   |:--------|:---|:---|
   | type | string | string |
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -336,19 +408,19 @@ You can finetune this model on your own dataset.
 #### xsum-pairs
 
 * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
-  | | sentence1 | sentence2
-
-  | type | string | string
-  | details | <ul><li>min:
 * Samples:
-  | sentence1
-
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
@@ -360,7 +432,7 @@ You can finetune this model on your own dataset.
 #### compression-pairs
 
 * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -384,7 +456,7 @@ You can finetune this model on your own dataset.
 #### sciq_pairs
 
 * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -400,7 +472,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -425,7 +497,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -450,7 +522,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -458,33 +530,26 @@ You can finetune this model on your own dataset.
 
 #### msmarco_pairs
 
-* Dataset:
-* Size:
-* Columns: <code>
 * Approximate statistics based on the first 1000 samples:
-  | |
-
-  | type | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> |
 * Samples:
-  | |
-
-  | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> |
-  | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> |
-  | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> |
-* Loss: [<code>
-  ```json
-  {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
-    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-    (2): Normalize()
-  ), 'temperature': 0.05}
-  ```
 
 #### nq_pairs
 
 * Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -500,7 +565,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -509,7 +574,7 @@ You can finetune this model on your own dataset.
 #### trivia_pairs
 
 * Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -525,7 +590,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -534,7 +599,7 @@ You can finetune this model on your own dataset.
 #### quora_pairs
 
 * Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -558,7 +623,7 @@ You can finetune this model on your own dataset.
 #### gooaq_pairs
 
 * Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
-* Size:
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
@@ -574,7 +639,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -601,7 +666,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -626,7 +691,7 @@ You can finetune this model on your own dataset.
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
-    (0): Transformer({'max_seq_length': 512, 'do_lower_case':
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
@@ -656,11 +721,11 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 28
 - `per_device_eval_batch_size`: 16
-- `learning_rate`:
 - `weight_decay`: 1e-10
 - `num_train_epochs`: 2
 - `lr_scheduler_type`: cosine
-- `warmup_ratio`: 0.
 - `save_safetensors`: False
 - `fp16`: True
 - `push_to_hub`: True
@@ -681,7 +746,7 @@ You can finetune this model on your own dataset.
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
-- `learning_rate`:
 - `weight_decay`: 1e-10
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
@@ -691,7 +756,7 @@ You can finetune this model on your own dataset.
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
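Taken together, a minimal sketch of how these arguments map onto the sentence-transformers v3 Trainer API (values copied from the list above; `learning_rate` and `warmup_ratio` are truncated in this diff, so the values below are placeholders, not the ones actually used):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="last-checkpoint",   # placeholder output path
    eval_strategy="steps",
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,             # placeholder: value is truncated in the diff above
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # placeholder: value is truncated in the diff above
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
)
```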
@@ -783,19 +848,18 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 1.0 | 4710 | 1.0356 | 0.5449 | 0.1294 | 0.6489 |
 
 
 ### Framework Versions
@@ -847,6 +911,18 @@ You can finetune this model on your own dataset.
 }
 ```
 
 <!--
 ## Glossary
 

last-checkpoint/README.md
UPDATED
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
+- dataset_size:526885
 - loss:GISTEmbedLoss
 - loss:CoSENTLoss
 - loss:OnlineContrastiveLoss
 - loss:MultipleNegativesSymmetricRankingLoss
+- loss:MarginMSELoss
 base_model: microsoft/deberta-v3-small
 datasets:
 - sentence-transformers/all-nli
 - allenai/sciq
 - allenai/qasc
 - allenai/openbookqa
 - sentence-transformers/natural-questions
 - sentence-transformers/trivia-qa
 - sentence-transformers/quora-duplicates
 - sentence-transformers/gooaq
+metrics:
+- pearson_cosine
+- spearman_cosine
+- pearson_manhattan
+- spearman_manhattan
+- pearson_euclidean
+- spearman_euclidean
+- pearson_dot
+- spearman_dot
+- pearson_max
+- spearman_max
 widget:
 - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
     a microphone and a stringed instrument.
     on account of his participation in same-sex union ceremonies.
   - Tesla was the fourth of five children.
 pipeline_tag: sentence-similarity
+model-index:
+- name: SentenceTransformer based on microsoft/deberta-v3-small
+  results:
+  - task:
+      type: semantic-similarity
+      name: Semantic Similarity
+    dataset:
+      name: sts test
+      type: sts-test
+    metrics:
+    - type: pearson_cosine
+      value: 0.2520910673470529
+      name: Pearson Cosine
+    - type: spearman_cosine
+      value: 0.2588662067006675
+      name: Spearman Cosine
+    - type: pearson_manhattan
+      value: 0.30439718484055006
+      name: Pearson Manhattan
+    - type: spearman_manhattan
+      value: 0.3013780326567434
+      name: Spearman Manhattan
+    - type: pearson_euclidean
+      value: 0.25977707672353506
+      name: Pearson Euclidean
+    - type: spearman_euclidean
+      value: 0.26078444276128726
+      name: Spearman Euclidean
+    - type: pearson_dot
+      value: 0.08121075567918108
+      name: Pearson Dot
+    - type: spearman_dot
+      value: 0.0753891417253212
+      name: Spearman Dot
+    - type: pearson_max
+      value: 0.30439718484055006
+      name: Pearson Max
+    - type: spearman_max
+      value: 0.3013780326567434
+      name: Spearman Max
 ---
 
 # SentenceTransformer based on microsoft/deberta-v3-small
 
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa), msmarco_pairs, [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions), [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) and [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
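The card omits the usual usage snippet at this point; a minimal sketch (assuming sentence-transformers v3+ is installed; the model path is a placeholder for this checkpoint directory or the pushed Hub repo):

```python
from sentence_transformers import SentenceTransformer

# Placeholder path: point at the saved checkpoint directory or the Hub repo id.
model = SentenceTransformer("last-checkpoint")

sentences = [
    "A man in a Santa Claus costume is sitting on a wooden chair.",
    "Tesla was the fourth of five children.",
]
embeddings = model.encode(sentences)                      # shape: (2, 768)
similarities = model.similarity(embeddings, embeddings)   # cosine similarity matrix
print(similarities)
```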
 
 ## Model Details
 
 - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
 - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
 - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
+- msmarco_pairs
 - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
 - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
 - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->
 
+## Evaluation
+
+### Metrics
+
+#### Semantic Similarity
+* Dataset: `sts-test`
+* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+| Metric              | Value      |
+|:--------------------|:-----------|
+| pearson_cosine      | 0.2521     |
+| **spearman_cosine** | **0.2589** |
+| pearson_manhattan   | 0.3044     |
+| spearman_manhattan  | 0.3014     |
+| pearson_euclidean   | 0.2598     |
+| spearman_euclidean  | 0.2608     |
+| pearson_dot         | 0.0812     |
+| spearman_dot        | 0.0754     |
+| pearson_max         | 0.3044     |
+| spearman_max        | 0.3014     |
+
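For reproducibility, a sketch of running the evaluator named above (assuming sentence-transformers v3+; the sentence pairs and gold scores here are illustrative stand-ins for the real STS test split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("last-checkpoint")  # placeholder path

# Illustrative pairs; the table above was computed on the full STS test split.
sentences1 = ["A man is playing a guitar.", "A plane is taking off."]
sentences2 = ["A person plays an instrument.", "A bird is flying."]
gold_scores = [0.9, 0.1]  # normalized similarity labels in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(
    sentences1, sentences2, gold_scores, name="sts-test"
)
results = evaluator(model)  # dict of pearson/spearman metrics in v3
print(results)
```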
 <!--
 ## Bias, Risks and Limitations
 
 #### nli-pairs
 
 * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
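A sketch of constructing this loss (assuming sentence-transformers v3+; the dump above only reveals the guide's architecture, a 384-dim CLS-pooled, normalized BERT encoder, so the guide name below is an assumption for illustration):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Assumed guide model; any encoder matching the dumped architecture would fit.
guide = SentenceTransformer("BAAI/bge-small-en-v1.5")

# The guide filters out in-batch "negatives" it scores as too similar to the anchor.
loss = GISTEmbedLoss(model, guide, temperature=0.05)
```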
 #### vitaminc-pairs
 
 * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
+* Size: 24,996 training samples
 * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | label | sentence1 | sentence2 |
   |:--------|:---|:---|:---|
   | type | int | string | string |
+  | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 17.18 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 37.57 tokens</li><li>max: 240 tokens</li></ul> |
 * Samples:
+  | label | sentence1 | sentence2 |
+  |:---|:---|:---|
+  | <code>1</code> | <code>Based on 93 reviews , the film has a 95 % approval rating</code> | <code>On review aggregation website Rotten Tomatoes , the film has an approval rating of 95 % , based on 93 reviews , with an average rating of 7.9/10 .</code> |
+  | <code>1</code> | <code>Bianca 's ex-husband is Gavin Ellis Ricky Butcher .</code> | <code>Whitney runs away and Bianca 's ex-husband Gavin Ellis Ricky Butcher ( Sid Owen ) finds her drunk .</code> |
+  | <code>1</code> | <code>Critics gave Jagga Jasoo ( film ) positive reviews .</code> | <code>The film received positive to reviews from the critics.</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### qnli-contrastive
 
 * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 | label |
   |:--------|:---|:---|:---|
   | type | string | string | int |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 13.99 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 35.78 tokens</li><li>max: 151 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
 * Samples:
+  | sentence1 | sentence2 | label |
+  |:---|:---|:---|
+  | <code>How big is Midtown's population?</code> | <code>The Eastern Market farmer's distribution center is the largest open-air flowerbed market in the United States and has more than 150 foods and specialty businesses.</code> | <code>0</code> |
+  | <code>How many immigrants lived in these tent cities?</code> | <code>During this period, food, clothes and furniture had to be rationed in what became known as the Austerity Period.</code> | <code>0</code> |
+  | <code>What Iranian film festival was created in 1973?</code> | <code>Attempts to organize a film festival that had begun in 1954 within the framework of the Golrizan Festival, bore fruits in the form of the Sepas Festival in 1969.</code> | <code>0</code> |
 * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
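A sketch of this loss on labeled pairs like the ones above (assuming sentence-transformers v3+; label 0 marks a non-answering pair in this setup):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Selects only hard positives and hard negatives within each batch
# before applying the contrastive margin.
loss = OnlineContrastiveLoss(model)
```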
 
 #### scitail-pairs-qa
 
 * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
+* Size: 14,987 training samples
 * Columns: <code>sentence2</code> and <code>sentence1</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence2 | sentence1 |
   |:--------|:---|:---|
   | type | string | string |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 15.97 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.01 tokens</li><li>max: 33 tokens</li></ul> |
 * Samples:
+  | sentence2 | sentence1 |
+  |:---|:---|
+  | <code>The abundance of water makes the earth habitable for humans.</code> | <code>What makes the earth habitable for humans?</code> |
+  | <code>Individual is the term for an organism, or single living thing.</code> | <code>What is the term for an organism, or single living thing?</code> |
+  | <code>Ultrasound, a diagnostic technology, uses high-frequency vibrations transmitted into any tissue in contact with the transducer.</code> | <code>What diagnostic technology uses high-frequency vibrations transmitted into any tissue in contact with the transducer?</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### scitail-pairs-pos
 
 * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
+* Size: 8,600 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
   |:--------|:---|:---|
   | type | string | string |
+  | details | <ul><li>min: 6 tokens</li><li>mean: 23.86 tokens</li><li>max: 59 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.69 tokens</li><li>max: 41 tokens</li></ul> |
 * Samples:
+  | sentence1 | sentence2 |
+  |:---|:---|
+  | <code>Frost (also called white or hoarfrost) occurs when air temperatures dip below 32F and ice crystals form on the plant leaves, injuring and sometimes killing tender plants.</code> | <code>The ice crystals that form on the ground are called frost.</code> |
+  | <code>They are considered micronutrients because the body needs them in relatively small amounts compared with nutrients such as carbohydrates, proteins, fats and water.</code> | <code>Micronutrients is the term for nutrients the body needs in relatively small amounts, including vitamins and minerals.</code> |
+  | <code>However cell division goes through a sixth phase called cytokinesis, which is the division of the cytoplasm and the formation of two new daughter cells.</code> | <code>Cytokinesis divides the cytoplasm into two distinctive cells.</code> |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 #### xsum-pairs
 
 * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
+  | | sentence1 | sentence2 |
+  |:--------|:---|:---|
+  | type | string | string |
+  | details | <ul><li>min: 40 tokens</li><li>mean: 337.79 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 26.93 tokens</li><li>max: 75 tokens</li></ul> |
 * Samples:
+  | sentence1 | sentence2 |
+  |:---|:---|
+  | <code>A Haystack in the Evening Sun had not previously been authenticated because the work is largely unknown and the artist's signature is covered by paint.<br>However researchers at the University of Jyvaskyla in Finland uncovered the signature using a hyperspectral camera.<br>It also revealed the date of the work's creation - 1891.<br>The special camera used by researchers studied the painting's elemental composition by measuring X-ray fluorescence.<br>That allowed them to "see" below the surface, and analyse the materials used to create the work.<br>"The camera is principally operating as a scanner, which scans one line at a time," researcher Ilkka Polonen said.<br>"When the camera is moved using the scanner, an image of the whole picture can be obtained."<br>An analysis of the pigments and canvas fibres also confirmed the painting was by the Impressionist.<br>The artwork is currently owned by Finland's Serlachius Fine Arts Foundation, which acquired it in the 1950s through a London art broker.<br>The institution said the authentication means the artwork is the first Monet painting to be held in a Finnish public collection.</code> | <code>An oil painting thought to have been created by French Impressionist Claude Monet has been proven to be genuine through scientific testing.</code> |
+  | <code>Passengers on a British Airways flight from Prague and an Icelandair plane told of their relief after landing safely at Heathrow following the strikes on Wednesday.<br>One described "a white flash" while others said they felt a "crack" and "bang" as bolts hit the aircraft.<br>BA said planes were built to cope with lightning strikes and their jet would be inspected before resuming service.<br>Liz Dobson, a charity worker, told the Evening Standard: "It came out of the blue. There was a really loud bang and a white flash. Not really what you want on a plane.<br>"The lightning hit the wing."<br>Catherine Mayer, who is co-founder of the Women's Equality Party, was returning from Iceland.<br>She tweeted: "The plane got hit by lightning. Big flash and bang. #blimey."<br>She told the BBC how passengers sitting next to her looked distressed and frightened.<br>Icelandair confirmed that flight FI454 had been struck.<br>"The aircraft was of course inspected after landing for safety reasons, and as the lightning did not cause damage, the aircraft was returned to service later last night," said a spokesperson for the airline.<br>A spokesman for BA said: "Lightning strikes are fairly common and aircraft are designed to cope with them."<br>On average, commercial planes are struck by lightning about once a year according to Cardiff University's "lightning lab" in the UK, a recently established laboratory where Airbus conducts lightning tests.</code> | <code>Two planes have been struck by lightning over west London.</code> |
+  | <code>Arthur Mellar, 47, died after being seriously injured at Burghley House, on the Lincolnshire-Cambridgeshire border, on 12 July 2014.<br>Peterborough Crown Court heard the lift fell onto Mr Mellar as he tried to free a jammed item of luggage.<br>Burghley House Preservation Trust previously admitted it failed to ensure the welfare of an employee.<br>More on this and other local stories from across Lincolnshire<br>Mr Mellar got caught between the lift cage and the banister of the lift housing as he attempted to dislodge the baggage, the court heard.<br>The Health and Safety Executive, which brought the prosecution against the trust, said it was a "completely avoidable incident".<br>There were no safety measures in place to prevent it and the lift had not been inspected by an engineer since it was installed in the late 1950s, the court heard.<br>The court was also told the trust did not conduct a safety risk assessment on the lift, which was used to transport guests' luggage from different levels of the house.<br>Mr Mellar, from Barnsley, South Yorkshire, had worked at the 16th Century Burghley House for nine years.<br>Judge Sean Enright fined the trust £266,000, along with costs of nearly £17,000.<br>David Pennell, estates director at Burghley House, said: "Health and safety matters have always been paramount across all activities at Burghley and what happened to Arthur Mellar in July 2014 was a dreadful and tragic accident."<br>"Our thoughts are with Gerwin and Arthur's family at this time," he added.<br>The mansion has been used for locations in the films Pride and Prejudice and The Da Vinci Code.</code> | <code>The owners of Tudor stately home have been fined £266,000 after a butler was crushed to death by a faulty lift.</code> |
 * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
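A sketch of this loss for the (article, summary) pairs above (assuming sentence-transformers v3+; the JSON parameters are cut off in the diff, so library defaults are shown):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("microsoft/deberta-v3-small")
# Ranks in both directions: each article against all in-batch summaries,
# and each summary against all in-batch articles.
loss = MultipleNegativesSymmetricRankingLoss(model)
```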
 #### compression-pairs
 
 * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
+* Size: 50,000 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 #### sciq_pairs
 
 * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
+* Size: 11,679 training samples
 * Columns: <code>sentence1</code> and <code>sentence2</code>
 * Approximate statistics based on the first 1000 samples:
   | | sentence1 | sentence2 |
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
 * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
   ```json
   {'guide': SentenceTransformer(
+    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
     (2): Normalize()
   ), 'temperature': 0.05}
530 |
|
531 |
#### msmarco_pairs
|
532 |
|
533 |
+
* Dataset: msmarco_pairs
|
534 |
+
* Size: 50,000 training samples
|
535 |
+
* Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
|
536 |
* Approximate statistics based on the first 1000 samples:
|
537 |
+
| | query | positive | negative | label |
|
538 |
+
|:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------|
|
539 |
+
| type | string | string | string | float |
|
540 |
+
| details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 72.59 tokens</li><li>max: 216 tokens</li></ul> | <ul><li>min: -0.5</li><li>mean: 0.04</li><li>max: 0.6</li></ul> |
|
541 |
* Samples:
|
542 |
+
| query | positive | negative | label |
|
543 |
+
  |:---------|:---------|:---------|:---------|
  | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> | <code>The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.</code> | <code>0.12154221534729004</code> |
  | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> | <code>Fibrinolytic drug. Fibrinolytic drug, also called thrombolytic drug, any agent that is capable of stimulating the dissolution of a blood clot (thrombus). Fibrinolytic drugs work by activating the so-called fibrinolytic pathway.</code> | <code>-0.05174225568771362</code> |
  | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> | <code>Your blood test results should be written in your maternity notes. Your platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range.If your platelet count is low, the blood test should be done again.This will keep track of whether or not your count is dropping.our platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range. If your platelet count is low, the blood test should be done again. This will keep track of whether or not your count is dropping.</code> | <code>-0.037523627281188965</code> |
* Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#marginmseloss)
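
For rows like the three above, MarginMSELoss regresses the student's score margin between the two passages onto the teacher's margin in the fourth column. A minimal sketch with the classic `model.fit` API; the sample and batch size below are illustrative, not this card's actual training script:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

student = SentenceTransformer("microsoft/deberta-v3-small")  # base model of this card

# Each sample: (query, passage_a, passage_b) plus the teacher margin
# score(query, passage_a) - score(query, passage_b), as in the table above.
train_examples = [
    InputExample(
        texts=["what is normal plat count",
               "A normal platelet count is between 150,000 and 450,000 ...",
               "Your blood test results should be written in your maternity notes ..."],
        label=-0.037523627281188965,
    ),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=28)
loss = losses.MarginMSELoss(student)

student.fit(train_objectives=[(loader, loss)], epochs=1)
```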
#### nq_pairs

* Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```
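
The `guide` entry printed above is only the module structure of the guide encoder (a 384-dimensional BERT model with CLS pooling and a normalization layer); the card does not name the checkpoint. A minimal sketch of wiring up GISTEmbedLoss, with `BAAI/bge-small-en-v1.5` as an assumed stand-in that matches that printed architecture:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/deberta-v3-small")
guide = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumption: any 384-dim CLS-pooled encoder fits the printout

# GISTEmbedLoss ranks in-batch negatives, but uses the guide model's
# similarities to mask out pairs the guide considers false negatives.
pairs = [InputExample(texts=["who wrote the declaration of independence",
                             "Thomas Jefferson drafted the Declaration of Independence in 1776."])]
loader = DataLoader(pairs, shuffle=True, batch_size=28)
loss = losses.GISTEmbedLoss(model, guide, temperature=0.05)

model.fit(train_objectives=[(loader, loss)], epochs=1)
```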
#### trivia_pairs

* Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```
#### quora_pairs

* Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
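
To eyeball any of these pair datasets before training, they load directly from the Hub. The `pair` subset name below follows the dataset's published configurations and is an assumption, not something this card records:

```python
from datasets import load_dataset

# One duplicate-question pair per row, e.g. {'anchor': '...', 'positive': '...'}
quora = load_dataset("sentence-transformers/quora-duplicates", "pair", split="train")
print(quora[0])
```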
#### gooaq_pairs

* Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
* Size: 50,000 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  | | sentence1 | sentence2 |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.05}
  ```

- `eval_strategy`: steps
- `per_device_train_batch_size`: 28
- `per_device_eval_batch_size`: 16
- `learning_rate`: 1e-05
- `weight_decay`: 1e-10
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.5
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 1e-10
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.5
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning

</details>
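
The non-default values above translate one-to-one into the v3-style training arguments. A sketch of reproducing them; the output directory is illustrative, and `eval_strategy` assumes a transformers version (>= 4.41) where the argument carries that name:

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="deberta-v3-small-checkpoints",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.5,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
)
```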

### Training Logs
| Epoch  | Step  | Training Loss | nli-pairs loss | scitail-pairs-pos loss | qnli-contrastive loss | sts-test_spearman_cosine |
|:------:|:-----:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------------------:|
| 0      | 0     | -             | 4.2656         | 3.4484                 | 4.1500                | 0.2589                   |
| 0.1000 | 1883  | 3.6326        | 2.6953         | 2.1726                 | 2.7029                | -                        |
| 0.2001 | 3766  | 1.7665        | 1.2885         | 0.9638                 | 1.7135                | -                        |
| 0.3001 | 5649  | 1.1522        | 0.9094         | 0.7571                 | 0.9165                | -                        |
| 0.4001 | 7532  | 0.9533        | 0.7290         | 0.6498                 | 0.4304                | -                        |
| 0.5002 | 9415  | 0.8013        | 0.6432         | 0.6007                 | 0.2591                | -                        |
| 0.6002 | 11298 | 0.6568        | 0.5626         | 0.5481                 | 0.1365                | -                        |
| 0.7002 | 13181 | 0.6095        | 0.5226         | 0.5109                 | 0.1643                | -                        |
| 0.8003 | 15064 | 0.5694        | 0.4921         | 0.5194                 | 0.0517                | -                        |
| 0.9003 | 16947 | 0.5375        | 0.5061         | 0.5643                 | 0.0462                | -                        |
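
One detail worth noting in these logs: with `warmup_ratio: 0.5` over 37,648 total steps, warmup covers 18,824 steps, so the learning rate is still climbing roughly linearly through this entire first epoch and the cosine decay only begins in epoch 2. A quick sketch to check this against the logged values (the exact scheduler internals are the stock transformers implementation, used here as an assumption):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy single-parameter optimizer just to drive the schedule.
opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-5)
sched = get_cosine_schedule_with_warmup(opt, num_warmup_steps=18824,
                                        num_training_steps=37648)

for _ in range(1883):  # advance to the first logged step
    opt.step()
    sched.step()
print(opt.param_groups[0]["lr"])  # ~1.0e-06, in line with the logged 9.9766e-07
```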
### Framework Versions

#### MarginMSELoss
```bibtex
@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
```

<!--
## Glossary
-->

last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9647fed89037ba3e3282c4e91d6cc40e3b6ede7cca94a3f8c8b22b2aec5e1b70
 size 1130520122
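
The binary files in this commit are Git LFS pointers: the diff records only a new `oid` (the SHA-256 of the blob) and its size. A small sketch for checking a downloaded file against its pointer, such as the optimizer state above:

```python
import hashlib
import os

def lfs_pointer(path: str) -> str:
    """Recompute the Git LFS pointer text (oid + size) for a local file."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            sha.update(chunk)
    return ("version https://git-lfs.github.com/spec/v1\n"
            f"oid sha256:{sha.hexdigest()}\n"
            f"size {os.path.getsize(path)}\n")

print(lfs_pointer("last-checkpoint/optimizer.pt"))  # should match the pointer above
```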

last-checkpoint/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ea28818c6e626e44d794c42590ed98ccd08652e0026f3086a02b5ead369e633d
 size 565251810

last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:141ecdefc1c939079bd9377367b5723d56e31424215532c67fb39a68efcee019
 size 14180

last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:295caad4fbc2e25c07e26ab55cba43a9ec3977746a577c96911a58bfcbdf8ed4
 size 1064

last-checkpoint/trainer_state.json CHANGED
@@ -2,328 +2,297 @@
   "best_metric": null,
   "best_model_checkpoint": null,
   "epoch": 1.0,
-  "eval_steps": 
-  "global_step": 
+  "eval_steps": 1883,
+  "global_step": 18824,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "grad_norm": 39.029380798339844,
+      "learning_rate": 9.976625584360391e-07,
+      "loss": 3.6326,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_nli-pairs_loss": 2.6952593326568604,
+      "eval_nli-pairs_runtime": 25.731,
+      "eval_nli-pairs_samples_per_second": 264.584,
+      "eval_nli-pairs_steps_per_second": 16.556,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 2.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_scitail-pairs-pos_loss": 2.172569990158081,
+      "eval_scitail-pairs-pos_runtime": 6.2772,
+      "eval_scitail-pairs-pos_samples_per_second": 207.736,
+      "eval_scitail-pairs-pos_steps_per_second": 13.063,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.10003187420314492,
+      "eval_qnli-contrastive_loss": 2.702913999557495,
+      "eval_qnli-contrastive_runtime": 16.475,
+      "eval_qnli-contrastive_samples_per_second": 331.593,
+      "eval_qnli-contrastive_steps_per_second": 20.759,
+      "step": 1883
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "grad_norm": 25.459535598754883,
+      "learning_rate": 1.9974500637484067e-06,
+      "loss": 1.7665,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 1.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_nli-pairs_loss": 1.2885302305221558,
+      "eval_nli-pairs_runtime": 25.4564,
+      "eval_nli-pairs_samples_per_second": 267.438,
+      "eval_nli-pairs_steps_per_second": 16.734,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_scitail-pairs-pos_loss": 0.9637606143951416,
+      "eval_scitail-pairs-pos_runtime": 6.1565,
+      "eval_scitail-pairs-pos_samples_per_second": 211.809,
+      "eval_scitail-pairs-pos_steps_per_second": 13.319,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.20006374840628985,
+      "eval_qnli-contrastive_loss": 1.713547945022583,
+      "eval_qnli-contrastive_runtime": 16.4307,
+      "eval_qnli-contrastive_samples_per_second": 332.487,
+      "eval_qnli-contrastive_steps_per_second": 20.815,
+      "step": 3766
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 1.
-      "step": 
+      "epoch": 0.3000956226094348,
+      "grad_norm": 0.8201059103012085,
+      "learning_rate": 2.9977688057798558e-06,
+      "loss": 1.1522,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_nli-pairs_loss": 0.9093547463417053,
+      "eval_nli-pairs_runtime": 25.1271,
+      "eval_nli-pairs_samples_per_second": 270.943,
+      "eval_nli-pairs_steps_per_second": 16.954,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 5.
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_scitail-pairs-pos_loss": 0.7571232914924622,
+      "eval_scitail-pairs-pos_runtime": 5.9021,
+      "eval_scitail-pairs-pos_samples_per_second": 220.937,
+      "eval_scitail-pairs-pos_steps_per_second": 13.893,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 21.
-      "step": 
+      "epoch": 0.3000956226094348,
+      "eval_qnli-contrastive_loss": 0.91651451587677,
+      "eval_qnli-contrastive_runtime": 16.2309,
+      "eval_qnli-contrastive_samples_per_second": 336.579,
+      "eval_qnli-contrastive_steps_per_second": 21.071,
+      "step": 5649
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "grad_norm": 12.970890045166016,
+      "learning_rate": 3.9975563110922225e-06,
+      "loss": 0.9533,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_nli-pairs_loss": 0.7290090322494507,
+      "eval_nli-pairs_runtime": 25.3154,
+      "eval_nli-pairs_samples_per_second": 268.928,
+      "eval_nli-pairs_steps_per_second": 16.828,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_scitail-pairs-pos_loss": 0.6498324275016785,
+      "eval_scitail-pairs-pos_runtime": 6.0764,
+      "eval_scitail-pairs-pos_samples_per_second": 214.6,
+      "eval_scitail-pairs-pos_steps_per_second": 13.495,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.4001274968125797,
+      "eval_qnli-contrastive_loss": 0.4303818643093109,
+      "eval_qnli-contrastive_runtime": 16.4463,
+      "eval_qnli-contrastive_samples_per_second": 332.172,
+      "eval_qnli-contrastive_steps_per_second": 20.795,
+      "step": 7532
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "grad_norm": 10.865135192871094,
+      "learning_rate": 4.9973438164045905e-06,
+      "loss": 0.8013,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_nli-pairs_loss": 0.6431913375854492,
+      "eval_nli-pairs_runtime": 25.4337,
+      "eval_nli-pairs_samples_per_second": 267.676,
+      "eval_nli-pairs_steps_per_second": 16.749,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_scitail-pairs-pos_loss": 0.6006649732589722,
+      "eval_scitail-pairs-pos_runtime": 6.199,
+      "eval_scitail-pairs-pos_samples_per_second": 210.355,
+      "eval_scitail-pairs-pos_steps_per_second": 13.228,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.5001593710157246,
+      "eval_qnli-contrastive_loss": 0.25907495617866516,
+      "eval_qnli-contrastive_runtime": 16.4896,
+      "eval_qnli-contrastive_samples_per_second": 331.299,
+      "eval_qnli-contrastive_steps_per_second": 20.74,
+      "step": 9415
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "grad_norm": 2.3549954891204834,
+      "learning_rate": 5.997662558436039e-06,
+      "loss": 0.6568,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_nli-pairs_loss": 0.5626155734062195,
+      "eval_nli-pairs_runtime": 25.1226,
+      "eval_nli-pairs_samples_per_second": 270.991,
+      "eval_nli-pairs_steps_per_second": 16.957,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_scitail-pairs-pos_loss": 0.5481033325195312,
+      "eval_scitail-pairs-pos_runtime": 6.0513,
+      "eval_scitail-pairs-pos_samples_per_second": 215.492,
+      "eval_scitail-pairs-pos_steps_per_second": 13.551,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.6001912452188696,
+      "eval_qnli-contrastive_loss": 0.13647136092185974,
+      "eval_qnli-contrastive_runtime": 16.3856,
+      "eval_qnli-contrastive_samples_per_second": 333.402,
+      "eval_qnli-contrastive_steps_per_second": 20.872,
+      "step": 11298
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "grad_norm": 10.994942665100098,
+      "learning_rate": 6.997450063748406e-06,
+      "loss": 0.6095,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_nli-pairs_loss": 0.5226004719734192,
+      "eval_nli-pairs_runtime": 25.203,
+      "eval_nli-pairs_samples_per_second": 270.127,
+      "eval_nli-pairs_steps_per_second": 16.903,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_scitail-pairs-pos_loss": 0.5108869075775146,
+      "eval_scitail-pairs-pos_runtime": 6.1126,
+      "eval_scitail-pairs-pos_samples_per_second": 213.331,
+      "eval_scitail-pairs-pos_steps_per_second": 13.415,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.7002231194220144,
+      "eval_qnli-contrastive_loss": 0.16431590914726257,
+      "eval_qnli-contrastive_runtime": 16.4372,
+      "eval_qnli-contrastive_samples_per_second": 332.355,
+      "eval_qnli-contrastive_steps_per_second": 20.806,
+      "step": 13181
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "grad_norm": 8.826902389526367,
+      "learning_rate": 7.997768805779857e-06,
+      "loss": 0.5694,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_nli-pairs_loss": 0.49213743209838867,
+      "eval_nli-pairs_runtime": 25.0892,
+      "eval_nli-pairs_samples_per_second": 271.352,
+      "eval_nli-pairs_steps_per_second": 16.979,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_scitail-pairs-pos_loss": 0.5194270610809326,
+      "eval_scitail-pairs-pos_runtime": 6.261,
+      "eval_scitail-pairs-pos_samples_per_second": 208.273,
+      "eval_scitail-pairs-pos_steps_per_second": 13.097,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
+      "epoch": 0.8002549936251594,
+      "eval_qnli-contrastive_loss": 0.05173656344413757,
+      "eval_qnli-contrastive_runtime": 16.3578,
+      "eval_qnli-contrastive_samples_per_second": 333.97,
+      "eval_qnli-contrastive_steps_per_second": 20.908,
+      "step": 15064
     },
     {
-      "epoch": 0.
-      "grad_norm": 
-      "learning_rate": 
-      "loss": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "grad_norm": 0.4369502067565918,
+      "learning_rate": 8.997556311092223e-06,
+      "loss": 0.5375,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_nli-pairs_loss": 0.
-      "eval_nli-pairs_runtime": 
-      "eval_nli-pairs_samples_per_second": 
-      "eval_nli-pairs_steps_per_second": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "eval_nli-pairs_loss": 0.5060996413230896,
+      "eval_nli-pairs_runtime": 25.3561,
+      "eval_nli-pairs_samples_per_second": 268.496,
+      "eval_nli-pairs_steps_per_second": 16.801,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_scitail-pairs-pos_loss": 0.
-      "eval_scitail-pairs-pos_runtime": 
-      "eval_scitail-pairs-pos_samples_per_second": 
-      "eval_scitail-pairs-pos_steps_per_second": 
-      "step": 
+      "epoch": 0.9002868678283042,
+      "eval_scitail-pairs-pos_loss": 0.5642966628074646,
+      "eval_scitail-pairs-pos_runtime": 6.1557,
+      "eval_scitail-pairs-pos_samples_per_second": 211.837,
+      "eval_scitail-pairs-pos_steps_per_second": 13.321,
+      "step": 16947
     },
     {
-      "epoch": 0.
-      "eval_qnli-contrastive_loss": 0.
-      "eval_qnli-contrastive_runtime": 
-      "eval_qnli-contrastive_samples_per_second": 
-      "eval_qnli-contrastive_steps_per_second": 
-      "step": 
-    },
-    {
-      "epoch": 1.0,
-      "grad_norm": 20.227073669433594,
-      "learning_rate": 1.701008869684049e-05,
-      "loss": 1.0356,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_nli-pairs_loss": 0.6488831043243408,
-      "eval_nli-pairs_runtime": 23.1759,
-      "eval_nli-pairs_samples_per_second": 293.753,
-      "eval_nli-pairs_steps_per_second": 18.381,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_scitail-pairs-pos_loss": 0.5449082255363464,
-      "eval_scitail-pairs-pos_runtime": 5.3602,
-      "eval_scitail-pairs-pos_samples_per_second": 243.276,
-      "eval_scitail-pairs-pos_steps_per_second": 15.298,
-      "step": 4710
-    },
-    {
-      "epoch": 1.0,
-      "eval_qnli-contrastive_loss": 0.1294127106666565,
-      "eval_qnli-contrastive_runtime": 15.5044,
-      "eval_qnli-contrastive_samples_per_second": 352.352,
-      "eval_qnli-contrastive_steps_per_second": 22.058,
-      "step": 4710
-    }
+      "epoch": 0.9002868678283042,
+      "eval_qnli-contrastive_loss": 0.046243228018283844,
+      "eval_qnli-contrastive_runtime": 16.4399,
+      "eval_qnli-contrastive_samples_per_second": 332.302,
+      "eval_qnli-contrastive_steps_per_second": 20.803,
+      "step": 16947
+    }
   ],
-  "logging_steps": 
-  "max_steps": 
+  "logging_steps": 1883,
+  "max_steps": 37648,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 2,
-  "save_steps": 
+  "save_steps": 18824,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
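
The Training Logs table earlier in this card is rendered straight from the `log_history` array in this file; pulling the numbers back out is a few lines:

```python
import json

with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# Print the step and nli-pairs eval loss for every evaluation entry.
for entry in state["log_history"]:
    if "eval_nli-pairs_loss" in entry:
        print(entry["step"], round(entry["eval_nli-pairs_loss"], 4))
```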

last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:59541de2a5be81ee914802456d2cdf4f51877f8f7384f609c9fad68c9ba147bc
 size 5624
|