---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:164
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: 'QUESTION #1\n'
  sentences:
  - 'An interesting point of comparison here could be the way railways rolled out
    around the world in the 1800s. Constructing these required enormous investments
    and had a massive environmental impact, and many of the lines that were built
    turned out to be unnecessary—sometimes multiple lines from different companies
    serving the exact same routes!

    The resulting bubbles contributed to several financial crashes, see Wikipedia
    for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They
    left us with a lot of useful infrastructure and a great deal of bankruptcies and
    environmental damage.

    The year of slop'
  - 'This remains astonishing to me. I thought a model with the capabilities and output
    quality of GPT-4 needed a datacenter class server with one or more $40,000+ GPUs.

    These models take up enough of my 64GB of RAM that I don’t run them often—they
    don’t leave much room for anything else.

    The fact that they run at all is a testament to the incredible training and inference
    performance gains that we’ve figured out over the past year. It turns out there
    was a lot of low-hanging fruit to be harvested in terms of model efficiency. I
    expect there’s still more to come.'
  - 'Things we learned about LLMs in 2024






















    Simon Willison’s Weblog

    Subscribe







    Things we learned about LLMs in 2024

    31st December 2024

    A lot has happened in the world of Large Language Models over the course of 2024.
    Here’s a review of things we figured out about the field in the past twelve months,
    plus my attempt at identifying key themes and pivotal moments.

    This is a sequel to my review of 2023.

    In this article:'
- source_sentence: 'QUESTION #2\n...\n\nContext:\nJust this week, the New York Times
    launched a landmark lawsuit against OpenAI and Microsoft over this issue. The
    69 page PDF is genuinely worth reading—especially the first few pages, which lay
    out the issues in a way that’s surprisingly easy to follow. The rest of the document
    includes some of the clearest explanations of what LLMs are, how they work and
    how they are built that I’ve read anywhere.\nThe legal arguments here are complex.
    I’m not a lawyer, but I don’t think this one will be easily decided. Whichever
    way it goes, I expect this case to have a profound impact on how this technology
    develops in the future.\n'', additional_kwargs={}, response_metadata={})]'
  sentences:
  - 'A lot of people are excited about AI agents—an infuriatingly vague term that
    seems to be converging on “AI systems that can go away and act on your behalf”.
    We’ve been talking about them all year, but I’ve seen few if any examples of them
    running in production, despite lots of exciting prototypes.

    I think this is because of gullibility.

    Can we solve this? Honestly, I’m beginning to suspect that you can’t fully solve
    gullibility without achieving AGI. So it may be quite a while before those agent
    dreams can really start to come true!

    Code may be the best application

    Over the course of the year, it’s become increasingly clear that writing code
    is one of the things LLMs are most capable of.'
  - 'Just this week, the New York Times launched a landmark lawsuit against OpenAI
    and Microsoft over this issue. The 69 page PDF is genuinely worth reading—especially
    the first few pages, which lay out the issues in a way that’s surprisingly easy
    to follow. The rest of the document includes some of the clearest explanations
    of what LLMs are, how they work and how they are built that I’ve read anywhere.

    The legal arguments here are complex. I’m not a lawyer, but I don’t think this
    one will be easily decided. Whichever way it goes, I expect this case to have
    a profound impact on how this technology develops in the future.'
  - 'Then there’s the rest. If you browse the Chatbot Arena leaderboard today—still
    the most useful single place to get a vibes-based evaluation of models—you’ll
    see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with
    higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01
    AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21
    Labs, Princeton and Tencent.

    Training a GPT-4 beating model was a huge deal in 2023. In 2024 it’s an achievement
    that isn’t even particularly notable, though I personally still celebrate any
    time a new organization joins that list.

    Some of those GPT-4 models run on my laptop'
- source_sentence: 'QUESTION #1\n'
  sentences:
  - 'The biggest innovation here is that it opens up a new way to scale a model: instead
    of improving model performance purely through additional compute at training time,
    models can now take on harder problems by spending more compute on inference.

    The sequel to o1, o3 (they skipped “o2” for European trademark reasons) was announced
    on 20th December with an impressive result against the ARC-AGI benchmark, albeit
    one that likely involved more than $1,000,000 of compute time expense!

    o3 is expected to ship in January. I doubt many people have real-world problems
    that would benefit from that level of compute expenditure—I certainly don’t!—but
    it appears to be a genuine next step in LLM architecture for taking on much harder
    problems.'
  - 'Those US export regulations on GPUs to China seem to have inspired some very
    effective training optimizations!

    The environmental impact got better

    A welcome result of the increased efficiency of the models—both the hosted ones
    and the ones I can run locally—is that the energy usage and environmental impact
    of running a prompt has dropped enormously over the past couple of years.

    OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days.
    I have it on good authority that neither Google Gemini nor Amazon Nova (two of
    the least expensive model providers) are running prompts at a loss.'
  - 'OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely
    available from its launch in June. This was a momentus change, because for the
    previous year free users had mostly been restricted to GPT-3.5 level models, meaning
    new users got a very inaccurate mental model of what a capable LLM could actually
    do.

    That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT
    Pro. This $200/month subscription service is the only way to access their most
    capable model, o1 Pro.

    Since the trick behind the o1 series (and the future models it will undoubtedly
    inspire) is to expend more compute time to get better results, I don’t think those
    days of free access to the best available models are likely to return.'
- source_sentence: 'QUESTION #1\n'
  sentences:
  - 'The May 13th announcement of GPT-4o included a demo of a brand new voice mode,
    where the true multi-modal GPT-4o (the o is for “omni”) model could accept audio
    input and output incredibly realistic sounding speech without needing separate
    TTS or STT models.

    The demo also sounded conspicuously similar to Scarlett Johansson... and after
    she complained the voice from the demo, Skye, never made it to a production product.

    The delay in releasing the new voice mode after the initial demo caused quite
    a lot of confusion. I wrote about that in ChatGPT in “4o” mode is not running
    the new features yet.'
  - 'Against this photo of butterflies at the California Academy of Sciences:



    A shallow dish, likely a hummingbird or butterfly feeder, is red.  Pieces of orange
    slices of fruit are visible inside the dish.

    Two butterflies are positioned in the feeder, one is a dark brown/black butterfly
    with white/cream-colored markings.  The other is a large, brown butterfly with
    patterns of lighter brown, beige, and black markings, including prominent eye
    spots. The larger brown butterfly appears to be feeding on the fruit.'
  - 'The year of slop

    Synthetic training data works great

    LLMs somehow got even harder to use

    Knowledge is incredibly unevenly distributed

    LLMs need better criticism

    Everything tagged “llms” on my blog in 2024'
- source_sentence: 'QUESTION #1\n'
  sentences:
  - 'Terminology aside, I remain skeptical as to their utility based, once again,
    on the challenge of gullibility. LLMs believe anything you tell them. Any systems
    that attempts to make meaningful decisions on your behalf will run into the same
    roadblock: how good is a travel agent, or a digital assistant, or even a research
    tool if it can’t distinguish truth from fiction?

    Just the other day Google Search was caught serving up an entirely fake description
    of the non-existant movie “Encanto 2”. It turned out to be summarizing an imagined
    movie listing from a fan fiction wiki.'
  - 'Your browser does not support the audio element.


    OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
    accepts audio input, and the Google Gemini apps can speak in a similar way to
    ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
    meant to roll out in Q1 of 2025.

    Google’s NotebookLM, released in September, took audio output to a new level by
    producing spookily realistic conversations between two “podcast hosts” about anything
    you fed into their tool. They later added custom instructions, so naturally I
    turned them into pelicans:



    Your browser does not support the audio element.'
  - 'Then in February, Meta released Llama. And a few weeks later in March, Georgi
    Gerganov released code that got it working on a MacBook.

    I wrote about how Large language models are having their Stable Diffusion moment,
    and with hindsight that was a very good call!

    This unleashed a whirlwind of innovation, which was accelerated further in July
    when Meta released Llama 2—an improved version which, crucially, included permission
    for commercial use.

    Today there are literally thousands of LLMs that can be run locally, on all manner
    of different devices.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.56
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.64
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.72
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.92
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.56
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.21333333333333332
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.14400000000000002
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09200000000000001
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.56
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.64
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.72
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.92
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7017423735235339
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.63715873015873
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6441284271284272
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
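The pooling configuration above enables only `pooling_mode_cls_token`, and the final module is `Normalize()`. As a rough illustration of what those two stages do to the transformer's token outputs, here is a minimal pure-Python sketch. The function name `cls_pool_and_normalize` and the toy 4-dimensional vectors are assumptions made for illustration; the real model works on 1024-dimensional tensors inside the library.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Avoid division by zero for an all-zero vector.
        return list(cls_vector)
    return [x / norm for x in cls_vector]


# Toy input: 3 tokens with 4-dimensional vectors (hypothetical values).
tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to a plain dot product.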

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dataera2013/legal-ft-2")
# Run inference
sentences = [
    'QUESTION #1\\n',
    'Your browser does not support the audio element.\n\nOpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also accepts audio input, and the Google Gemini apps can speak in a similar way to ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s meant to roll out in Q1 of 2025.\nGoogle’s NotebookLM, released in September, took audio output to a new level by producing spookily realistic conversations between two “podcast hosts” about anything you fed into their tool. They later added custom instructions, so naturally I turned them into pelicans:\n\n\nYour browser does not support the audio element.',
    'Then in February, Meta released Llama. And a few weeks later in March, Georgi Gerganov released code that got it working on a MacBook.\nI wrote about how Large language models are having their Stable Diffusion moment, and with hindsight that was a very good call!\nThis unleashed a whirlwind of innovation, which was accelerated further in July when Meta released Llama 2—an improved version which, crucially, included permission for commercial use.\nToday there are literally thousands of LLMs that can be run locally, on all manner of different devices.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
```
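This model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128 and 64 (see Training Details), so the leading dimensions of each embedding are intended to remain useful on their own. The sketch below shows one way to truncate and re-normalize an embedding by hand. It uses toy 8-dimensional lists rather than real model output, and the helper names `truncate_and_renormalize` and `cosine` are hypothetical.

```python
import math


def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit L2 norm."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for two 1024-dimensional embeddings (hypothetical values).
query = [0.5, 0.1, -0.3, 0.2, 0.05, -0.1, 0.4, 0.0]
passage = [0.4, 0.2, -0.2, 0.1, -0.3, 0.2, 0.1, 0.3]

# With the real model, `dim` would be one of 768, 512, 256, 128 or 64.
small_query = truncate_and_renormalize(query, 4)
small_passage = truncate_and_renormalize(passage, 4)

print(len(small_query))
print(cosine(small_query, small_passage))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument on the `SentenceTransformer` constructor, which is probably the more convenient route in practice; check the documentation for the version you have installed.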

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.56       |
| cosine_accuracy@3   | 0.64       |
| cosine_accuracy@5   | 0.72       |
| cosine_accuracy@10  | 0.92       |
| cosine_precision@1  | 0.56       |
| cosine_precision@3  | 0.2133     |
| cosine_precision@5  | 0.144      |
| cosine_precision@10 | 0.092      |
| cosine_recall@1     | 0.56       |
| cosine_recall@3     | 0.64       |
| cosine_recall@5     | 0.72       |
| cosine_recall@10    | 0.92       |
| **cosine_ndcg@10**  | **0.7017** |
| cosine_mrr@10       | 0.6372     |
| cosine_map@100      | 0.6441     |
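To make the table easier to read, the following sketch computes accuracy@k, precision@k, recall@k and MRR@k for a toy ranking. It follows the usual definitions of these metrics and is not the evaluator's actual source code; the document IDs and rankings are invented for illustration.

```python
def hits_in_top_k(ranked, relevant, k):
    """Number of relevant documents among the first k results."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant)


def accuracy_at_k(ranked, relevant, k):
    """1.0 if at least one relevant document appears in the top k."""
    return 1.0 if hits_in_top_k(ranked, relevant, k) > 0 else 0.0


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return hits_in_top_k(ranked, relevant, k) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return hits_in_top_k(ranked, relevant, k) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical results: (ranked document IDs, set of relevant IDs) per query.
results = [
    (["d3", "d1", "d7"], {"d1"}),
    (["d2", "d5", "d9"], {"d2"}),
]
k = 3

acc = mean(accuracy_at_k(r, rel, k) for r, rel in results)
prec = mean(precision_at_k(r, rel, k) for r, rel in results)
rec = mean(recall_at_k(r, rel, k) for r, rel in results)
mrr = mean(mrr_at_k(r, rel, k) for r, rel in results)

print(acc, prec, rec, mrr)
```

When every query has exactly one relevant document, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. The table above is consistent with that pattern: 0.64 / 3 ≈ 0.2133, 0.72 / 5 = 0.144 and 0.92 / 10 = 0.092.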

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 164 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 164 samples:
  |         | sentence_0                                                                         | sentence_1                                                                           |
  |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                               |
  | details | <ul><li>min: 4 tokens</li><li>mean: 72.05 tokens</li><li>max: 228 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.85 tokens</li><li>max: 214 tokens</li></ul> |
* Samples:
  | sentence_0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
  |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>QUESTION #1\n</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> |
  | <code>QUESTION #2\n...\n\nContext:\nStuff we figured out about AI in 2023\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSimon Willison’s Weblog\nSubscribe\n\n\n\n\n\n\nStuff we figured out about AI in 2023\n31st December 2023\n2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.\nHere’s my attempt to round up the highlights in one place!\n', additional_kwargs={}, response_metadata={})]</code> | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> |
  | <code>QUESTION #1\n</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | <code>Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023</code>                                                                                                                           |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
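The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss` with equal weights for each dimension. The following is a simplified pure-Python sketch of that idea, not the library's implementation: the inner loss treats every other positive in the batch as a negative, and the outer loss sums the inner loss over truncated, re-normalized embeddings. The similarity scale of 20 and the toy 4-dimensional vectors are assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch scores; positive i is the target for anchor i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(anchor, p)) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss at each truncated dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [normalize(a[:dim]) for a in anchors]
        truncated_positives = [normalize(p[:dim]) for p in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of two (anchor, positive) pairs; the real dims are 768 down to 64.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, swapped)
```

Correctly paired inputs give a much smaller loss than the swapped pairing, which is the signal that pulls each question toward its own context at every truncated size.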

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step | cosine_ndcg@10 |
|:------:|:----:|:--------------:|
| 1.0    | 17   | 0.7017         |
| 2.0    | 34   | 0.7017         |
| 2.9412 | 50   | 0.7017         |
| 3.0    | 51   | 0.7017         |
| 4.0    | 68   | 0.7017         |
| 5.0    | 85   | 0.7017         |
| 5.8824 | 100  | 0.7017         |
| 6.0    | 102  | 0.7017         |
| 7.0    | 119  | 0.7017         |
| 8.0    | 136  | 0.7017         |
| 8.8235 | 150  | 0.7017         |
| 9.0    | 153  | 0.7017         |
| 10.0   | 170  | 0.7017         |
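The step and epoch columns in the table above follow from the dataset size and batch size reported earlier. A short worked calculation, assuming a single device (the card lists `gradient_accumulation_steps` as 1 and `dataloader_drop_last` as False):

```python
import math

train_samples = 164  # from "Size: 164 training samples"
batch_size = 10      # per_device_train_batch_size
epochs = 10          # num_train_epochs

# The last, partially filled batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 17
print(total_steps)      # 170

# Fractional epochs for the intermediate rows logged at steps 50, 100 and 150.
for step in (50, 100, 150):
    print(step, round(step / steps_per_epoch, 4))
```

This reproduces the fractional epoch values 2.9412, 5.8824 and 8.8235 shown in the log.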


### Framework Versions
- Python: 3.13.1
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
