Upload folder using huggingface_hub

#1
by soldni - opened
README.md CHANGED
@@ -11,10 +11,12 @@ language:
 
 # Model Card for OLMo 2 32B
 
- We introduce OLMo 2 32B, to the family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model. These gains come from training on [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) and Dolmino-mix-0325 (releasing soon) datasets and staged training approach.
 
- OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
- These models are trained on the Dolma dataset. We have released all code, checkpoints, logs, and associated training details on [GitHub](https://github.com/allenai/OLMo-core).
 
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|--------|---------|-------------|-----------------|----------------|
@@ -24,7 +26,7 @@ These models are trained on the Dolma dataset. We have released all code, checkp
 
 The core models released in this batch include the following:
 
- | **Stage** | **OLMo 2 32B** | **OLMo 2 13B** | **OLMo 2 7B**
 |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
 | **Base Model** | [allenai/OLMo-2-0325-32B](https://huggingface.co/allenai/OLMo-2-0325-32B) | [allenai/OLMo-2-1124-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) | [allenai/OLMo-2-1124-7B](https://huggingface.co/allenai/OLMo-2-1124-7B) |
 | **SFT** | [allenai/OLMo-2-0325-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) |
@@ -34,11 +36,13 @@ The core models released in this batch include the following:
 
 ## Installation
 
- OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using:
 ```bash
- pip install --upgrade git+https://github.com/huggingface/transformers.git
 ```
 
 ## Inference
 
 You can use OLMo with the standard HuggingFace transformers library:
@@ -58,8 +62,8 @@ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
 
 For faster performance, you can quantize the model using the following method:
 ```python
- AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0325-32B",
-     torch_dtype=torch.float16,
      load_in_8bit=True) # Requires bitsandbytes
 ```
 The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
@@ -81,7 +85,6 @@ from huggingface_hub import list_repo_refs
 out = list_repo_refs("allenai/OLMo-2-0325-32B")
 branches = [b.name for b in out.branches]
 ```
- Note: vLLM for OLMo2 32B does not correctly handle attention when the number of heads differs from the number of KV heads (i.e., when using Grouped-Query Attention (GQA) or Multi-Query Attention (MQA) instead of Multi-Head Attention (MHA)). Specifically, it incorrectly splits QKV into equal chunks rather than based on the actual sizes of Q, K, and V. vLLM hasn't released a version with the fix yet ([Issue](https://github.com/vllm-project/vllm/pull/13687)).
 
 ### Fine-tuning
 Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
@@ -111,7 +114,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
 ### Model Sources
 
 - **Project Page:** https://allenai.org/olmo
- - **Repositories:**
   - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core
   - Evaluation code: https://github.com/allenai/OLMo-Eval
   - Further fine-tuning code: https://github.com/allenai/open-instruct
@@ -123,70 +126,78 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-
 ## Evaluation
 Core model results for OLMo 2 32B are found below.
 
- | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA |
- |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
- | *Open weights models:* |
- | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
 | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
- | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
 | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
- | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
- | Gemma-2-9B | 4.4·10²³ | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 |
- | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
- | *Partially open models:* |
- | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
 | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
- | *Fully open models:* |
- | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
- | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
- | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
- | OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
- | DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
- | **OLMo-2-1124-7B** | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 |
- | **OLMo-2-1124-13B** | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |
 
 ## Model Details
 
 ### Pretraining
 |  | **OLMo 2 32B** | **OLMo 2 13B** | **OLMo 2 7B** |
 |-------------------|------------|------------|------------|
- | Pretraining Stage 1 | 6 trillion tokens<br>(1 epoch) | 5 trillion tokens<br>(1.2 epochs) | 4 trillion tokens<br>(1 epoch) |
 | Pretraining Stage 2 | 100B tokens (2 runs)<br>300B tokens (1 run)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 50B tokens (3 runs)<br>*merged* |
 | Post-training | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) |
 
 #### Stage 1: Initial Pretraining
- - Dataset: [OLMo-Mix-0325](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1) (3.9T tokens)
- - Coverage: 90%+ of total pretraining budget
- - 32B Model: ~1 epoch
 
 #### Stage 2: Fine-tuning
- - Dataset: Dolmino-Mix-0325 (releasing soon)
- - Three training mixes:
- - 100B tokens
   - 100B tokens
   - 300B tokens
- - Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
 
 #### Model Merging
- - 32B Model: 2 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint
 
 
 ## Bias, Risks, and Limitations
- Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified.
 
 
 ## Citation
 ```
 @misc{olmo20242olmo2furious,
- title={2 OLMo 2 Furious},
 author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Michal Guerquin and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
 year={2024},
 eprint={2501.00656},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
- url={https://arxiv.org/abs/2501.00656},
 }
 ```
 
 ## Model Card Contact
- For errors in this model card, contact `[email protected]`.
 
 
 # Model Card for OLMo 2 32B
 
+ We introduce OLMo 2 32B, the largest model in the OLMo 2 family.
+ OLMo 2 was pre-trained on [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124)
+ and uses [Dolmino-mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124) for mid-training.
 
+ OLMo 2 is the latest in a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+ We have released all code, checkpoints, logs, and associated training details on [GitHub](https://github.com/allenai/OLMo-core).
 
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|--------|---------|-------------|-----------------|----------------|
 
 The core models released in this batch include the following:
 
+ | **Stage** | **OLMo 2 32B** | **OLMo 2 13B** | **OLMo 2 7B**
 |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
 | **Base Model** | [allenai/OLMo-2-0325-32B](https://huggingface.co/allenai/OLMo-2-0325-32B) | [allenai/OLMo-2-1124-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) | [allenai/OLMo-2-1124-7B](https://huggingface.co/allenai/OLMo-2-1124-7B) |
 | **SFT** | [allenai/OLMo-2-0325-32B-SFT](https://huggingface.co/allenai/OLMo-2-0325-32B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) |
 
 ## Installation
 
+ OLMo 2 32B is supported in transformers v4.48 or higher:
 ```bash
+ pip install 'transformers>=4.48'
 ```
 
+ If using vLLM, you will need to install from the main branch until v0.7.4 is released.
+
 ## Inference
 
 You can use OLMo with the standard HuggingFace transformers library:
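The unchanged inference snippet is elided from this diff; as a minimal sketch consistent with the surrounding context (the hunk header above references `tokenizer` and `response`), loading the model and generating might look like:

```python
# Sketch, not the card's verbatim snippet: load OLMo 2 32B and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-0325-32B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-0325-32B")

inputs = tokenizer("Language modeling is ", return_tensors="pt").to(olmo.device)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```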
 
 
 For faster performance, you can quantize the model using the following method:
 ```python
+ AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0325-32B",
+     torch_dtype=torch.float16,
      load_in_8bit=True) # Requires bitsandbytes
 ```
 The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
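The CUDA-handoff line itself is unchanged and elided from this diff; a minimal sketch of the complete 8-bit path (same model ID, prompt illustrative) under those assumptions:

```python
# Sketch: complete 8-bit load, then move tokenized inputs to the GPU explicitly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-0325-32B",
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires bitsandbytes
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-0325-32B")

inputs = tokenizer("Language modeling is ", return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}  # pass inputs directly to CUDA
response = olmo.generate(**inputs, max_new_tokens=100)
```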
 
 out = list_repo_refs("allenai/OLMo-2-0325-32B")
 branches = [b.name for b in out.branches]
 ```
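To load one of those intermediate checkpoints, pass a branch name from `branches` as the `revision` argument:

```python
# Sketch: load a specific intermediate checkpoint by git revision.
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-0325-32B",
    revision=branches[0],  # any branch name returned by list_repo_refs
)
```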
 
 ### Fine-tuning
 Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
 
 ### Model Sources
 
 - **Project Page:** https://allenai.org/olmo
+ - **Repositories:**
   - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core
   - Evaluation code: https://github.com/allenai/OLMo-Eval
   - Further fine-tuning code: https://github.com/allenai/open-instruct
 
 ## Evaluation
 Core model results for OLMo 2 32B are found below.
 
+
+ | Model | Training FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA |
+ |---|---|---|---|---|---|---|---|---|---|---|---|---|
+ | **Open weights models** | | | | | | | | | | | | |
+ | Llama-2-13B | 1.6 · 10^23 | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
 | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
+ | Llama-3.1-8B | 7.2 · 10^23 | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
 | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
+ | Qwen-2.5-7B | 8.2 · 10^23 | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
+ | Gemma-2-9B | 4.4 · 10^23 | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 |
+ | Mistral-Small-24B | n/a | 75.2 | 93.3 | 91.3 | 77.8 | 80.7 | 74.4 | 42.3 | 69.1 | 79.7 | 54.2 | 88.8 |
+ | Gemma-2-27B | 2.1 · 10^24 | 71.3 | 90.7 | 88.4 | 74.5 | 75.7 | 70.1 | 44.7 | 61.5 | 75.7 | 44.7 | 87.4 |
+ | Qwen-2.5-14B | 1.6 · 10^24 | 72.2 | 94.0 | 94.0 | 80.0 | 79.3 | 51.5 | 37.3 | 71.0 | 83.4 | 52.8 | 79.1 |
+ | Qwen-2.5-32B | 3.5 · 10^24 | 74.9 | 95.6 | 96.0 | 84.0 | 83.1 | 53.1 | 37.0 | 78.0 | 83.3 | 59.0 | 79.9 |
+ | **Partially open models** | | | | | | | | | | | | |
+ | StableLM-2-12B | 2.9 · 10^23 | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
 | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
+ | **Fully open models** | | | | | | | | | | | | |
+ | Amber-7B | 0.5 · 10^23 | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
+ | OLMo-7B | 1.0 · 10^23 | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
+ | MAP-Neo-7B | 2.1 · 10^23 | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
+ | OLMo-0424-7B | 0.9 · 10^23 | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 |
+ | DCLM-7B | 1.0 · 10^23 | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 |
+ | OLMo-2-1124-7B | 1.8 · 10^23 | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31.0 | 78 |
+ | OLMo-2-1124-13B | 4.6 · 10^23 | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 |
+ | **OLMo-2-0325-32B** | 1.3 · 10^24 | 72.9 | 90.4 | 89.7 | 78.7 | 74.9 | 74.3 | 50.2 | 61.0 | 78.8 | 43.3 | 88.0 |
+
+ - *Columns ARC/C through NQ represent metrics tracked during OLMo 2 development.*
+ - *Columns AGIEval through TriviaQA represent unseen evals.*
 
 ## Model Details
 
 ### Pretraining
 |  | **OLMo 2 32B** | **OLMo 2 13B** | **OLMo 2 7B** |
 |-------------------|------------|------------|------------|
+ | Pretraining Stage 1 | 6 trillion tokens<br>(1.5 epochs) | 5 trillion tokens<br>(1.2 epochs) | 4 trillion tokens<br>(1 epoch) |
 | Pretraining Stage 2 | 100B tokens (2 runs)<br>300B tokens (1 run)<br>*merged* | 100B tokens (3 runs)<br>300B tokens (1 run)<br>*merged* | 50B tokens (3 runs)<br>*merged* |
 | Post-training | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-32b-pref-mix-v1)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix)) | SFT + DPO + PPO<br>([preference mix](https://huggingface.co/datasets/allenai/olmo-2-1124-7b-preference-mix)) |
 
 #### Stage 1: Initial Pretraining
+ - Dataset: [OLMo-mix-1124](https://huggingface.co/datasets/allenai/olmo-mix-1124) (3.9T tokens)
+ - Coverage: 95%+ of total pretraining budget
+ - 32B Model: ~1.5 epochs
 
 #### Stage 2: Fine-tuning
+ - Dataset: Dolmino-Mix-1124
+ - Two training mixes:
   - 100B tokens
   - 300B tokens
+ - Mix composition: 50% high-quality web data + academic/Q&A/instruction/math content
 
 #### Model Merging
+ - 32B Model: 3 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint
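The merged checkpoint is produced by averaging the weights of the candidate runs (model souping); a minimal sketch, assuming homogeneous state dicts and illustrative file names:

```python
# Sketch: merge checkpoints by uniform parameter averaging (file names illustrative).
import torch

paths = ["run1-100B.pt", "run2-100B.pt", "run3-100B.pt", "run1-300B.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

merged = {
    name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    for name in state_dicts[0]
}
torch.save(merged, "merged-final.pt")
```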
 
 
 ## Bias, Risks, and Limitations
+ Like any base or fine-tuned language model, these models can be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified.
 
 
 ## Citation
 ```
 @misc{olmo20242olmo2furious,
+ title={{2 OLMo 2 Furious}},
 author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Michal Guerquin and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
 year={2024},
 eprint={2501.00656},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2501.00656},
 }
 ```
 
 ## Model Card Contact
+ For errors in this model card, contact `[email protected]`.
config.json CHANGED
@@ -21,7 +21,7 @@
 "rope_theta": 500000,
 "tie_word_embeddings": false,
 "torch_dtype": "float32",
- "transformers_version": "4.47.1",
+ "transformers_version": "4.49.0",
 "use_cache": true,
 "vocab_size": 100352
 }
generation_config.json CHANGED
@@ -3,5 +3,5 @@
 "bos_token_id": 100257,
 "eos_token_id": 100257,
 "pad_token_id": 100277,
- "transformers_version": "4.47.1"
+ "transformers_version": "4.49.0"
 }
model-00001-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:4016708ba7f6c242242a9aebebe41d5283d8ba3c14074162fae518c072d01c5b
+ oid sha256:cb1a14b6f31c5ea5aebd91738cfcb67a8515b97474c51e866bf39785bc02be8f
 size 4823541920
model-00002-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:14b9112f2a67068a2f324725159248faf5591e30c4523cdac9c3337e3dd4e844
+ oid sha256:6441be4d3bb8e508cdee671bb1d64da498932ec5490ab815c26183f2448019df
 size 4467067512
model-00003-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:cadc6243821f18f0dd56f7e88c683e2cea91ef3c9e93e97fd04816879b2e3595
+ oid sha256:e368dad6a1531e2c8e72ab3937f1fe4955471b2b288f8064ab3b1e9af144985a
 size 4718792208
model-00004-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:53d263da5193292bb14993ef4285759bcd84745accf1f922e604b59c796b60a5
+ oid sha256:90ea66a068affcf737ab812d04fc12d9bcdbbb21d78a8df59b29f5d43504b5f9
 size 4467067512
model-00005-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:e6f4bb07eda80d1c1c44b291d7cfec80fff2296016a8d20caeeebc8cb49d3627
+ oid sha256:435fd760305d180ffd33d8dcc25758c827c4a2d6ae317bd7a9ed278c5b659c32
 size 4467067520
model-00006-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:83674d57a72a6368669d6b212f10d8815836a32e8ac7033d61ea225c04e06599
+ oid sha256:0f93caec1f5b99ab8e6a3357376ca48334d92baf3e93b81af0618aadf13e25e8
 size 4718792240
model-00007-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:b8b1694c6c059f740fea551bf3cb479a0c32d95b1fe189ab00a942e98a0f9347
+ oid sha256:bd526a0e9eebfa8ae7db712fb08b158f6bf9204a86655a84529b57b32890a87e
 size 4467067536
model-00008-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:b5d5ff62c739f06c90ae5a847dbc5c151806c375ccc798c64d31c22f43824551
+ oid sha256:4a0d8f0863b022ebdaa00cd6669a806e1f6d3ad1eab3b61cd6f6cc1e276dfa44
 size 4467067528
model-00009-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:12f07f9063abdb6b4c9ade00e9ce92beca849f2e9baa0eee31a1ba055a8b1303
+ oid sha256:791d96e11698d6defbed904f9bb7b1b61ec8d761934b6ee55576eb7c9850dfbe
 size 4718792240
model-00010-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:feda7deedc9830bd716c9d8b0c8db54dcff0e38dd97a7ea062e9d6e5e3eb709f
+ oid sha256:3d04bb4cc9dd226744d175d940eee7800bb9e0c835021636d36e9210ced1a606
 size 4467067536
model-00011-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:07d96f74b1f687273efc58a522bd60005fb9a04d24d208cfae86e776a0ad2df9
+ oid sha256:ecdbdbf35515f5df5cb386ffeb313f346fe472a006d6d557f4f1427c9b19d106
 size 4467067528
model-00012-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:18fbffb435e06cca4acb3c2fdd6c3655b4f6816ddaef40f237dcee27b515b47a
+ oid sha256:dbe70de1b9c018870bfcda170fe62f7ca85785edec9634d017240be6be3efb7c
 size 4718792240
model-00013-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:3d920239409aa1bf00ec850c3e4b3ca487bd182a79f5c760ccccec342113ea99
+ oid sha256:dcb9ecafcad8f50442299b25eb7bd43453e91d6e6dd6711f80c5f77f9e85c9a1
 size 4467067536
model-00014-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:a0b6da470f31806b4c07d8706a1b7dc315dcebabf01b21b0e4450f9dffcbd43b
+ oid sha256:274e615c668079e73b14588de6d97fedd372bb295e3d07503ceeb21345d72bfa
 size 4467067528
model-00015-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:5323e20ff2078104fd73e4f9e18dae24c2b8b22552be786da50555a29dbf004b
+ oid sha256:4330dc8d27dc0f7c7f32f0a767eec88f4cd96caf0905f939209444cfb3ed80d3
 size 4718792240
model-00016-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:810df0b075332a9394efc0386a163617729b46a39a3fa4ac06e6fa781c5dc26a
+ oid sha256:70071dca4a22c352aaea3f9dc0a843a8d29c9d1beb4dbddcab006b0ef54580ae
 size 4467067536
model-00017-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:51a933b6123d526772dfffe94e82fc18f95be70cec12491dfd5e05edec0a7fc0
+ oid sha256:ae6ae74a27d664ecb66a1192f8987f2d90c1c40d9ad479ad278da44fcd10230f
 size 4467067528
model-00018-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d4314c4f80dfb089ab206350d1c20ae64cc3e2fc9e9c72df04a2548c4ab9c9ee
+ oid sha256:fb6c93823a17ec8ca736288124e24d3bf04e4b13f7cb839c8e5f74b38110dabc
 size 4718792240
model-00019-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:b1158408201e89be236c1722b9d8495b81c68d34d02fb8c15ae021d5df028906
+ oid sha256:75a10e5fcee973e6a885d56bf3acf069b6106a93989d4e164289adec11e55ab6
 size 4467067536
model-00020-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:2bc752f4a5ae1ed48cfc4724a5fc153ce148a58e57eabf5a7907ff284863e4cb
+ oid sha256:ec5ba2b627c55bca604450133ddc9b2f1f4025940ac02d3dd1434ff8dc45d1b4
 size 4467067528
model-00021-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:a4a8815bb6eaba80dd68c892b6d56c09fbb2f79ff1a9027819d6fce94651772b
+ oid sha256:af0b2e37c7d92e5d861395768be33c37cc8336c144fd2a282b30aed502c738ed
 size 4718792240
model-00022-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:20326bc353f4cec48869722c5ce7c315c109d1c4a86a637fadd41e2a266bf975
+ oid sha256:6090acbe1d24e65e74dd6dd980d65b63f774d8b80cb0f68378efff7cf25cd70d
 size 4467067536
model-00023-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:dc3a7e6c2958d4181633e07266fd8fadb5dc14c3d1b77bc742bf8cfcaef11049
+ oid sha256:6112636230e1ade0c039c651a54e5cb1fc83aa49c36f87846bb6b7226a290c27
 size 4467067528
model-00024-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:ff290550b45c8cbd02bd7fcff367bf73e293793122e0568d930b2a04bb9c0045
+ oid sha256:c46c9b7725fb7cdc9415e18bc593e3741c3a53d84a2435afe2be1dca3bd4b21b
 size 4718792240
model-00025-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:5e49c2fe381adaa34df4fa9b1486cb2e5b0ce792ff903361fc5bdb654dce4cfe
+ oid sha256:1581a8bacbb610fcd0df2b34d71e145050742bd83e564f15c7a9fe614e8c3228
 size 4467067536
model-00026-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:8b7eaff0b2741a59878afe93196e4ae01a14e30d90b491f941881d7d2dcf0fcd
+ oid sha256:ee3ad8ad412cfab585a290ed71cfabe8f8fe3314d45f0038128a6f4f0a547776
 size 4467067528
model-00027-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:b9b732fcd7e5744e7f3870fba414d355f989793c7533f33b2cf0e314b5675b2e
+ oid sha256:52822419e1979666657bd72c8fa213ff47e5d0e4f182007caa96fff6e12f7fcc
 size 4718792240
model-00028-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:1dcd0c67bca704b0f507dbdb3d96413137d72e5c8f822aa0a1a654d4536175bd
+ oid sha256:71464699e194d19280e74cd0d458680338ef69cfc4ff67f5fdd49ce41ec6c72e
 size 3649173440
model-00029-of-00029.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:dd7b9e2f478e1141011ee0c2c4dac205a0812f7a72ee4b4d22c688ccd8b775e9
+ oid sha256:32bd36e52e4635532e9bf7174716abe5540e1227b0aa67bc357d2b98ca754193
 size 2055209088