Tags: Text Generation · Transformers · PyTorch · English · olmoe · conversational
vwxyzjn committed · verified · Commit 05defd4 · 1 Parent(s): a2b0201

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

@@ -18,7 +18,7 @@ datasets:
 
 ## Release Documentation
 
-OLMoE-1B-7B-0125-DPO January 2025 is post-trained variant of the [OLMoE-1B-7B January 2025](https://huggingface.co/allenai/OLMoE-1B-7B-0125) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix), and finally RLVR training using [this data](https://huggingface.co/datasets/allenai/RLVR-GSM).
+OLMoE-1B-7B-0125-DPO January 2025 is post-trained variant of the [OLMoE-1B-7B January 2025](https://huggingface.co/allenai/OLMoE-1B-7B-0125) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmoe-0125-1b-7b-preference-mix).
 Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
 Check out the [OLMoE paper](https://arxiv.org/abs/2409.02060) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
 
@@ -27,7 +27,7 @@ These models are trained on the Dolma dataset. We are releasing all code, checkp
 The core models released in this batch include the following:
 
 
-| **Stage**            | **OLMo 2 7B**                                                                                            |
+| **Stage**            | **OLMoE 1B-7B**                                                                                          |
 |----------------------|----------------------------------------------------------------------------------------------------------|
 | **Base Model**       | [allenai/OLMoE-1B-7B-0125](https://huggingface.co/allenai/OLMoE-1B-7B-0125)                              |
 | **SFT**              | [allenai/OLMoE-1B-7B-0125-SFT](https://huggingface.co/allenai/OLMoE-1B-7B-0125-SFT)                      |