Commit 7e291e2 · verified · Parent(s): 90dc518
Committed by lewtun (HF staff)

Update README.md

Files changed (1): README.md (+3 -0)
README.md CHANGED
@@ -19,6 +19,9 @@ model-index:
 
 Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A35B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A35B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.
 
+> [!NOTE]
+> This model was trained collaboratively between Argilla, KAIST, and Hugging Face
+
 ## Model Details
 
 ### Model Description
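
The paragraph in this diff describes the ORPO run only at a high level. As a rough illustration of what such a run looks like in code, below is a minimal sketch using TRL's `ORPOTrainer`. The model and dataset ids are the ones named in the README; everything else (hyperparameters, batch sizes, the single-process setup) is an illustrative assumption, not the actual Zephyr-141B-A35B recipe, which ran on 4 nodes of 8 x H100s.

```python
# Minimal ORPO fine-tuning sketch (illustrative only, not the actual recipe).
# Assumes trl >= 0.8, where ORPOTrainer/ORPOConfig were introduced.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "mistral-community/Mixtral-8x22B-v0.1"  # base model named in the README
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Preference dataset named in the README: ~7k multi-turn chosen/rejected pairs.
# Assumes the chosen/rejected columns are in a format the trainer accepts
# (recent trl releases apply the chat template to conversational data).
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

args = ORPOConfig(
    output_dir="zephyr-orpo-sketch",
    beta=0.05,                      # weight of the odds-ratio term (illustrative value)
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,            # renamed to `processing_class` in newer trl releases
)
trainer.train()
```

Because ORPO folds the preference signal into the standard language-modeling loss as an odds-ratio penalty, a single stage like the one sketched above stands in for the usual SFT-then-DPO pipeline, which is where the paragraph's computational-efficiency claim comes from.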