cliang1453 committed
Commit 11d29b2 · verified · 1 Parent(s): b85df31

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -8,11 +8,11 @@ pipeline_tag: text-generation
 ---
 ## Model Summary
 
-Phi-tiny-MoE is a lightweight Mixture of Experts (MoE) model with 3.8B total parameters and 1.1B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](http://link.to.slimmoe) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a larger variant, [Phi-mini-MoE](https://huggingface.co/microsoft/Phi-mini-MoE-instruct), with 7.6B total and 2.4B activated parameters.
+Phi-tiny-MoE is a lightweight Mixture of Experts (MoE) model with 3.8B total parameters and 1.1B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](https://arxiv.org/pdf/2506.18349) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a larger variant, [Phi-mini-MoE](https://huggingface.co/microsoft/Phi-mini-MoE-instruct), with 7.6B total and 2.4B activated parameters.
 
 
 References: <br>
-📖 [SlimMoE Paper](http:\\link.to.slimmoe) <br>
+📖 [SlimMoE Paper](https://arxiv.org/pdf/2506.18349) <br>
 📖 [Phi-3 Technical Report](https://arxiv.org/abs/2404.14219) <br>
 📖 [GRIN-MoE](https://arxiv.org/abs/2409.12136) <br>
 
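
For context on the Model Summary edited above, here is a minimal usage sketch. It is not part of the commit: it assumes the checkpoint is published under the repo ID `microsoft/Phi-tiny-MoE-instruct` (mirroring the linked `Phi-mini-MoE-instruct` naming) and that it loads through the standard `transformers` causal-LM classes, possibly with `trust_remote_code` for custom MoE modeling code.

```python
# Minimal sketch; the repo ID and trust_remote_code requirement are assumptions,
# not stated in this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-tiny-MoE-instruct"  # hypothetical ID, mirroring Phi-mini-MoE-instruct

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 3.8B total parameters; ~1.1B active per token
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what a Mixture of Experts model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```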