TinyDeepSeek-0.5B-checkpoints / README.md

Update README.md

59556f2 verified 4 months ago

4.34 kB

	---
	license: apache-2.0
	---
	# TinyDeepSeek: Reproduction of DeepSeek-R1 and Beyond

	<p align="center">
	📃 <a href="" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/TinyDeepSeek-0.5B-base" target="_blank">TinyDeepSeek-0.5B-base</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/TinyDeepSeek-3.3B-base" target="_blank">TinyDeepSeek-3.3B-base</a>
	<br> 🤗 <a href="https://huggingface.co/FreedomIntelligence/TinyDeepSeek-3.3B-checkpoints" target="_blank"> TinyDeepSeek-3.3B-checkpoints</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/TinyDeepSeek-0.5B-checkpoints" target="_blank">TinyDeepSeek-0.5B-checkpoints</a>

	</p>

	Innovation and Open source are the best tributes and reproductions of DeepSeek.

	The open-source ethos, rooted in technological equity, upholds two key principles: free access for all developers and the opportunity for technical contributions. While DeepSeek exemplifies the first principle, the second principle is hindered by restrictive training strategies, unclear data sources, and the high costs of model training. These constraints limit the open-source community's capacity to contribute and impede technological progress.

	To overcome these challenges, we launched a comprehensive reproduction project of DeepSeek. This initiative involves training models from scratch, replicating DeepSeek's architecture and algorithms. We will fully open-source the training code, datasets, and models, offering code framework, reference solution, and base models for low-cost continual exploration.



	## 🌈 Update

	* [2025.03.11] TinyDeepSeek repo is published！🎉




	## Reproduction of DeepSeek-R1

	### Architecture

	<details><summary>Click to expand</summary>

	![arch](assets/arch.png)

	As shown in above Figure, the DeepSeek technical report introduces three architectural designs:

	- A. Multi-Head Latent Attention (MLA)

	- B. Load Balancing Strategy without Auxiliary Loss: Please refer [code](https://github.com/FreedomIntelligence/TinyDeepSeek/blob/main/src/modeling/tinydeepseek/modeling_tinydeepseek.py#L550) for implementation.

	- C. Multi-Token Prediction: Please refer [code](https://github.com/FreedomIntelligence/TinyDeepSeek/blob/main/src/modeling/tinydeepseek/modeling_tinydeepseek.py#L1875) for implementation.

	Our Detailed architectural parameter are shown in Table below.

	![arch](assets/iarch.png)

	</details>

	### Data Construction

	<details><summary>Click to expand</summary>

	![arch](assets/data.png)

	- Pretrain:
	- Meta & Labeled Data: TBD
	- Process Code: Please refer to [Link](/pretrain_data/Readme.md) for implementation.
	- SFT:
	- [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
	- [allenai/tulu-3-sft-olmo-2-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture)
	- RL:
	- [SynthLabsAI/Big-Math-RL-Verified](https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified)

	</details>

	### Model Training

	```
	bash examples/pretrainStage1.sh
	bash examples/pretrainStage2.sh
	bash examples/sft.sh

	bash examples/rl.sh (TBD)
	```

	### Results

	TBD

	## Beyond Reproduction

	### Scale Up RL to Pretrain

	> RWO: Reward Weighted Optimization

	For the training, please include the include the following flags in the training command.

	- For Pretrain please prepare data item with key 'text_evaluation':{"knowledge":4, "reasoning":3, ...}.
	- For SFT, please provide data file 'General.json' and reward file 'General_reward.json' in the same directory.

	```
	--reward_weighted_optimization True \
	--remove_unused_columns False \
	```

	## 📃 To do

	- [ ] Release Evaluation Results
	- [ ] Release RL Training Code and Model
	- [ ] Release Pretrain Data quality annotation label
	- [ ] Release TinyDeepSeek Technical Report


	## Acknowledgment

	- [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)


	## Citation
	Please use the following citation if you intend to use our dataset for training or evaluation:

	```
	@misc{tinydeepseek,
	title={TinyDeepSeek},
	author={FreedomIntelligence Team},
	year = {2025},
	publisher = {GitHub},
	journal = {GitHub repository},
	howpublished = {\url{https://github.com/FreedomIntelligence/TinyDeepSeek}},
	}
	```