|
--- |
|
license: llama3.3 |
|
datasets: |
|
- Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B |
|
base_model: |
|
- Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1 |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/TOv_fhS8IFg7tpRpKtbez.png) |
|
|
|
This model was sponsored through the generous support of Cherry Republic.
|
|
|
https://www.cherryrepublic.com/ |
|
|
|
## Model Overview |
|
**TinyLlama-R1** is a lightweight transformer model designed for instruction-following and reasoning tasks, particularly in STEM domains. It was trained on the **Magpie Reasoning V2 250K-CoT** dataset with the goal of improving reasoning through high-quality instruction-response pairs. Early tests, however, show reduced responsiveness to system-level instructions, likely because the dataset contains no system messages.
|
|
|
**Model Name**: `Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1`
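
A minimal inference sketch using the `transformers` library is shown below. The repository ID follows the model name above; the generation settings and the assumption that the tokenizer ships a chat template are illustrative, so adjust as needed.

```python
# Minimal inference sketch with Hugging Face transformers.
# The repository ID follows the model name above; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# If the tokenizer does not define a chat template, format the prompt manually instead.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```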
|
|
|
--- |
|
|
|
## Key Features |
|
- **Dataset Focus**: Built on the **Magpie Reasoning V2 250K-CoT** dataset, enhancing problem-solving in reasoning-heavy tasks. |
|
- **STEM Application**: Tailored for tasks involving scientific, mathematical, and logical reasoning. |
|
- **Instruction Handling**: Initial observations indicate reduced adherence to system instructions, a change from previous versions. |
|
|
|
--- |
|
|
|
## Model Details |
|
- **Model Type**: Transformer-based (TinyLlama architecture) |
|
- **Parameter Count**: 1.1B |
|
- **Context Length**: 8k tokens (extended in this version)
|
- **Training Framework**: Unsloth |
|
- **Primary Use Cases**:

  - Research into chain-of-thought (CoT) reasoning in small language models

  - Technical problem-solving

  - Instruction-following conversations
|
|
|
--- |
|
|
|
## Training Data |
|
The model was fine-tuned on the **Magpie Reasoning V2 250K-CoT** dataset. The dataset includes diverse instruction-response pairs, but notably lacks system-level messages, which has impacted the model's ability to consistently follow system directives. |
|
|
|
### Dataset Characteristics |
|
- **Sources**:

  - Instructions were generated using models such as Meta's Llama 3.1 and 3.3.

  - Responses were provided by DeepSeek-R1-Distill-Llama-70B.
|
- **Structure**: Instruction-response pairs with an emphasis on chain-of-thought (CoT) reasoning styles. |
|
- **Limitations**: No system-level instructions were included, affecting instruction prioritization and response formatting in some contexts. |
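
For reference, the dataset can be inspected directly with the `datasets` library. The snippet below is a sketch; the exact column names are an assumption, so check `ds.column_names` before relying on them.

```python
# Sketch: inspect the training data with the datasets library.
# Column names vary between Magpie releases, so check ds.column_names rather than assuming them.
from datasets import load_dataset

ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B", split="train")
print(ds.column_names)

# Print a truncated preview of the first instruction-response pair.
for key, value in ds[0].items():
    print(f"{key}: {str(value)[:200]}")
```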
|
|
|
--- |
|
|
|
## Known Issues & Limitations |
|
- **System Instructions**: The model currently does not respond well to system messages, in contrast to previous versions. |
|
- **Performance Unverified**: This version has not yet been formally tested on benchmarks like GSM-8K. |
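
Given the first point, one practical workaround is to fold system-style guidance into the user turn instead of sending a separate system message. The sketch below reuses the `model` and `tokenizer` from the quick-start example above; whether the tokenizer's chat template accepts a `system` role at all is left unverified here.

```python
# Workaround sketch: merge system-style guidance into the user message, since the
# training data contained no system turns. Reuses the model/tokenizer loaded in the
# quick-start example above.
guidance = "Show your reasoning step by step and put the final answer on its own line."
question = "A rectangle measures 7 cm by 12 cm. What is its area?"

messages = [{"role": "user", "content": f"{guidance}\n\n{question}"}]  # no "system" entry
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```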
|
|
|
--- |
|
|
|
The model can be accessed and fine-tuned via [Josephgflowers on Hugging Face](https://huggingface.co/Josephgflowers). |
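
Since the card lists Unsloth as the training framework, a further fine-tune of this checkpoint could look roughly like the sketch below. The LoRA settings, hyperparameters, and dataset field mapping are illustrative assumptions rather than the original training recipe, and trainer argument names vary across `trl` versions.

```python
# Rough LoRA fine-tuning sketch with Unsloth + TRL. Hyperparameters, LoRA settings,
# and the dataset field mapping are illustrative assumptions, not the original recipe.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1",
    max_seq_length=8192,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B", split="train")

# The "instruction"/"response" column names are assumptions; adapt to the dataset's actual fields.
def to_text(example):
    return {"text": f"{example.get('instruction', '')}\n\n{example.get('response', '')}"}

ds = ds.map(to_text)

# Argument names vary across trl versions (newer releases move these into SFTConfig).
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="tinyllama-r1-sft",
    ),
)
trainer.train()
```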
|
|
|
## Training & License Information
|
|
|
**License**: CC BY-NC 4.0 (non-commercial use only)

This model was trained using datasets released under:

- Meta Llama 3.1 and 3.3 Community License

- CC BY-NC 4.0 (Creative Commons Non-Commercial License)
|
|
|
## Acknowledgments
|
|
|
Thanks to the creators of the Magpie Reasoning V2 dataset and the researchers behind DeepSeek-R1 and Meta's Llama models.
|
|
|
```bibtex
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```