|
--- |
|
license: llama3.3 |
|
datasets: |
|
- Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B |
|
base_model: |
|
- Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1 |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/TOv_fhS8IFg7tpRpKtbez.png) |
|
|
|
This model was sponsored through the generous support of Cherry Republic.
|
|
|
https://www.cherryrepublic.com/ |
|
|
|
## Model Overview |
|
**TinyLlama-R1** is a lightweight transformer model designed for instruction-following and reasoning tasks, particularly in STEM domains. It was trained on the **Magpie Reasoning V2 250K-CoT** dataset with the goal of improving reasoning through high-quality instruction-response pairs. Early tests, however, show reduced responsiveness to system-level instructions, likely because the dataset contains no system messages.
|
|
|
**Model Name**: `Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1`
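
A minimal inference sketch using the `transformers` library is shown below. The repository ID follows the model name above; the generation settings and the assumption that the tokenizer ships a chat template are illustrative, so adjust as needed.

```python
# Minimal inference sketch with Hugging Face transformers.
# The repository ID follows the model name above; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# If the tokenizer does not define a chat template, format the prompt manually instead.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```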
|
|
|
--- |
|
|
|
## Key Features |
|
- **Dataset Focus**: Built on the **Magpie Reasoning V2 250K-CoT** dataset, enhancing problem-solving in reasoning-heavy tasks. |
|
- **STEM Application**: Tailored for tasks involving scientific, mathematical, and logical reasoning. |
|
- **Instruction Handling**: Initial observations indicate reduced adherence to system instructions, a change from previous versions. |
|
|
|
--- |
|
|
|
## Model Details |
|
- **Model Type**: Transformer-based (TinyLlama architecture) |
|
- **Parameter Count**: 1.1B |
|
- **Context Length**: 8k tokens (extended in this version)
|
- **Training Framework**: Unsloth |
|
- **Primary Use Cases**:

  - Research into chain-of-thought (CoT) reasoning in small language models

  - Technical problem-solving

  - Instruction-following conversations
|
|
|
--- |
|
|
|
## Training Data |
|
The model was fine-tuned on the **Magpie Reasoning V2 250K-CoT** dataset. The dataset includes diverse instruction-response pairs, but notably lacks system-level messages, which has impacted the model's ability to consistently follow system directives. |
|
|
|
### Dataset Characteristics |
|
- **Sources**:

  - Instructions were generated using models such as Meta's Llama 3.1 and 3.3.

  - Responses were provided by DeepSeek-R1-Distill-Llama-70B.
|
- **Structure**: Instruction-response pairs with an emphasis on chain-of-thought (CoT) reasoning styles. |
|
- **Limitations**: No system-level instructions were included, affecting instruction prioritization and response formatting in some contexts. |
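
For reference, the dataset can be inspected directly with the `datasets` library. The snippet below is a sketch; the exact column names are an assumption, so check `ds.column_names` before relying on them.

```python
# Sketch: inspect the training data with the datasets library.
# Column names vary between Magpie releases, so check ds.column_names rather than assuming them.
from datasets import load_dataset

ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B", split="train")
print(ds.column_names)

# Print a truncated preview of the first instruction-response pair.
for key, value in ds[0].items():
    print(f"{key}: {str(value)[:200]}")
```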
|
|
|
--- |
|
|
|
## Known Issues & Limitations |
|
- **System Instructions**: The model currently does not respond well to system messages, in contrast to previous versions. |
|
- **Performance Unverified**: This version has not yet been formally tested on benchmarks like GSM-8K. |
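
Given the first point, one practical workaround is to fold system-style guidance into the user turn instead of sending a separate system message. The sketch below reuses the `model` and `tokenizer` from the quick-start example above; whether the tokenizer's chat template accepts a `system` role at all is left unverified here.

```python
# Workaround sketch: merge system-style guidance into the user message, since the
# training data contained no system turns. Reuses the model/tokenizer loaded in the
# quick-start example above.
guidance = "Show your reasoning step by step and put the final answer on its own line."
question = "A rectangle measures 7 cm by 12 cm. What is its area?"

messages = [{"role": "user", "content": f"{guidance}\n\n{question}"}]  # no "system" entry
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```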
|
|
|
--- |
|
|
|
The model can be accessed and fine-tuned via [Josephgflowers on Hugging Face](https://huggingface.co/Josephgflowers). |
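
Since the card lists Unsloth as the training framework, a further fine-tune of this checkpoint could look roughly like the sketch below. The LoRA settings, hyperparameters, and dataset field mapping are illustrative assumptions rather than the original training recipe, and trainer argument names vary across `trl` versions.

```python
# Rough LoRA fine-tuning sketch with Unsloth + TRL. Hyperparameters, LoRA settings,
# and the dataset field mapping are illustrative assumptions, not the original recipe.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1",
    max_seq_length=8192,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B", split="train")

# The "instruction"/"response" column names are assumptions; adapt to the dataset's actual fields.
def to_text(example):
    return {"text": f"{example.get('instruction', '')}\n\n{example.get('response', '')}"}

ds = ds.map(to_text)

# Argument names vary across trl versions (newer releases move these into SFTConfig).
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="tinyllama-r1-sft",
    ),
)
trainer.train()
```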
|
|
|
## Training & License Information
|
|
|
**License**: CC BY-NC 4.0 (non-commercial use only)

This model was trained using datasets released under:

- Meta Llama 3.1 and 3.3 Community License

- CC BY-NC 4.0 (Creative Commons Non-Commercial License)
|
|
|
## Acknowledgments
|
|
|
Thanks to the creators of the Magpie Reasoning V2 dataset and the researchers behind DeepSeek-R1 and Meta's Llama models.
|
|
|
```bibtex
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```