---
license: mit
language:
- am
- ar
- bn
- zh
- cs
- nl
- en
- fr
- de
- el
- ha
- he
- hi
- id
- it
- ja
- jv
- km
- ko
- lo
- ms
- mr
- fa
- pl
- pt
- ro
- ru
- es
- sw
- sv
- tl
- ta
- te
- th
- tr
- uk
- ur
- vi
datasets:
- simplescaling/s1K
- lightblue/reasoning-multilingual-R1-Llama-70B-train
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---
This is a distilled model, similar to s1 and DeepSeek-R1-Distill.
It is a test model; I hope to eventually reproduce an RL model like RL-Zero, and this model is a mini-step toward that goal.
Thanks to everyone in the open community.