---
license: apache-2.0
pipeline_tag: question-answering
library_name: transformers
---
This repository contains the model described in the paper [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892).
This model was trained with a simple reinforcement learning (RL) recipe to improve its reasoning ability. Training starts directly from a base model and uses rule-based rewards on the GSM8K and MATH datasets. The recipe has been applied to a range of base models with limited data (8K examples), yielding accuracy gains of 10 to more than 20 absolute points.
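
To illustrate what a rule-based reward can look like, here is a minimal sketch in Python. It is not the reward function from the paper or the linked repository: it assumes the model writes its final answer inside `\boxed{...}` and that the reward is binary (1.0 for an exact match after light normalization, 0.0 otherwise). The names `extract_boxed_answer` and `rule_based_reward` are hypothetical.

```python
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the response, or None."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def rule_based_reward(response: str, reference: str) -> float:
    """Assumed binary reward: 1.0 if the boxed answer matches the reference."""
    answer = extract_boxed_answer(response)
    if answer is None:
        return 0.0  # no parseable final answer

    def normalize(text: str) -> str:
        # Assumption: ignore spaces and thousands separators when comparing.
        return text.replace(",", "").replace(" ", "")

    return 1.0 if normalize(answer) == normalize(reference) else 0.0
```

A reward of this kind needs no learned reward model: it only checks the final answer against the dataset label, which is why it pairs naturally with datasets such as GSM8K and MATH that ship with reference answers.
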
Code: https://github.com/hkust-nlp/simpleRL-reason/tree/v1