---
license: apache-2.0
pipeline_tag: question-answering
library_name: transformers
---

This repository contains the model described in the paper [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892).

This model was trained with a simple reinforcement learning (RL) recipe to improve its reasoning ability. Training starts directly from a base model and uses rule-based rewards on the GSM8K and MATH datasets. The same recipe has been applied to a diverse set of base models with limited data (8K examples), yielding accuracy gains of 10 to more than 20 absolute points.

Code: https://github.com/hkust-nlp/simpleRL-reason/tree/v1
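
To make the idea of a rule-based reward concrete, here is a minimal sketch in Python. It is an illustration only, not the reward used to train this model: it assumes the response states its final answer inside `\boxed{...}` and that the reward is 1.0 for an exact match after light normalization and 0.0 otherwise. The names `extract_boxed_answer`, `normalize`, and `rule_based_reward` are hypothetical; see the linked repository for the actual implementation.

```python
import re
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Crude normalization (an assumption): drop commas, dollar signs, whitespace."""
    answer = answer.replace(",", "").replace("$", "")
    answer = re.sub(r"\s+", "", answer)
    return answer.rstrip(".")


def rule_based_reward(response: str, reference: str) -> float:
    """Assumed binary reward: 1.0 if the boxed answer matches the reference."""
    predicted = extract_boxed_answer(response)
    if predicted is None:
        return 0.0  # no parsable final answer
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(rule_based_reward(r"So the total is \boxed{1,000}.", "1000"))
    print(rule_based_reward("The answer is 42.", "42"))
```

Because the check is a deterministic string comparison, no learned reward model is needed. A practical grader would likely also handle mathematically equivalent forms (for example `0.5` versus `\frac{1}{2}`), which this sketch does not attempt.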