metadata
license: apache-2.0
pipeline_tag: question-answering
library_name: transformers
This repository contains the model described in the paper SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild.
This model has been trained with a simple reinforcement learning (RL) recipe to improve reasoning abilities. Training starts from base models and uses rule-based rewards and the GSM8K/Math datasets. This approach has been successfully applied to diverse base models with limited data (8K examples), achieving significant accuracy gains ranging from 10 to more than 20 absolute points.