hkust-nlp/Mistral-7B-v0.1-SimpleRL-Zoo

metadata

license: apache-2.0
pipeline_tag: question-answering
library_name: transformers

This repository contains the model described in the paper SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild.

This model was trained with a simple reinforcement learning (RL) recipe to improve its reasoning ability. Training starts directly from the base model and uses rule-based rewards on the GSM8K and MATH datasets. The same recipe has been applied to a diverse set of base models with limited data (8K examples), yielding accuracy gains of 10 to more than 20 absolute points.
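To make the idea of a rule-based reward concrete, here is a minimal sketch of one possible correctness check. It assumes the model writes its final answer inside `\boxed{...}` and that the reward is 1 for an exact match after light normalization and 0 otherwise. The function names and normalization rules are hypothetical and chosen for illustration; the reward function in the linked repository may differ.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Light, illustrative normalization: drop '$', commas, spaces, trailing '.'."""
    answer = answer.replace("$", "").replace(",", "").replace(" ", "")
    return answer.rstrip(".")


def rule_based_reward(response: str, ground_truth: str) -> float:
    """Hypothetical reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(ground_truth) else 0.0
```

A reward of this kind needs no learned reward model: it only compares the extracted final answer with the reference answer from the dataset.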

Code: https://github.com/hkust-nlp/simpleRL-reason/tree/v1
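
Since the metadata lists `transformers` as the library, the model can presumably be loaded as a causal language model. The sketch below is a minimal example under that assumption. The prompt template in `build_prompt` is hypothetical and may not match the one used during training, and the helper names are illustrative only.

```python
MODEL_ID = "hkust-nlp/Mistral-7B-v0.1-SimpleRL-Zoo"


def build_prompt(question: str) -> str:
    # Hypothetical prompt template; the template used in training may differ.
    return f"Question:\n{question}\nAnswer:\nLet's think step by step."


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    # Greedy decoding keeps the output deterministic for a quick check.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 7?")` downloads the weights on first use, so expect a multi-gigabyte download and substantial memory use for a 7B-parameter model.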