hkust-nlp
/

Mistral-7B-v0.1-SimpleRL-Zoo

	---
	license: apache-2.0
	pipeline_tag: question-answering
	library_name: transformers
	---

	This repository contains the model described in the paper [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892).

	This model was trained with a simple reinforcement learning (RL) recipe to improve its reasoning ability. Training starts directly from the base model and uses rule-based rewards on the GSM8K and MATH datasets. The same recipe has been applied to a range of base models with limited data (8K examples), with accuracy gains ranging from 10 to more than 20 absolute points.
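To make "rule-based rewards" concrete, here is a minimal sketch of a correctness check of the kind used for math problems. It is an illustration, not the training code from the linked repository. The `\boxed{...}` answer convention, the 0/1 reward values, and the helper names `extract_boxed`, `normalize`, and `rule_based_reward` are all assumptions made for this example.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Light normalization (assumed): drop thousands separators, '$', trailing period."""
    return answer.strip().replace(",", "").replace("$", "").rstrip(".")


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Hypothetical 0/1 reward: 1.0 if the boxed answer matches the reference."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(gold_answer) else 0.0


if __name__ == "__main__":
    print(rule_based_reward("So she earns \\boxed{72} dollars.", "72"))
    print(rule_based_reward("The answer is 72.", "72"))
```

A string-match check like this is cheap and needs no learned reward model, though a real pipeline would likely need a more careful equivalence test (for example, treating `0.5` and `\frac{1}{2}` as equal).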

	Code: https://github.com/hkust-nlp/simpleRL-reason/tree/v1
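
Since the metadata lists `transformers` as the library, the checkpoint can presumably be loaded as a causal language model. The sketch below assumes the repository id `hkust-nlp/Mistral-7B-v0.1-SimpleRL-Zoo` and uses a hypothetical prompt template (`build_prompt`); the template actually used in training may differ, so check the linked code before relying on it.

```python
MODEL_ID = "hkust-nlp/Mistral-7B-v0.1-SimpleRL-Zoo"


def build_prompt(question: str) -> str:
    """Hypothetical step-by-step prompt; not confirmed by the model card."""
    return f"Question:\n{question}\nAnswer:\nLet's think step by step.\n"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and greedily decode an answer for one question."""
    # Imported here so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the full 7B checkpoint on first run.
    print(generate_answer("Natalia sold 48 clips in April and half as many in May. "
                          "How many clips did she sell in total?"))
```

Greedy decoding (`do_sample=False`) is chosen here only to make the output reproducible; sampling settings are a matter of preference.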