RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

baohao updated a collection about 1 month ago

baohao updated a model about 1 month ago

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

RLHFlow's datasets (88)

RLHFlow/reinforce_ada_hard_prompt_1-5b

Viewer • Updated Oct 16 • 13.3k • 54

RLHFlow/reinforce_ada_simple_prompt_1-5b

Viewer • Updated Oct 16 • 25k • 115

RLHFlow/reinforce_ada_hard_prompt_llama

Viewer • Updated Oct 10 • 15k • 49

RLHFlow/reinforce_ada_easy_prompt

Viewer • Updated Oct 10 • 24.3k • 72

RLHFlow/reinforce_ada_hard_prompt

Viewer • Updated Oct 10 • 15.7k • 97 • 2

RLHFlow/self_rewarding_turn2_example

Updated Mar 2 • 16

RLHFlow/self_rewarding_turn1_with_rewards_example

Updated Mar 2 • 26

RLHFlow/self_rewarding_rl_prompt

Updated Mar 2 • 10

RLHFlow/self_rewarding_sft_prompt

Viewer • Updated Mar 2 • 40k • 12

RLHFlow/self_rewarding_ift_example_raw_data1

Viewer • Updated Feb 26 • 16.3k • 11

RLHFlow/self_rewarding_ift_example

Viewer • Updated Feb 26 • 32k • 63

RLHFlow/qwq_gen_sft_15k

Viewer • Updated Feb 17 • 15k • 10

RLHFlow/numia_prompt_ppo

Viewer • Updated Feb 13 • 404k • 22 • 1

RLHFlow/numia_prompt_dpo_test

Viewer • Updated Feb 11 • 1.02k • 19

RLHFlow/numia_prompt_dpo9

Viewer • Updated Feb 11 • 20k • 12

RLHFlow/numia_prompt_dpo8

Viewer • Updated Feb 11 • 20k • 20

RLHFlow/numia_prompt_dpo7

Viewer • Updated Feb 11 • 20k • 8

RLHFlow/numia_prompt_dpo6

Viewer • Updated Feb 11 • 20k • 13

RLHFlow/numia_prompt_dpo5

Viewer • Updated Feb 11 • 20k • 10

RLHFlow/numia_prompt_dpo4

Viewer • Updated Feb 11 • 20k • 25

RLHFlow/numia_prompt_dpo3

Viewer • Updated Feb 11 • 20k • 42

RLHFlow/numia_prompt_dpo2

Viewer • Updated Feb 11 • 20k • 17

RLHFlow/numia_prompt_dpo1

Viewer • Updated Feb 11 • 20k • 89

RLHFlow/LLM-Preferences-HelpSteer2

Viewer • Updated Feb 5 • 9.13k • 24 • 1

RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10, 2024 • 526k • 52

RLHFlow/Deepseek-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 • 146

RLHFlow/Mistral-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 • 66

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9, 2024 • 253k • 40 • 3

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k • 161 • 17

RLHFlow/Mistral-ORM-Data

Viewer • Updated Nov 9, 2024 • 273k • 33 • 2