RLHFlow
university
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Recent Activity
hendrydong updated a collection about 2 months ago: Minimal-RL
hendrydong updated a model about 2 months ago: RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp
Team members: 8
RLHFlow's datasets: 83
RLHFlow/self_rewarding_turn2_example • Updated Mar 2 • 29
RLHFlow/self_rewarding_turn1_with_rewards_example • Updated Mar 2 • 13
RLHFlow/self_rewarding_rl_prompt • Updated Mar 2 • 25
RLHFlow/self_rewarding_sft_prompt • Viewer • Updated Mar 2 • 40k • 22
RLHFlow/self_rewarding_ift_example_raw_data1 • Viewer • Updated Feb 26 • 16.3k • 21
RLHFlow/self_rewarding_ift_example • Viewer • Updated Feb 26 • 32k • 26
RLHFlow/qwq_gen_sft_15k • Viewer • Updated Feb 17 • 15k • 24
RLHFlow/numia_prompt_ppo • Viewer • Updated Feb 13 • 404k • 21 • 1
RLHFlow/numia_prompt_dpo_test • Viewer • Updated Feb 11 • 1.02k • 24
RLHFlow/numia_prompt_dpo9 • Viewer • Updated Feb 11 • 20k • 18
RLHFlow/numia_prompt_dpo8 • Viewer • Updated Feb 11 • 20k • 19
RLHFlow/numia_prompt_dpo7 • Viewer • Updated Feb 11 • 20k • 12
RLHFlow/numia_prompt_dpo6 • Viewer • Updated Feb 11 • 20k • 21
RLHFlow/numia_prompt_dpo5 • Viewer • Updated Feb 11 • 20k • 13
RLHFlow/numia_prompt_dpo4 • Viewer • Updated Feb 11 • 20k • 16
RLHFlow/numia_prompt_dpo3 • Viewer • Updated Feb 11 • 20k • 25
RLHFlow/numia_prompt_dpo2 • Viewer • Updated Feb 11 • 20k • 24
RLHFlow/numia_prompt_dpo1 • Viewer • Updated Feb 11 • 20k • 662
RLHFlow/LLM-Preferences-HelpSteer2 • Viewer • Updated Feb 5 • 9.13k • 14 • 1
RLHFlow/DS-and-Mistral-PRM-Data • Viewer • Updated Nov 10, 2024 • 526k • 13
RLHFlow/Deepseek-MATH500-Test • Viewer • Updated Nov 9, 2024 • 500 • 70
RLHFlow/Mistral-MATH500-Test • Viewer • Updated Nov 9, 2024 • 500 • 118
RLHFlow/Deepseek-ORM-Data • Viewer • Updated Nov 9, 2024 • 253k • 24 • 3
RLHFlow/Deepseek-PRM-Data • Viewer • Updated Nov 9, 2024 • 253k • 371 • 16
RLHFlow/Mistral-ORM-Data • Viewer • Updated Nov 9, 2024 • 273k • 22 • 2
RLHFlow/Mistral-PRM-Data • Viewer • Updated Nov 9, 2024 • 273k • 314 • 12
RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM • Viewer • Updated Nov 8, 2024 • 500 • 17
RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM • Viewer • Updated Nov 8, 2024 • 500 • 17
RLHFlow/Mistral-GSM8K-Test-Result-of-Mistral-ORM • Viewer • Updated Nov 8, 2024 • 1.32k • 13
RLHFlow/DS-MATH500-Test-Result-of-Mistral-ORM • Viewer • Updated Nov 8, 2024 • 500 • 12