Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
yifanzhang114
AI & ML interests
Yi-Fan Zhang is currently a fourth-year PhD student at the State Key Laboratory of Pattern Recognition, University of Chinese Academy of Sciences, under the guidance of Prof. Tieniu Tan. Zhang's work focuses on building robust and reliable deep learning systems and large pretrained models.
Collections
4
The Next Step Forward in Multimodal LLM Alignment
yifanzhang114/MM-RLHF • Viewer • Updated • 16.3k • 206 • 10
yifanzhang114/MM-RLHF-RewardBench • Viewer • Updated • 170 • 102 • 2
yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen • Image-Text-to-Text • Updated • 65 • 1
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • Paper • 2502.10391 • Published • 35
models
7

yifanzhang114/qwen_tool • Updated • 14
yifanzhang114/R1-Reward • Updated • 89 • 3
yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen • Image-Text-to-Text • Updated • 65 • 1
yifanzhang114/SliME-Llama3-8B • Image-Text-to-Text • Updated • 32 • 3
yifanzhang114/SliME-vicuna-7B • Image-Text-to-Text • Updated • 30 • 2
yifanzhang114/SliME-Llama3-8B-lora • Image-Text-to-Text • Updated • 6
yifanzhang114/SliME-vicuna-13B • Image-Text-to-Text • Updated • 31 • 2
datasets
12
yifanzhang114/MMPreferenceV • Preview • Updated • 26
yifanzhang114/R1-Reward-RL • Viewer • Updated • 17.3k • 360 • 2
yifanzhang114/MM-RLHF • Viewer • Updated • 16.3k • 206 • 10
yifanzhang114/MM-RLHF-RewardBench • Viewer • Updated • 170 • 102 • 2
yifanzhang114/MME-RealWorld-Base64 • Viewer • Updated • 11.5k • 459 • 1
yifanzhang114/MME-RealWorld-Lite • Preview • Updated • 60 • 3
yifanzhang114/MME-RealWorld-lite-lmms-eval • Viewer • Updated • 1.92k • 384 • 1
yifanzhang114/MME-RealWorld • Preview • Updated • 871 • 16
yifanzhang114/AMBER_base64 • Viewer • Updated • 14.2k • 31
yifanzhang114/MME-RealWorld-Lmms-eval • Viewer • Updated • 23.1k • 437 • 1