File size: 2,545 Bytes

e4f86d5
 
ba2aed7
 
e4f86d5
 
72b84a5
 
 
e4f86d5
 
831e8f1
e4f86d5
 
 
58866a2
e4f86d5
 
 
65499e9
 
e4f86d5
 
 
 
 
 
 
 
 
 
 
 
ba2aed7
e4f86d5
 
 
5071653
 
 
 
 
 
 
e41e1f2
 
 
5071653
 
 
 
a2c647a
e4f86d5
 
 
 
9a6e650
 
 
 
 
 
ba2aed7

---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/ZxbX8GLRok8cUlwj4q_xo.png)



<font size=3><div align='center' >  
[[📖 arXiv Paper](https://arxiv.org/abs/2502.10391)] 
[[📊 MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)] 
[[📝 Homepage](https://mm-rlhf.github.io/)] 
[[🏆 Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)] 

[[🔮 MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)] 
[[🔮 MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)] 
[[📈 Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)] 
[[📊 Training Code](https://github.com/yfzhang114/MM-RLHF)] 

</div></font>


# The Next Step Forward in Multimodal LLM Alignment

**[2025/02/10]** 🔥 We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

- A **high-quality MLLM alignment dataset**.
- A **strong Critique-Based MLLM reward model** and its training algorithm.
- A **novel alignment algorithm MM-DPO**.
- **Two new benchmarks**.

Our dataset and algorithms enable consistent performance improvements across **10 dimensions** and **27 benchmarks**.\n<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/8nVZQd8bfB6NJIixCv6_X.png" width="80%" />
</p>


## Use

### Intended use

The model was trained on [MM-RLHF data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF) and have the ability to interact with images, multi-image and videos. 


![image/png](https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/2RQJMhntIwE15y9lEtBfP.png)

**Feel free to share your generations in the Community tab!**

### Generation

We provide the simple generation process for using our model. For more details, you could refer to [Github](https://github.com/yfzhang114/MM-RLHF).
## Citation

If you find it useful for your research and applications, please cite related papers/blogs using this BibTeX:
```bibtex
@article{zhang2025mm,
  title={MM-RLHF: The Next Step Forward in Multimodal LLM Alignment},
  author={Zhang, Yi-Fan and Yu, Tao and Tian, Haochen and Fu, Chaoyou and Li, Peiyan and Zeng, Jianshu and Xie, Wulin and Shi, Yang and Zhang, Huanyu and Wu, Junkang and others},
  journal={arXiv preprint arXiv:2502.10391},
  year={2025}
}
```