---
license: mit
---

[[📖 arXiv Paper](https://arxiv.org/abs/2406.08487)] [[📊 MM-RLHF Data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF)] [[📝 Homepage](https://mm-rlhf.github.io/)] [[🏆 Reward Model](https://huggingface.co/yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen)] [[🔮 MM-RewardBench](https://huggingface.co/datasets/yifanzhang114/MM-RLHF-RewardBench)] [[🔮 MM-SafetyBench](https://github.com/yfzhang114/mmrlhf-eval)] [[📈 Evaluation Suite](https://github.com/yfzhang114/mmrlhf-eval)]

# The Next Step Forward in Multimodal LLM Alignment

**[2025/02/10]** 🔥 We are proud to open-source **MM-RLHF**, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

- A **high-quality MLLM alignment dataset**.
- A **strong critique-based MLLM reward model** and its training algorithm.
- A **novel alignment algorithm, MM-DPO**.
- **Two new benchmarks**.

Our dataset and algorithms deliver consistent performance improvements across **10 dimensions** and **27 benchmarks**.

## Use

### Intended use

The model was trained on [MM-RLHF data](https://huggingface.co/datasets/yifanzhang114/MM-RLHF) and can interact with single images, multiple images, and videos.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/623d8ca4c29adf5ef6175615/2RQJMhntIwE15y9lEtBfP.png)

**Feel free to share your generations in the Community tab!**

### Generation

We provide a simple generation process for using our model. For more details, please refer to our [GitHub repository](https://github.com/yfzhang114/MM-RLHF).

## Citation

If you find this work useful for your research and applications, please cite the related papers/blogs using this BibTeX:

```bibtex
```