Update README.md
We utilize [RLAIF-V](https://github.com/RLHF-V/RLAIF-V), a novel framework that **aligns MLLMs in a fully open-source paradigm**. The framework maximally exploits the [open-source feedback](https://huggingface.co/datasets/HaoyeZhang/RLAIF-V-Dataset) from two key perspectives: **high-quality feedback data** and an **online feedback learning algorithm**.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/T4hALrgNdXKHnkvb-27bA.png" alt="fig1-1" width="85%"/>
</p>

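For quick orientation, below is a minimal loading sketch with 🤗 Transformers. It is not taken from this card: the repo id `openbmb/RLAIF-V-12B` and the MiniCPM-V-style `model.chat(...)` interface are assumptions, so verify both against the files in this repository.

```python
# Minimal, hedged loading sketch. The repo id and the `chat` signature are
# ASSUMPTIONS (MiniCPM-V-style remote code); check this repository's files.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/RLAIF-V-12B"  # assumed repo id
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image."}]
# Hypothetical chat call; the exact remote-code interface may differ.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
```
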
## Model Details

* 💪 **Maintaining Strong Performance on General Abilities**: On benchmarks that evaluate general abilities (e.g., LLaVA Bench, MMStar), RLAIF-V-12B also exhibits strong performance.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/dhsi5_okbtlBp2pfYOkFK.png" alt="fig1-2" width="90%"/>
</p>

* 🚀 **Inference-time Scaling by RLAIF-V Reward**: Using RLAIF-V 12B as a reward model can further improve model performance on multiple benchmarks with best-of-N selection. It also consistently improves the trustworthiness of different MLLMs (a minimal sketch of best-of-N selection follows the figure below).

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/QB_plzz-wRmyDcr81BXum.png" alt="fig1-3" width="50%"/>
</p>
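
To make best-of-N selection concrete, here is a minimal, model-agnostic sketch. It is not from this card: `generate_candidates` and `reward_score` are hypothetical stand-ins for your MLLM's sampler and an RLAIF-V 12B reward scorer.

```python
# Hedged sketch of best-of-N selection with a reward model.
# `generate_candidates` and `reward_score` are hypothetical stand-ins:
# wire them to your MLLM's sampling call and to the RLAIF-V 12B scorer.
from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # (prompt, n) -> n responses
    reward_score: Callable[[str, str], float],             # (prompt, response) -> reward
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n candidate responses and return the one with the highest reward."""
    candidates = generate_candidates(prompt, n)
    scored = [(response, reward_score(prompt, response)) for response in candidates]
    return max(scored, key=lambda pair: pair[1])

# Example wiring (hypothetical):
# best, score = best_of_n("Describe the image.", my_sampler, my_reward, n=16)
```

The selection rule is the whole trick: sample N diverse responses, score each with the reward model, and keep the argmax.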
### Examples
<p align="center">

If you find our model/code/paper helpful, please consider citing our papers 📝:

}

@article{yu2024rlaifv,
  title={RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness},
  author={Tianyu Yu and Haoye Zhang and Qiming Li and Qixin Xu and Yuan Yao and Da Chen and Xiaoman Lu and Ganqu Cui and Yunkai Dang and Taiwen He and Xiaocheng Feng and Jun Song and Bo Zheng and Zhiyuan Liu and Tat-Seng Chua and Maosong Sun},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024},
}