MJ-Bench-Team

community

https://mj-bench.github.io

MJ-Bench

Recent Activity

zhuokai authored a paper 3 days ago

Preference Optimization with Multi-Sample Comparisons

zhuokai authored a paper 3 days ago

Token-Level LLM Collaboration via FusionRoute

zhuokai authored a paper 2 months ago

Scaling Agent Learning via Experience Synthesis

Organization Card

MJ-Bench Team

MJ-Bench-Team was co-founded by Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance reliability, safety, and performance.

😎 MJ-Video: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Project page: https://aiming-lab.github.io/MJ-VIDEO.github.io/
Code repository: https://github.com/aiming-lab/MJ-Video

We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!

👩‍⚖️ MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Project page: https://mj-bench.github.io/
Code repository: https://github.com/MJ-Bench/MJ-Bench

Text-to-image models such as DALL-E 3 and Stable Diffusion are proliferating rapidly, but they often suffer from hallucination, bias, and unsafe or low-quality output. To address these issues, it is crucial to align these models with desired behaviors using feedback from a multimodal judge.

However, current multimodal judges are rarely evaluated thoroughly, which can lead to misalignment and safety problems during fine-tuning. To tackle this, we introduce MJ-Bench, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

Alignment
Safety
Image Quality
Bias
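
To make the evaluation idea concrete, here is a minimal sketch of how a judge could be scored on a pairwise preference dataset, one accuracy per dimension. It is an illustration under stated assumptions, not the MJ-Bench evaluation code: the `PreferencePair` schema, the `Judge` signature, and the rule that ties count as errors are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class PreferencePair(NamedTuple):
    """One item: a prompt and two candidate images (hypothetical schema)."""
    dimension: str  # e.g. "alignment", "safety", "quality", "bias"
    prompt: str
    chosen: str     # identifier of the preferred image
    rejected: str   # identifier of the less-preferred image


# Assumed judge interface: (prompt, image) -> scalar score, higher is better.
Judge = Callable[[str, str], float]


def accuracy_by_dimension(
    judge: Judge, pairs: Iterable[PreferencePair]
) -> Dict[str, float]:
    """Per dimension, the fraction of pairs where the judge scores
    `chosen` strictly above `rejected`. Ties count as incorrect."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for pair in pairs:
        total[pair.dimension] += 1
        if judge(pair.prompt, pair.chosen) > judge(pair.prompt, pair.rejected):
            correct[pair.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


if __name__ == "__main__":
    # Toy judge standing in for a real model: scores by identifier length.
    def toy_judge(prompt: str, image: str) -> float:
        return float(len(image))

    toy_pairs = [
        PreferencePair("alignment", "a cat", "img_long", "img"),
        PreferencePair("alignment", "a dog", "x", "xyz"),
        PreferencePair("safety", "a street", "aa", "a"),
    ]
    print(accuracy_by_dimension(toy_judge, toy_pairs))
```

A real judge would replace `toy_judge` with a call into a scoring model or a VLM. Reporting accuracy per dimension rather than as one pooled number keeps a judge that is strong on alignment but weak on safety from looking uniformly good.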

We evaluate a wide range of multimodal judges, including:

6 smaller-sized CLIP-based scoring models
11 open-source VLMs (e.g., the LLaVA family)
4 closed-source VLMs (e.g., GPT-4, Claude 3)

🔥 We are actively updating the leaderboard!
You are welcome to submit your multimodal judge’s evaluation results on our dataset to the Hugging Face leaderboard.

Collections 3

spaces 2

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results

models 6

datasets 3

MJ-Bench/MJ-Bench

Viewer • Updated Oct 23, 2025 • 7.56k • 65

MJ-Bench/MJ-BENCH-VIDEO

Viewer • Updated Feb 14, 2025 • 10.8k • 108

MJ-Bench/MJ-Bench-Results

Preview • Updated Jul 9, 2024 • 37

Collections 3

yichaodu/DiffusionDPO-bias-hps-2.1

yichaodu/DiffusionDPO-bias-gemini-1.5

yichaodu/DiffusionDPO-bias-claude3-opus

yichaodu/DiffusionDPO-alignment-hps-2.1

MJ-Bench/DDPO-alignment-gpt-4o

MJ-Bench/DDPO-alignment-gpt-4v

MJ-Bench/DDPO-alignment-claude3-opus

models 6

MJ-Bench/MJ-VIDEO-2B

MJ-Bench/DDPO-alignment-gpt-4v

MJ-Bench/DDPO-alignment-gpt-4o

MJ-Bench/DDPO-alignment-claude3-opus

MJ-Bench/DiffusionDPO-alignment-claude3-opus

MJ-Bench/DiffusionDPO-alignment-gemini-1.5

Team members 7
