MJ-Bench Team: Align

😎 MJ-Video: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!

👩‍⚖️ MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Project page: https://mj-bench.github.io/
Code repository: https://github.com/MJ-Bench/MJ-Bench

While text-to-image models like DALL-E 3 and Stable Diffusion are rapidly proliferating, they often suffer from hallucination, bias, and unsafe or low-quality output. To address these issues effectively, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their importance, the capabilities and limitations of current multimodal judges are often inadequately evaluated, which can lead to misalignment and unsafe fine-tuning outcomes.

To address this issue, we introduce MJ-Bench, a novel benchmark that incorporates a comprehensive preference dataset to evaluate how well multimodal judges provide feedback for image generation models from four key perspectives: alignment, safety, image quality, and bias.

Specifically, we evaluate a large variety of multimodal judges, including:

  • 6 smaller-sized CLIP-based scoring models
  • 11 open-source VLMs (e.g. LLaVA family)
  • 4 closed-source VLMs (e.g. GPT-4o, Claude 3)
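
To give a rough picture of what evaluating a judge on a preference dataset can look like, here is a minimal sketch. It is not the MJ-Bench evaluation code: the record layout, the `preference_accuracy` and `toy_judge` names, the use of strings as stand-ins for images, and the rule that ties count as errors are all assumptions made for illustration. The sketch scores a judge by how often it rates the preferred image above the rejected one, grouped by perspective.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (perspective, prompt, chosen_image, rejected_image).
# Images are plain caption strings here; a real judge would take image data.
Pair = Tuple[str, str, str, str]


def preference_accuracy(
    pairs: List[Pair],
    judge: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return, per perspective, the fraction of pairs where the judge
    scores the chosen image strictly higher than the rejected one.
    Ties count as errors (an assumption of this sketch)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for perspective, prompt, chosen, rejected in pairs:
        total[perspective] += 1
        if judge(prompt, chosen) > judge(prompt, rejected):
            correct[perspective] += 1
    return {p: correct[p] / total[p] for p in total}


def toy_judge(prompt: str, image: str) -> float:
    """Stand-in scorer: counts prompt words that occur as substrings
    of the caption string. A real judge would be a CLIP-style model or a VLM."""
    caption = image.lower()
    return float(sum(word in caption for word in prompt.lower().split()))


toy_pairs = [
    ("alignment", "a red car", "photo of a red car", "photo of a blue bike"),
    ("alignment", "two dogs", "two dogs playing", "two dogs and a cat"),
    ("quality", "a sharp portrait", "sharp portrait", "blurry portrait"),
]

results = preference_accuracy(toy_pairs, toy_judge)
print(results)
```

Swapping `toy_judge` for a function that wraps an actual scoring model or VLM keeps the rest of the loop unchanged, which is the main reason for passing the judge in as a callable.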

🔥🔥 We are actively updating the leaderboard, and you are welcome to submit your multimodal judge's evaluation results on our dataset to the Hugging Face leaderboard.