LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding · 13 authors · Submitted by akhaliq
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites · 27 authors · Submitted by akhaliq
Interactive3D: Create What You Want by Interactive 3D Generation · 6 authors · Submitted by akhaliq
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving · 11 authors · Submitted by akhaliq
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs · 11 authors · Submitted by akhaliq
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings · 11 authors · Submitted by akhaliq
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension · 6 authors · Submitted by akhaliq