Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 21
Can Language Models Solve Graph Problems in Natural Language? Paper • 2305.10037 • Published May 17, 2023 • 1
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding Paper • 2305.14739 • Published May 24, 2023
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning Paper • 2308.04275 • Published Aug 8, 2023 • 1
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control Paper • 2210.17432 • Published Oct 31, 2022 • 1
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15, 2024 • 45
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 22
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024 • 17
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering Paper • 2303.11897 • Published Mar 21, 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024 • 13
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024 • 20
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 25
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published Apr 25, 2024 • 17