Submitted by myownskyW7 62 MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · 11 authors 6
Submitted by vaishaal 50 DataComp-LM: In search of the next generation of training sets for language models · 59 authors 4
Submitted by fwnlp 38 mDPO: Conditional Preference Optimization for Multimodal Large Language Models · 7 authors 1
Submitted by ktio 33 THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation · 10 authors 1
Submitted by Yiwen-ntu 33 MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers · 12 authors 2
Submitted by philschmid 31 How Do Large Language Models Acquire Factual Knowledge During Pretraining? · 7 authors 1
Submitted by chenjoya 24 VideoLLM-online: Online Video Large Language Model for Streaming Video · 10 authors 1
Submitted by yuzhaouoe 23 A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression · 4 authors 3
Submitted by zongzhuofan 22 Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models · 5 authors 4
Submitted by davanstrien 21 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens · 14 authors 1
Submitted by Sreyan88 20 GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · 9 authors 1
Submitted by kaiyuyue 18 From Pixels to Prose: A Large Dataset of Dense Image Captions · 10 authors 2
Submitted by jiannanx 15 Pandora: Towards General World Model with Natural Language Actions and Video States · 13 authors 1
Submitted by syqi 15 In-Context Editing: Learning Knowledge from Self-Induced Distributions · 8 authors 5
Submitted by yuchenlin 14 WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences · 6 authors 4
Submitted by rishab-partha 8 Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion · 3 authors 1
Submitted by jifanz 7 Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning · 12 authors 2
Submitted by amanchadha 6 Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis · 3 authors 1
Submitted by billwat 4 HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies · 4 authors 1
Submitted by davidbrandfonbrener 4 CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training · 5 authors 1
Submitted by toshas 3 Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models · 3 authors 1
Submitted by luckeciano 2 Deep Bayesian Active Learning for Preference Modeling in Large Language Models · 4 authors 1