Submitted by jefflai (52 upvotes): Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models · 12 authors · 2 comments
Submitted by akhaliq (47 upvotes): SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration · 5 authors · 5 comments
Submitted by akhaliq (36 upvotes): Loong: Generating Minute-level Long Videos with Autoregressive Language Models · 8 authors · 3 comments
Submitted by msadat97 (26 upvotes): Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models · 3 authors · 4 comments
Submitted by kazemnejad (24 upvotes): VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment · 7 authors · 2 comments
Submitted by WillHeld (23 upvotes): Distilling an End-to-End Voice Assistant Without Instruction Training Data · 6 authors · 2 comments
Submitted by Xiaoye08 (19 upvotes): CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling · 4 authors · 2 comments
Submitted by onekq (12 upvotes): Training Language Models on Synthetic Edit Sequences Improves Code Synthesis · 3 authors · 3 comments
Submitted by shayekh (10 upvotes): Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models · 6 authors · 3 comments
Submitted by ZetangForward (10 upvotes): L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? · 6 authors · 3 comments
Submitted by yossig (9 upvotes): Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations · 4 authors · 2 comments
Submitted by jasonyux (9 upvotes): Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning · 7 authors · 2 comments
Submitted by amanchadha (9 upvotes): MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation · 4 authors · 3 comments
Submitted by xiaobiaodu (8 upvotes): MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis · 3 authors · 3 comments
Submitted by mucai (7 upvotes): Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos · 3 authors · 2 comments
Submitted by Sreyan88 (6 upvotes): Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data · 6 authors · 2 comments
Submitted by lucasbandarkar (5 upvotes): Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models · 7 authors · 3 comments
Submitted by weitaikang (5 upvotes): Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning · 5 authors · 2 comments
Submitted by uzw (4 upvotes): SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics · 5 authors · 3 comments