Submitted by BestWishYsh 59 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation · 11 authors 5
Submitted by Lin1557 58 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability · 9 authors 7
Submitted by YiwuZhong 26 AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning · 4 authors 2
Submitted by KaituoFeng 23 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? · 11 authors 2
Submitted by wanderkid 22 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation · 9 authors 2
Submitted by PereLluis13 20 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS · 6 authors 2
Submitted by irwinherrmann 15 Motion Prompting: Controlling Video Generation with Motion Trajectories · 14 authors 2
Submitted by Jethro37 14 OmniCreator: Self-Supervised Unified Generation with Universal Editing · 4 authors 3
Submitted by Hoyard 13 LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences · 9 authors 2
Submitted by bhheo 7 MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation · 6 authors 2
Submitted by Haihao 5 A dynamic parallel method for performance optimization on hybrid CPUs · 3 authors 2
Submitted by patricebechard 4 Generating a Low-code Complete Workflow via Task Decomposition and RAG · 2 authors 2
Submitted by dpaul06 4 VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval · 4 authors 2