Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 8 days ago • 78
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Paper • 2502.03628 • Published Feb 5 • 12
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2 • 21