FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published Jun 26, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3, 2024