- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 20
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
  Paper • 2310.19512 • Published • 16
- VideoMamba: State Space Model for Efficient Video Understanding
  Paper • 2403.06977 • Published • 28
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
  Paper • 2401.09047 • Published • 14
Collections
Collections including paper arxiv:2404.14687
- PsiPi/liuhaotian_llava-v1.5-13b-GGUF
  Image-Text-to-Text • Updated • 4.61k • 36
- TRI-ML/prismatic-vlms
  Image-to-Text • Updated • 19
- bczhou/tiny-llava-v1-hf
  Image-Text-to-Text • Updated • 15.7k • 56
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
  Paper • 2402.06118 • Published • 15
- ChatAnything: Facetime Chat with LLM-Enhanced Personas
  Paper • 2311.06772 • Published • 35
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 29
- A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 22
- Instruction-Following Evaluation for Large Language Models
  Paper • 2311.07911 • Published • 20