VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 61
Vision Models (GGUF) Collection How to use: Download a "mmproj" model file + one or more of the primary model files. • 5 items • Updated Dec 22, 2023 • 45
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023 • 41