-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 20 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 5 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 180 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 3
Collections
Discover the best community collections!
Collections including paper arxiv:2408.08872
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 191 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 46 -
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 40
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 180 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 50 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 41
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 48
-
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Paper • 2311.10709 • Published • 26 -
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Paper • 2405.12970 • Published • 25 -
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Paper • 2405.11473 • Published • 55 -
stabilityai/stable-diffusion-3-medium
Text-to-Image • Updated • 22.6k • • 4.72k