- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 45
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 16
- Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
  Paper • 2405.18669 • Published • 12
Collections including paper arxiv:2312.11805
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  Paper • 2312.02087 • Published • 23
- FaceStudio: Put Your Face Everywhere in Seconds
  Paper • 2312.02663 • Published • 33
- Orthogonal Adaptation for Modular Customization of Diffusion Models
  Paper • 2312.02432 • Published • 14
- ReconFusion: 3D Reconstruction with Diffusion Priors
  Paper • 2312.02981 • Published • 10

- Attention Is All You Need
  Paper • 1706.03762 • Published • 52
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Mistral 7B
  Paper • 2310.06825 • Published • 46