- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 33
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 26
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 123
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22
Collections
Collections including paper arxiv:2408.00714
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 112
- Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
  Paper • 2408.07416 • Published • 7
- SMITE: Segment Me In TimE
  Paper • 2410.18538 • Published • 16
- ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
  Paper • 2410.23287 • Published • 19

- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 112
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
  Paper • 2408.07931 • Published • 21
- DELTA: Dense Efficient Long-range 3D Tracking for any video
  Paper • 2410.24211 • Published • 9

- Fast Matrix Multiplications for Lookup Table-Quantized LLMs
  Paper • 2407.10960 • Published • 12
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
  Paper • 2407.14482 • Published • 26
- EVLM: An Efficient Vision-Language Model for Visual Understanding
  Paper • 2407.14177 • Published • 43
- Knowledge Mechanisms in Large Language Models: A Survey and Perspective
  Paper • 2407.15017 • Published • 34