- LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 33
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 26
- Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 123
- Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
Collections
Collections including paper arxiv:2411.04952
- Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Paper • 2404.15676 • Published
- How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior
Paper • 2404.10198 • Published • 7
- RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 70
- FaaF: Facts as a Function for the evaluation of RAG systems
Paper • 2403.03888 • Published
- Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper • 2410.21169 • Published • 30
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 54
- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 29
- Contextual Document Embeddings
Paper • 2410.02525 • Published • 21