- Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
  Paper • 2412.09593 • Published • 18
- CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
  Paper • 2412.16112 • Published • 23
- Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
  Paper • 2412.14171 • Published • 24
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 47
Collections
Collections including paper arxiv:2501.09503
- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 55
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 70
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 25
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 42

- Customizing Text-to-Image Models with a Single Image Pair
  Paper • 2405.01536 • Published • 22
- Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
  Paper • 2404.03913 • Published
- LCM-Lookahead for Encoder-based Text-to-Image Personalization
  Paper • 2404.03620 • Published • 1
- Customizing Text-to-Image Diffusion with Camera Viewpoint Control
  Paper • 2404.12333 • Published • 1

- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 36
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 64
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 180
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1

- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 13
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
  Paper • 2312.04410 • Published • 15
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
  Paper • 2312.04461 • Published • 62
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
  Paper • 2401.02955 • Published • 22