-
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45 -
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Paper • 2404.14351 • Published • 6 -
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 19 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 67
Collections
Discover the best community collections!
Collections including paper arxiv:2406.06525
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3 -
Music Consistency Models
Paper • 2404.13358 • Published • 13 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 22
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 31 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 28
-
CiaraRowles/TemporalDiff
Text-to-Video • Updated • 171 -
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper • 2404.09967 • Published • 21 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 67
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 68 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 47 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 28 -
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 35
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 7 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2