Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.12376

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 44
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

Paper • 2310.16825 • Published Oct 25, 2023 • 33
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 42
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

Paper • 2311.04145 • Published Nov 7, 2023 • 35

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 16
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Paper • 2309.15103 • Published Sep 26, 2023 • 42
Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20, 2024 • 95
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Paper • 2402.10210 • Published Feb 15, 2024 • 35
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48

Diffusion Model

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Paper • 2309.03895 • Published Sep 7, 2023 • 14
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Paper • 2309.16650 • Published Sep 28, 2023 • 10
CCEdit: Creative and Controllable Video Editing via Diffusion Models

Paper • 2309.16496 • Published Sep 28, 2023 • 9
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling

Paper • 2310.15169 • Published Oct 23, 2023 • 10

Vision Transformers

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Paper • 2309.04354 • Published Sep 8, 2023 • 15
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 79
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 19
MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Paper • 2309.16534 • Published Sep 28, 2023 • 15

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs