- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 16
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 26
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
Collections
Collections including paper arxiv:2311.00430
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 58
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
  Paper • 2309.13876 • Published • 1
- Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
  Paper • 2310.06434 • Published • 4
- Masked Autoencoders Are Scalable Vision Learners
  Paper • 2111.06377 • Published • 3
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 58
- distil-whisper/distil-large-v2
  Automatic Speech Recognition • Updated • 169k • 505
- Seven Failure Points When Engineering a Retrieval Augmented Generation System
  Paper • 2401.05856 • Published • 2
- Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
  Paper • 2312.03818 • Published • 32
- Scaling Laws of Synthetic Images for Model Training ... for Now
  Paper • 2312.04567 • Published • 8
- Large Language Models for Mathematicians
  Paper • 2312.04556 • Published • 12
- LooseControl: Lifting ControlNet for Generalized Depth Conditioning
  Paper • 2312.03079 • Published • 13