ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation Paper • 2503.06307 • Published 7 days ago
On Robustness and Transferability of Convolutional Neural Networks Paper • 2007.08558 • Published Jul 16, 2020
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published 2 days ago
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Paper • 2503.10437 • Published 2 days ago
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 Paper • 2503.10635 • Published 2 days ago
Autoregressive Image Generation with Randomized Parallel Decoding Paper • 2503.10568 • Published 2 days ago
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation Paper • 2503.10636 • Published 2 days ago
Charting and Navigating Hugging Face's Model Atlas Paper • 2503.10633 • Published 2 days ago
Piece it Together: Part-Based Concepting with IP-Priors Paper • 2503.10365 • Published 2 days ago
Distilling Diversity and Control in Diffusion Models Paper • 2503.10637 • Published 2 days ago
Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective Paper • 2503.10638 • Published 2 days ago
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published 3 days ago
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Paper • 2503.09642 • Published 4 days ago
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo Paper • 2503.09799 • Published 3 days ago
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published 2 days ago
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published 2 days ago
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Paper • 2503.10596 • Published 2 days ago
PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling Paper • 2503.09368 • Published 3 days ago
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Paper • 2503.10391 • Published 2 days ago