GARDO: Reinforcing Diffusion Models without Reward Hacking • Paper • 2512.24138 • Published 10 days ago • 28
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation • Paper • 2512.24271 • Published 10 days ago • 49
Spatia: Video Generation with Updatable Spatial Memory • Paper • 2512.15716 • Published 23 days ago • 30
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing • Paper • 2512.17909 • Published 21 days ago • 36
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation • Paper • 2512.16913 • Published 22 days ago • 33
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors • Paper • 2512.16915 • Published 22 days ago • 37
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning • Paper • 2512.13874 • Published 24 days ago • 16
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling • Paper • 2512.15702 • Published 23 days ago • 14
DEER: Draft with Diffusion, Verify with Autoregressive Models • Paper • 2512.15176 • Published 23 days ago • 42
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning • Paper • 2512.14442 • Published 24 days ago • 10
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation • Paper • 2512.08294 • Published Dec 9, 2025 • 17
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling • Paper • 2512.04784 • Published Dec 2, 2025 • 24
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer • Paper • 2512.05060 • Published Dec 4, 2025 • 18
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation • Paper • 2511.23127 • Published Nov 28, 2025 • 43
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression • Paper • 2512.00891 • Published Nov 30, 2025 • 14
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models • Paper • 2512.02014 • Published Dec 1, 2025 • 72
Video Generation Models Are Good Latent Reward Models • Paper • 2511.21541 • Published Nov 26, 2025 • 45
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens • Paper • 2511.19418 • Published Nov 24, 2025 • 28
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models • Paper • 2511.13704 • Published Nov 17, 2025 • 42