Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published Dec 1, 2025 • 88
Taming LLMs by Scaling Learning Rates with Gradient Grouping Paper • 2506.01049 • Published Jun 1, 2025 • 38
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1, 2025 • 95
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes Paper • 2503.13435 • Published Mar 17, 2025 • 18
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models Paper • 2406.06007 • Published Jun 10, 2024 • 2
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published Oct 8, 2024 • 36
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Paper • 2410.07155 • Published Oct 9, 2024 • 11
Switch EMA: A Free Lunch for Better Flatness and Sharpness Paper • 2402.09240 • Published Feb 14, 2024 • 5
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning Paper • 2209.04851 • Published Sep 11, 2022 • 3