Knowledge Mining with Scene Text for Fine-Grained Recognition Paper • 2203.14215 • Published Mar 27, 2022
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Paper • 2412.13193 • Published Dec 17, 2024
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 23 days ago • 36
Process Reinforcement through Implicit Rewards Article • By ganqu and 1 other • Jan 3 • 25
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published Jan 2 • 37