Submitted by csuhan · 72 · ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents · 7 authors · 158 · 3
Submitted by JingweiZuo · 53 · Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance · 27 authors · 51 · 5
Submitted by kenchan0226 · 35 · VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning · 12 authors · 3
Submitted by eliebak · 13 · Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding · 199 authors · 2
Submitted by xiaofanghf · 9 · Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision · 8 authors · 3 · 3
Submitted by tulvgengenr · 8 · MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE · 7 authors · 94 · 2
Submitted by HenghuiDing · 7 · Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation · 4 authors · 2
Submitted by akhadangi · 7 · Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning · 5 authors · 1 · 2
Submitted by jahnsonblack · 4 · DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation · 7 authors · 164 · 2