CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Paper • 2503.10613 • Published 3 days ago • 63
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 10 days ago • 49
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published 17 days ago • 44
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness Paper • 2406.16342 • Published Jun 24, 2024
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 124
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities Paper • 2410.12219 • Published Oct 16, 2024 • 1
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 61
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 31
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization Paper • 2409.18433 • Published Sep 27, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31, 2024 • 62
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion Paper • 2402.12741 • Published Feb 20, 2024
CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering Paper • 2401.13170 • Published Jan 24, 2024 • 4
PANDA (Pedantic ANswer-correctness Determination and Adjudication): Improving Automatic Evaluation for Question Answering and Text Generation Paper • 2402.11161 • Published Feb 17, 2024 • 1