Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 4 days ago • 50
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published 5 days ago • 24
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published 6 days ago • 37
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models Paper • 2502.03032 • Published 7 days ago • 53
Large Language Model Guided Self-Debugging Code Generation Paper • 2502.02928 • Published 7 days ago • 8
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 154
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 14 days ago • 51
Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 16 days ago • 24
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published 20 days ago • 62
Control LLM: Controlled Evolution for Intelligence Retention in LLM Paper • 2501.10979 • Published 24 days ago • 6
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published 21 days ago • 23
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 21 days ago • 315
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 23 days ago • 90
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 28 days ago • 56
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11 • 9