- CoRe^2: Collect, Reflect and Refine to Generate Better and Faster (arXiv:2503.09662)
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond (arXiv:2503.10460)
- WARM: On the Benefits of Weight Averaged Reward Models (arXiv:2401.12187, published Jan 22, 2024)
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arXiv:2503.07572)
- Implicit Reasoning in Transformers is Reasoning through Shortcuts (arXiv:2503.07604)
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL (arXiv:2503.07536)
- EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test (arXiv:2503.01840)
- TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation (arXiv:2503.04872)
- Learning from Failures in Multi-Attempt Reinforcement Learning (arXiv:2503.04808)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (arXiv:2503.05179)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (arXiv:2503.05592)
- Digi-Q (collection, 4 items): What will happen if we train a Q function for digital agents?
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (arXiv:2502.18449)
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning (arXiv:2502.06060, published Feb 9)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents (arXiv:2501.12326, published Jan 21)
- ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer (arXiv:2501.15570, published Jan 26)