LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information • Paper • 2502.02095 • Published 8 days ago • 4
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search • Paper • 2502.02584 • Published 7 days ago • 14
Reward-Guided Speculative Decoding for Efficient LLM Reasoning • Paper • 2501.19324 • Published 11 days ago • 34
CodeMonkeys: Scaling Test-Time Compute for Software Engineering • Paper • 2501.14723 • Published 18 days ago • 7
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate • Paper • 2501.17703 • Published 14 days ago • 51
Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation • Paper • 2501.17749 • Published 13 days ago • 12
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 12 days ago • 51
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback • Paper • 2501.10799 • Published 25 days ago • 14