Large Reasoning Models - a yanhongxi Collection

Large Reasoning Models

updated 5 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published 6 days ago • 49

LIMO: Less is More for Reasoning
Paper • 2502.03387 • Published 6 days ago • 44

LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
Paper • 2502.02095 • Published 8 days ago • 4

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper • 2502.02584 • Published 7 days ago • 14

o3-mini vs DeepSeek-R1: Which One is Safer?
Paper • 2501.18438 • Published 12 days ago • 22

Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published 11 days ago • 34

Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published 19 days ago • 48

CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Paper • 2501.14723 • Published 18 days ago • 7

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Paper • 2501.17703 • Published 14 days ago • 51

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation
Paper • 2501.17749 • Published 13 days ago • 12

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published 12 days ago • 51

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published 25 days ago • 14