Reasoning - a galois77 Collection

updated 2 days ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published 16 days ago • 53

Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization

Paper • 2412.18279 • Published Dec 24, 2024

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published 28 days ago • 15

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published 15 days ago • 35

Dynamic Scaling of Unit Tests for Code Reward Modeling

Paper • 2501.01054 • Published Jan 2 • 17

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 33

SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

Paper • 2410.09008 • Published Oct 11, 2024 • 17

Subtle Errors Matter: Preference Learning via Error-injected Self-editing

Paper • 2410.06638 • Published Oct 9, 2024

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published 9 days ago • 18

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 5 days ago • 47