Giuliano's Collections
LLM Reasoning
STaR: Bootstrapping Reasoning With Reasoning
Paper
•
2203.14465
•
Published
•
8
Let's Verify Step by Step
Paper
•
2305.20050
•
Published
•
10
Training Large Language Models to Reason in a Continuous Latent Space
Paper
•
2412.06769
•
Published
•
77
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
•
2411.14405
•
Published
•
58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
Paper
•
2309.17179
•
Published
•
2
Paper
•
2412.15115
•
Published
•
345
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Paper
•
2410.13639
•
Published
•
17
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper
•
2411.16489
•
Published
•
42
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper
•
2410.02884
•
Published
•
54
Tree of Problems: Improving structured problem solving with compositionality
Paper
•
2410.06634
•
Published
•
8
Are Your LLMs Capable of Stable Reasoning?
Paper
•
2412.13147
•
Published
•
91
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper
•
2407.21787
•
Published
•
12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper
•
2408.03314
•
Published
•
54
QwQ-32B-Preview
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
•
2412.16145
•
Published
•
38
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper
•
2411.07279
•
Published
•
3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Paper
•
2410.18451
•
Published
•
16
Skywork/Skywork-Reward-Gemma-2-27B-v0.2
Text Classification
•
Updated
•
3.83k
•
27
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper
•
2408.15240
•
Published
•
13
Understanding Hidden Computations in Chain-of-Thought Reasoning
Paper
•
2412.04537
•
Published
Paper
•
2410.12832
•
Published
•
6
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper
•
2412.17256
•
Published
•
46
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Paper
•
2410.02089
•
Published
•
12
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper
•
2402.06457
•
Published
•
9
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper
•
2412.12881
•
Published
•
1
Reinforcement Learning Enhanced LLMs: A Survey
Paper
•
2412.10400
•
Published
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Paper
•
2412.14135
•
Published
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper
•
2412.11605
•
Published
•
18
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
31
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
Paper
•
2411.11694
•
Published
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Paper
•
2408.16737
•
Published
•
1
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
•
2501.04519
•
Published
•
255
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
•
2501.04682
•
Published
•
89
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
92
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper
•
2501.03262
•
Published
•
90
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
•
2501.09686
•
Published
•
36
Foundations of Large Language Models
Paper
•
2501.09223
•
Published
•
2
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
•
2501.07301
•
Published
•
90
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Text Generation
•
Updated
•
565k
•
999
Reasoning Language Models: A Blueprint
Paper
•
2501.11223
•
Published
•
31
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
315
Qwen2.5-1M Technical Report
Paper
•
2501.15383
•
Published
•
55
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Paper
•
2501.11110
•
Published
•
2
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
•
2501.17161
•
Published
•
102
s1: Simple test-time scaling
Paper
•
2501.19393
•
Published
•
99
Process Reinforcement through Implicit Rewards
Paper
•
2502.01456
•
Published
•
53
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper
•
2502.01718
•
Published
•
23
RL + Transformer = A General-Purpose Problem Solver
Paper
•
2501.14176
•
Published
•
22
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper
•
2502.02339
•
Published
•
18
LIMO: Less is More for Reasoning
Paper
•
2502.03387
•
Published
•
44
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
•
2502.02737
•
Published
•
154
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
48
Viewer
•
Updated
•
1k
•
2.67k
•
152
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
Paper
•
2502.04404
•
Published
•
15
Agency Is Frame-Dependent
Paper
•
2502.04403
•
Published
•
19
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Paper
•
2502.04350
•
Published
•
8
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper
•
2502.02584
•
Published
•
14
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
•
2502.06703
•
Published
•
78
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
•
2502.05171
•
Published
•
49
Competitive Programming with Large Reasoning Models
Paper
•
2502.06807
•
Published
•
2