- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 352
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 141
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 164
Collections
Collections including paper arxiv:2402.03300
- RoFormer: Enhanced Transformer with Rotary Position Embedding
  Paper • 2104.09864 • Published • 12
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 61
- Zero-Shot Tokenizer Transfer
  Paper • 2405.07883 • Published • 5

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 90
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
  Paper • 2408.16293 • Published • 26
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 107

- Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
  Paper • 1705.04146 • Published • 1
- Training Verifiers to Solve Math Word Problems
  Paper • 2110.14168 • Published • 4
- Explaining Math Word Problem Solvers
  Paper • 2307.13128 • Published • 1
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
  Paper • 1905.13319 • Published • 2

- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
  Paper • 2312.17080 • Published • 1
- Premise Order Matters in Reasoning with Large Language Models
  Paper • 2402.08939 • Published • 28
- Reasoning in Large Language Models: A Geometric Perspective
  Paper • 2407.02678 • Published • 1
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 107

- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 23
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  Paper • 2305.14387 • Published • 1
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 107

- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2
- Learning Trajectory Preferences for Manipulators via Iterative Improvement
  Paper • 1306.6294 • Published • 2
- Deep reinforcement learning from human preferences
  Paper • 1706.03741 • Published • 3
- Learning Dynamic Robot-to-Human Object Handover from Human Feedback
  Paper • 1603.06390 • Published • 2