Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.03300

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 352
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 141
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 3
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 23 days ago • 164

Papers - Training - Math

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 53
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

Papers-Fundamentals

about 22 hours ago

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 12
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4, 2024 • 61
Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13, 2024 • 5

Papers - Fine-tuning - Math

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 90
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Paper • 2408.16293 • Published Aug 29, 2024 • 26
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

Papers - Reasoning - Math

MAWPS paper: https://aclanthology.org/N16-1136.pdf

Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems

Paper • 1705.04146 • Published May 11, 2017 • 1
Training Verifiers to Solve Math Word Problems

Paper • 2110.14168 • Published Oct 27, 2021 • 4
Explaining Math Word Problem Solvers

Paper • 2307.13128 • Published Jul 24, 2023 • 1
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms

Paper • 1905.13319 • Published May 30, 2019 • 2

Papers - Reasoning - GSM8k

Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Paper • 2312.17080 • Published Dec 28, 2023 • 1
Premise Order Matters in Reasoning with Large Language Models

Paper • 2402.08939 • Published Feb 14, 2024 • 28
Reasoning in Large Language Models: A Geometric Perspective

Paper • 2407.02678 • Published Jul 2, 2024 • 1
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

Papers - Fine-tuning - PPO

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2, 2024 • 23
UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Paper • 2305.14387 • Published May 22, 2023 • 1
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 107

Papers - Fine-tuning - Preference-based RL (PbRL)

Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2
Learning Trajectory Preferences for Manipulators via Iterative Improvement

Paper • 1306.6294 • Published Jun 26, 2013 • 2
Deep reinforcement learning from human preferences

Paper • 1706.03741 • Published Jun 12, 2017 • 3
Learning Dynamic Robot-to-Human Object Handover from Human Feedback

Paper • 1603.06390 • Published Mar 21, 2016 • 2

Previous
1
2
3
4
5
6
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs