The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published Jan 13 • 92
Article • PaliGemma 2 Mix - New Instruction Vision Language Models by Google • 23 days ago • 65
Soundwave: Less is More for Speech-Text Alignment in LLMs • Paper • 2502.12900 • Published 23 days ago • 77
IHEval: Evaluating Language Models on Following the Instruction Hierarchy • Paper • 2502.08745 • Published 29 days ago • 18
ReLearn: Unlearning via Learning for Large Language Models • Paper • 2502.11190 • Published 25 days ago • 29
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training • Paper • 2502.11196 • Published 25 days ago • 22
Logical Reasoning in Large Language Models: A Survey • Paper • 2502.09100 • Published 29 days ago • 22
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging • Paper • 2502.09056 • Published 29 days ago • 30
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models • Paper • 2502.09604 • Published 28 days ago • 33
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation • Paper • 2502.08690 • Published 29 days ago • 41
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper • 2502.08910 • Published 29 days ago • 143
Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 14 items • Updated 2 days ago • 101
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators • Paper • 2502.06394 • Published Feb 10 • 86
Expect the Unexpected: FailSafe Long Context QA for Finance • Paper • 2502.06329 • Published Feb 10 • 126
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models • Paper • 2502.07346 • Published about 1 month ago • 51
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper • 2502.06703 • Published Feb 10 • 142
Article • Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset • By sdiazlor • Feb 10 • 48
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning • Paper • 2502.06781 • Published Feb 10 • 60
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights • Paper • 2403.03506 • Published Mar 6, 2024 • 1