The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published 11 days ago • 112
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published 11 days ago • 9 • 6
Post: New Research Alert: Making Language Models Smaller & Smarter! Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance. The secret? Grouped pointwise convolutions. Yes, we brought a method from computer vision to the transformers arena.
Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.
Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
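The savings from grouping can be sketched with simple parameter arithmetic (a minimal illustration; the channel width and group count below are made up for the example, not taken from the report, and the report's overall 77% figure depends on which layers are replaced):

```python
def pointwise_conv_params(c_in, c_out, groups=1, bias=True):
    """Parameter count of a pointwise (1x1) convolution with channel groups.

    With `groups` groups, each output channel mixes only c_in // groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups)
    return weights + (c_out if bias else 0)

# Illustrative 1024-channel projection, split into 4 groups:
dense = pointwise_conv_params(1024, 1024, groups=1)    # 1,049,600 params
grouped = pointwise_conv_params(1024, 1024, groups=4)  # 263,168 params
print(f"saved: {1 - grouped / dense:.0%}")  # prints: saved: 75%
```

Larger group counts save more per layer, at the cost of less cross-channel mixing, which is why such designs typically interleave grouped projections with some channel-mixing step.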
Article: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 7 days ago • 25
Open LLM Leaderboard Results PR Opener • Add results to model card from Open LLM Leaderboard • Runtime error • 50
Article: Distributed SFT with trl and DeepSpeed Part 2: Scaling Locally By jlzhou • 7 days ago • 1
mistralai/Mistral-Small-24B-Instruct-2501 Text Generation • Updated 12 days ago • 576k • 739
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 16 days ago • 53
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 255
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 76
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15