From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages Article • By Steveeeeeeen and 1 other • 3 days ago • 19
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 10 days ago • 161
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published 14 days ago • 6
Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial Article • By open-r1 • 14 days ago • 34
Mastering Long Contexts in LLMs with KVPress Article • By nvidia and 1 other • 22 days ago • 62
How biased is Whisper ? Evaluating Whisper Models for Robustness to Diverse English Accents Article • By Steveeeeeeen • 16 days ago • 16
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Paper • 2501.14334 • Published 21 days ago • 17
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 43
Yay! Organizations can now publish blog Articles Article • By huggingface and 3 others • 25 days ago • 33
MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era Article • By MiniMax-AI • about 1 month ago • 40
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 8 days ago • 233
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 255
DolphinLabeled Datasets Collection Eric Hartford has added labels to help you filter datasets, for your pleasure. • 5 items • Updated Jan 6 • 12