Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published 5 days ago • 13
Recurrent Models Collection These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space. • 14 items • Updated 5 days ago • 5
Gemstones: A Model Suite for Multi-Faceted Scaling Laws Paper • 2502.06857 • Published 7 days ago • 21
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 7 days ago • 92
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published 8 days ago • 25
Gemstone Models Collection Our 22 open source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths from 256 to 3072 and 18 depths from 3 to 80. • 59 items • Updated 3 days ago • 4
From Pixels to Prose: A Large Dataset of Dense Image Captions Paper • 2406.10328 • Published Jun 14, 2024 • 18
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 52