Has My System Prompt Been Used? Large Language Model Prompt Membership Inference Paper • 2502.09974 • Published 28 days ago • 10
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 15 days ago • 18
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving Paper • 2502.07640 • Published about 1 month ago • 8
Recurrent Models Collection These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space. • 14 items • Updated Feb 10 • 5
Gemstone Models Collection Our 22 open source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths from 256 to 3072 and 18 depths from 3 to 80. • 59 items • Updated 15 days ago • 5
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 124
Goldfish Loss: Mitigating Memorization in LLMs Collection This collection contains artifacts from our paper titled: "Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs." • 9 items • Updated Oct 31, 2024 • 3
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs Paper • 2406.10209 • Published Jun 14, 2024 • 8
From Pixels to Prose: A Large Dataset of Dense Image Captions Paper • 2406.10328 • Published Jun 14, 2024 • 18
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 53
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 14
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Paper • 2401.12070 • Published Jan 22, 2024 • 44
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla Paper • 2307.09458 • Published Jul 18, 2023 • 11