A modified GPT-2 model with ScaledSinusoidal position embeddings, no bias terms, an embedding LayerNorm, and one shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124m, Pythia-70m/160m, Cerebras-111m) on the Open LLM Leaderboard benchmark suite, despite being trained on only 8 billion tokens of text from SlimPajama.
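
A rough PyTorch sketch of what these modifications might look like, purely for illustration: it assumes "ScaledSinusoidal" means a fixed sinusoidal position table multiplied by a learned scalar, and "one shared MLP layer" means a single MLP module reused by every block; the sizes below are placeholders, not the model's actual configuration.

```python
# Illustrative only -- not this model's actual source code.
import math
import torch
import torch.nn as nn


class ScaledSinusoidalEmbedding(nn.Module):
    """Fixed sinusoidal position table multiplied by one learned scale."""

    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        self.scale = nn.Parameter(torch.tensor(1.0))  # the "Scaled" part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings
        return x + self.scale * self.pe[: x.size(1)]


class Block(nn.Module):
    """Pre-LayerNorm GPT-2-style block with bias-free projections and a shared MLP."""

    def __init__(self, dim: int, n_heads: int, shared_mlp: nn.Module):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, bias=False, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = shared_mlp  # the same MLP object is reused by every block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))


dim, n_heads, n_layers, vocab = 768, 12, 12, 50257  # placeholder sizes
shared_mlp = nn.Sequential(
    nn.Linear(dim, 4 * dim, bias=False), nn.GELU(), nn.Linear(4 * dim, dim, bias=False)
)
token_embed = nn.Embedding(vocab, dim)
pos_embed = ScaledSinusoidalEmbedding(dim)
embed_norm = nn.LayerNorm(dim)  # the "embedding layernorm"
blocks = nn.ModuleList([Block(dim, n_heads, shared_mlp) for _ in range(n_layers)])
```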

You have to `pip install einops` before using this model! It also ships custom model code, so it has to be loaded with `trust_remote_code=True`.
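
A minimal usage sketch, assuming the standard `transformers` AutoClasses work with this model's custom code; everything other than the model ID is illustrative.

```python
# pip install einops transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/cramped-94m-8btok"

# trust_remote_code=True is required because the model ships custom architecture code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```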


| Average | ARC | HellaSwag | MMLU | TruthfulQA |
|--------:|----:|----------:|-----:|-----------:|
| 30.76 | 22.18 | 29.75 | 26.24 | 44.88 |
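
These are Open LLM Leaderboard-style scores. A rough sketch of reproducing them locally with EleutherAI's lm-evaluation-harness follows; the task names and arguments below are assumptions and will not exactly match the leaderboard's pinned few-shot configuration.

```python
# Rough reproduction sketch (pip install lm_eval); not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=crumb/cramped-94m-8btok,trust_remote_code=True",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],  # assumed task names
)
print(results["results"])
```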
