A modified GPT-2 model with ScaledSinusoidal position embeddings, no bias terms, an embedding LayerNorm, and one shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124m, Pythia-70m/160m, Cerebras-111m) on the Open LLM Leaderboard benchmark suite, despite being trained on only 8 billion tokens of text from SlimPajama.
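
A rough PyTorch sketch of what these modifications might look like, purely for illustration: it assumes "ScaledSinusoidal" means a fixed sinusoidal position table multiplied by a learned scalar, and "one shared MLP layer" means a single MLP module reused by every block; the sizes below are placeholders, not the model's actual configuration.

```python
# Illustrative only -- not this model's actual source code.
import math
import torch
import torch.nn as nn


class ScaledSinusoidalEmbedding(nn.Module):
    """Fixed sinusoidal position table multiplied by one learned scale."""

    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        self.scale = nn.Parameter(torch.tensor(1.0))  # the "Scaled" part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings
        return x + self.scale * self.pe[: x.size(1)]


class Block(nn.Module):
    """Pre-LayerNorm GPT-2-style block with bias-free projections and a shared MLP."""

    def __init__(self, dim: int, n_heads: int, shared_mlp: nn.Module):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, bias=False, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = shared_mlp  # the same MLP object is reused by every block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))


dim, n_heads, n_layers, vocab = 768, 12, 12, 50257  # placeholder sizes
shared_mlp = nn.Sequential(
    nn.Linear(dim, 4 * dim, bias=False), nn.GELU(), nn.Linear(4 * dim, dim, bias=False)
)
token_embed = nn.Embedding(vocab, dim)
pos_embed = ScaledSinusoidalEmbedding(dim)
embed_norm = nn.LayerNorm(dim)  # the "embedding layernorm"
blocks = nn.ModuleList([Block(dim, n_heads, shared_mlp) for _ in range(n_layers)])
```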

You have to `pip install einops` before using this model! It also ships custom model code, so it has to be loaded with `trust_remote_code=True`.
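
A minimal usage sketch, assuming the standard `transformers` AutoClasses work with this model's custom code; everything other than the model ID is illustrative.

```python
# pip install einops transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/cramped-94m-8btok"

# trust_remote_code=True is required because the model ships custom architecture code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```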


| Average | ARC | HellaSwag | MMLU | TruthfulQA |
|--------:|----:|----------:|-----:|-----------:|
| 30.76 | 22.18 | 29.75 | 26.24 | 44.88 |
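
These are Open LLM Leaderboard-style scores. A rough sketch of reproducing them locally with EleutherAI's lm-evaluation-harness follows; the task names and arguments below are assumptions and will not exactly match the leaderboard's pinned few-shot configuration.

```python
# Rough reproduction sketch (pip install lm_eval); not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=crumb/cramped-94m-8btok,trust_remote_code=True",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],  # assumed task names
)
print(results["results"])
```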
