This is a custom implementation of GPT-2 in which we replace the attention module with our own implementation. We do not currently replace softmax, but in future commits we plan to swap the softmax function in attention for other softmax variants (see the sketch below).
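As a rough illustration of that planned design, the following is a minimal sketch of causal scaled dot-product attention with a pluggable softmax. The function name `custom_attention` and the `softmax_fn` hook are hypothetical, not part of the released code:

```python
import math
import torch
import torch.nn.functional as F

def custom_attention(query, key, value, softmax_fn=F.softmax):
    """Causal scaled dot-product attention with a swappable softmax.

    query/key/value: tensors of shape (batch, heads, seq_len, head_dim).
    softmax_fn: hypothetical drop-in point for future softmax variants.
    Assumes full-sequence attention (no KV cache), so query and key
    share the same sequence length.
    """
    # Attention scores, scaled by sqrt(head_dim) as in GPT-2
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(query.size(-1))

    # Causal mask: each position attends only to itself and earlier positions
    seq_len = query.size(-2)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=query.device))
    scores = scores.masked_fill(~causal, torch.finfo(scores.dtype).min)

    # Standard softmax today; a softmax variant would be swapped in here
    weights = softmax_fn(scores, dim=-1)
    return torch.matmul(weights, value)
```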

We build directly on the Hugging Face GPT-2 model: https://huggingface.co/openai-community/gpt2

This model was fine-tuned on the WikiText-2 dataset: https://paperswithcode.com/dataset/wikitext-2
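For reference, loading the base model and dataset with `transformers` and `datasets` might look like the sketch below. The dataset config name `"wikitext-2-raw-v1"` is an assumption; this card does not specify which WikiText-2 variant or fine-tuning configuration was used:

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from datasets import load_dataset

# Base model and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")

# WikiText-2 ("wikitext-2-raw-v1" is an assumed config name)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize the training split for language-model fine-tuning
def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
```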

Base model: Hugging Face GPT-2 (openai-community/gpt2)

Model size: 124M parameters (F32, safetensors)

Dataset used to train mmoffatt/custom_gpt2: wikitext-2