This is a custom implementation of GPT-2 in which we replace the attention module with our own implementation. We do not currently replace softmax, but in future commits we plan to swap the softmax function in attention for other softmax variants (see the sketch below).
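As a rough illustration of that planned design, the following is a minimal sketch of causal scaled dot-product attention with a pluggable softmax. The function name `custom_attention` and the `softmax_fn` hook are hypothetical, not part of the released code:

```python
import math
import torch
import torch.nn.functional as F

def custom_attention(query, key, value, softmax_fn=F.softmax):
    """Causal scaled dot-product attention with a swappable softmax.

    query/key/value: tensors of shape (batch, heads, seq_len, head_dim).
    softmax_fn: hypothetical drop-in point for future softmax variants.
    Assumes full-sequence attention (no KV cache), so query and key
    share the same sequence length.
    """
    # Attention scores, scaled by sqrt(head_dim) as in GPT-2
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(query.size(-1))

    # Causal mask: each position attends only to itself and earlier positions
    seq_len = query.size(-2)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=query.device))
    scores = scores.masked_fill(~causal, torch.finfo(scores.dtype).min)

    # Standard softmax today; a softmax variant would be swapped in here
    weights = softmax_fn(scores, dim=-1)
    return torch.matmul(weights, value)
```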

We build directly on the Hugging Face GPT-2 model: https://huggingface.co/openai-community/gpt2

This model was fine-tuned on the WikiText-2 dataset: https://paperswithcode.com/dataset/wikitext-2
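For reference, loading the base model and dataset with `transformers` and `datasets` might look like the sketch below. The dataset config name `"wikitext-2-raw-v1"` is an assumption; this card does not specify which WikiText-2 variant or fine-tuning configuration was used:

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from datasets import load_dataset

# Base model and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")

# WikiText-2 ("wikitext-2-raw-v1" is an assumed config name)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize the training split for language-model fine-tuning
def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
```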

Base model: Hugging Face GPT-2 (openai-community/gpt2)

Model size: 124M parameters (F32, safetensors)

Dataset used to train mmoffatt/custom_gpt2: wikitext-2