What is this?

This is a Danish text-generation model based on the EleutherAI/gpt-neo-1.3B model (1.3 B parameters). The model was not pre-trained from scratch but adapted from the English model using the CLP-Transfer method.

How to use

Test the model using the pipeline from the 🤗 Transformers library:

from transformers import pipeline

generator = pipeline("text-generation", model="KennethTM/gpt-neo-1.3B-danish")
text = generator("Der var engang ")

print(text[0]["generated_text"])

Or load it using the Auto* classes:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt-neo-1.3B-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt-neo-1.3B-danish")
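
For example, the loaded model and tokenizer can be used directly for generation (the generation parameters below are illustrative, not recommended settings):

prompt = tokenizer("Der var engang ", return_tensors="pt")
output = model.generate(**prompt, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))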

Model training

The training data are the Danish part of the OSCAR dataset ('unshuffled_deduplicated_da'), split randomly into training (95%) and validation (5%) sets.
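
For illustration, the split could be reproduced along these lines with the 🤗 Datasets library (the exact split procedure and seed are assumptions, not the actual training setup):

from datasets import load_dataset

# Load the Danish part of the OSCAR dataset (only a train split is provided)
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train")

# Random 95/5 train/validation split (seed chosen only for reproducibility)
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_data, val_data = splits["train"], splits["test"]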

The model weights are initialized from the English gpt-neo-1.3B model ('source model') with new word token embeddings created from the Danish GPT-2 small model ('helper model') using the CLP-Transfer method.
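
The idea behind CLP-Transfer is that tokens shared between the English and Danish vocabularies keep their source embeddings, while embeddings for new Danish tokens are built as similarity-weighted combinations of the overlapping tokens' source embeddings, with similarities measured in the helper model's embedding space. A rough, simplified sketch of that idea follows (not the reference implementation; the helper model identifier and the weighting scheme are assumptions):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# "danish-gpt2-small" is a placeholder for the Danish GPT-2 small helper model
source_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
source_tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
helper_model = AutoModelForCausalLM.from_pretrained("danish-gpt2-small")
helper_tok = AutoTokenizer.from_pretrained("danish-gpt2-small")

source_emb = source_model.get_input_embeddings().weight.detach()  # (V_en, d)
helper_emb = helper_model.get_input_embeddings().weight.detach()  # (V_da, d_helper)
new_emb = torch.empty(len(helper_tok), source_emb.shape[1])

# Tokens present in both vocabularies keep their source embeddings
source_vocab, helper_vocab = source_tok.get_vocab(), helper_tok.get_vocab()
overlap = [t for t in helper_vocab if t in source_vocab]
overlap_da = torch.tensor([helper_vocab[t] for t in overlap])
overlap_en = torch.tensor([source_vocab[t] for t in overlap])
new_emb[overlap_da] = source_emb[overlap_en]

# New Danish tokens: similarity-weighted combination of the overlapping tokens'
# source embeddings, with similarity measured in the helper embedding space
overlap_set = set(overlap)
for token, da_id in helper_vocab.items():
    if token in overlap_set:
        continue
    sims = torch.cosine_similarity(helper_emb[da_id].unsqueeze(0), helper_emb[overlap_da])
    weights = torch.softmax(sims, dim=0)  # simplified; the original method normalizes similarities
    new_emb[da_id] = weights @ source_emb[overlap_en]

# Swap the new embeddings into the source model (input and output embeddings are tied)
source_model.resize_token_embeddings(len(helper_tok))
source_model.get_input_embeddings().weight.data = new_emb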

Training is done using a context window of 1024 tokens and mixed precision (bf16). First, only the word token embeddings are trained on 0.5 M samples, followed by training of all weights on approximately 2 M samples (1 epoch).
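
A minimal sketch of the two-phase schedule (identifiers are illustrative; the actual training loop and hyperparameters are not shown):

from transformers import AutoModelForCausalLM

# Model after the CLP-Transfer embedding initialization described above
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# Phase 1: freeze all weights and train only the word token embeddings (~0.5 M samples)
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True
# ... run the training loop (context window 1024, bf16) ...

# Phase 2: unfreeze everything and train all weights for ~1 epoch (~2 M samples)
for param in model.parameters():
    param.requires_grad = True
# ... run the training loop again ...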

The model achieves a perplexity of 16.75 on approximately 0.1 M validation samples.
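
The reported perplexity is simply the exponential of the mean cross-entropy loss (in nats) over the validation samples:

import math

eval_loss = 2.818  # illustrative mean validation loss, roughly consistent with the reported value
print(math.exp(eval_loss))  # ~16.75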

The model was trained on a single 24 GB GPU.

Notes

This is a pre-trained model; for optimal performance, it should be fine-tuned for new downstream tasks.
