efederici committed · Commit 330ebca · 1 Parent(s): 2da0cf7

Create README.md

Files changed (1): README.md (new file, +39 lines)
 
---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning. Because the architecture is custom, `trust_remote_code=True` must be passed when loading the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
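
A minimal generation example, continuing from the snippet above, is sketched here; the prompt and sampling settings are illustrative and not recommendations from the model card.

```python
import torch

prompt = "La città di Roma è"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; adjust max_new_tokens and the sampling
# parameters as needed.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```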

## Model Description

The architecture is a standard decoder-only transformer, modified in the following ways:
* It can use [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf)
* It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) in place of positional embeddings (see the sketch below)
* It does not use biases

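For intuition, here is a rough sketch of the ALiBi idea: instead of positional embeddings, a fixed, head-specific linear penalty based on query-key distance is added to the attention scores. This is an illustration only, not the model's actual implementation; the head count and context length are taken from the hyperparameter table below.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Additive (n_heads, seq_len, seq_len) attention bias used in place of positional embeddings."""
    # Per-head slopes: the geometric sequence 2^(-8/n_heads), 2^(-16/n_heads), ...
    # (the power-of-two formula from the ALiBi paper; other head counts use an
    # interpolated variant).
    ratio = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([ratio ** (i + 1) for i in range(n_heads)])

    # Signed distance j - i between key position j and query position i;
    # it is <= 0 for the (causal) positions a query may attend to.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # (seq_len, seq_len)

    # Linearly penalize attention to distant keys, with a different strength per head.
    return slopes[:, None, None] * distance[None, :, :]

# e.g. 12 heads and a 2048-token context, as in the table below
bias = alibi_bias(n_heads=12, seq_len=2048)  # added to the attention scores before the softmax
```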

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
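
The FlashAttention bullet above implies the attention implementation is configurable. Assuming the configuration follows MPT-style conventions (an assumption, not something stated in this card; check the repository's configuration code for the actual field names), enabling the Triton/FlashAttention kernel might look roughly like this:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
# Hypothetical field: MPT-style configs select the attention kernel via
# attn_config["attn_impl"]; verify this against the repository's code.
config.attn_config["attn_impl"] = "triton"

model = AutoModelForCausalLM.from_pretrained(
    "efederici/ipt-125m",
    config=config,
    torch_dtype=torch.bfloat16,  # FlashAttention kernels expect half precision
    trust_remote_code=True,
)
model.to("cuda")
```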