efederici committed · Commit 330ebca · 1 Parent(s): 2da0cf7

Create README.md

Files changed (1): README.md (new file, +39 lines)
 
---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning. Because the architecture is custom, `trust_remote_code=True` must be passed when loading the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
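
A minimal generation example, continuing from the snippet above, is sketched here; the prompt and sampling settings are illustrative and not recommendations from the model card.

```python
import torch

prompt = "La città di Roma è"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; adjust max_new_tokens and the sampling
# parameters as needed.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```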

## Model Description

The architecture is a standard decoder-only transformer, modified in the following ways:
* It can use [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf)
* It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) in place of positional embeddings (see the sketch below)
* It does not use biases

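For intuition, here is a rough sketch of the ALiBi idea: instead of positional embeddings, a fixed, head-specific linear penalty based on query-key distance is added to the attention scores. This is an illustration only, not the model's actual implementation; the head count and context length are taken from the hyperparameter table below.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Additive (n_heads, seq_len, seq_len) attention bias used in place of positional embeddings."""
    # Per-head slopes: the geometric sequence 2^(-8/n_heads), 2^(-16/n_heads), ...
    # (the power-of-two formula from the ALiBi paper; other head counts use an
    # interpolated variant).
    ratio = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([ratio ** (i + 1) for i in range(n_heads)])

    # Signed distance j - i between key position j and query position i;
    # it is <= 0 for the (causal) positions a query may attend to.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # (seq_len, seq_len)

    # Linearly penalize attention to distant keys, with a different strength per head.
    return slopes[:, None, None] * distance[None, :, :]

# e.g. 12 heads and a 2048-token context, as in the table below
bias = alibi_bias(n_heads=12, seq_len=2048)  # added to the attention scores before the softmax
```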

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
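
The FlashAttention bullet above implies the attention implementation is configurable. Assuming the configuration follows MPT-style conventions (an assumption, not something stated in this card; check the repository's configuration code for the actual field names), enabling the Triton/FlashAttention kernel might look roughly like this:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
# Hypothetical field: MPT-style configs select the attention kernel via
# attn_config["attn_impl"]; verify this against the repository's code.
config.attn_config["attn_impl"] = "triton"

model = AutoModelForCausalLM.from_pretrained(
    "efederici/ipt-125m",
    config=config,
    torch_dtype=torch.bfloat16,  # FlashAttention kernels expect half precision
    trust_remote_code=True,
)
model.to("cuda")
```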