ClassCat committed on
Commit 1ec473c · 1 Parent(s): d5d116f

Create README.md

Files changed (1)
  1. README.md +40 -0
README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ language: eu
+ license: cc-by-sa-4.0
+ datasets:
+ - cc100
+ - oscar
+ widget:
+ - text: "Zein da zure"
+ - text: "Euria egingo"
+ - text: "Nola dakizu ?"
+ ---
+
+ ## GPT2 Basque small model Version 2 (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses settings about half the size of those of the GPT2 base model.
+
+
+ ### Tokenizer
+
+ Uses a BPE tokenizer with a vocabulary size of 50,000.
+
+ ### Training Data
+
+ * Subset of [CC-100/eu](https://data.statmt.org/cc-100/): Monolingual Datasets from Web Crawl Data
+ * Subset of [oscar](https://huggingface.co/datasets/oscar)
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ generator = pipeline('text-generation', model='ClassCat/gpt2-small-basque-v2')
+ generator("Zein da zure ", max_length=50, num_return_sequences=5)
+ ```
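
The usage snippet in the README could be wrapped roughly as follows when trying several prompts. This is a minimal sketch and not part of the commit. The helper names `prepare_prompt` and `generate` are hypothetical. Lowercasing the prompt is an assumption based only on the "(Uncased)" label in the title, since the README does not say how casing is handled. Passing `do_sample=True` and fixing the seed are likewise assumptions, made so that several distinct, repeatable sequences can be returned.

```python
from typing import Dict, List


def prepare_prompt(text: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lowercase the prompt.

    Lowercasing is an assumption drawn from the "(Uncased)" label;
    the README itself does not document any preprocessing.
    """
    return " ".join(text.split()).lower()


def generate(
    prompts: List[str],
    max_length: int = 50,
    num_return_sequences: int = 5,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Return the generated texts for each prompt, keyed by the original prompt."""
    # Imported lazily so prepare_prompt stays usable without transformers installed.
    from transformers import pipeline, set_seed

    set_seed(seed)  # fixed seed so sampled outputs are repeatable
    generator = pipeline("text-generation", model="ClassCat/gpt2-small-basque-v2")

    results: Dict[str, List[str]] = {}
    for prompt in prompts:
        outputs = generator(
            prepare_prompt(prompt),
            max_length=max_length,
            num_return_sequences=num_return_sequences,
            do_sample=True,  # assumption: sample so the returned sequences differ
        )
        # For a single string input the pipeline returns a list of dicts.
        results[prompt] = [out["generated_text"] for out in outputs]
    return results
```

Calling `generate(["Zein da zure", "Euria egingo"])` downloads the model on first use and returns five continuations per prompt. If the tokenizer turns out to handle casing on its own, the `.lower()` call in `prepare_prompt` can simply be dropped.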