mphi committed · Commit ec3909a · 1 Parent(s): ed70cc6

Update README.md

Files changed (1)
  1. README.md +8 -34
README.md CHANGED
@@ -6,44 +6,18 @@ model-index:
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # gpt-4-est-base
 
- This model is a ...
-
- It achieves the following results on the evaluation set:
- - Loss: 3.9846
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0003
- - train_batch_size: 6
- - eval_batch_size: 6
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 3.0
-
- ### Training results
+ A GPT model for Estonian, trained from scratch on 2.2 billion words (Estonian National Corpus + News Crawl + Common Crawl).
 
+ Model details:
+ - num. of layers: 12
+ - num. of heads: 12
+ - embedding size: 768
+ - context size: 1024
+ - total size: 118.68M params
 
+ Further details to be added soon.
 
  ### Framework versions
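
The model details added in this commit can be sanity-checked against the reported total size. The following is a minimal sketch, not the author's method: it assumes a standard GPT-2-style layout (fused Q/K/V projection, 4x feed-forward width, biases everywhere, learned position embeddings, output head tied to the token embedding). The commit does not state the architecture variant or the vocabulary size, so the function name `gpt2_param_count` and the vocabulary value used below are hypothetical.

```python
def gpt2_param_count(n_layer: int, n_head: int, d_model: int,
                     n_ctx: int, vocab_size: int) -> int:
    """Parameter count of a GPT-2-style decoder (assumed layout, see lead-in)."""
    # Heads partition d_model, so n_head must divide it but adds no parameters.
    if d_model % n_head != 0:
        raise ValueError("d_model must be divisible by n_head")
    d_ff = 4 * d_model  # assumption: GPT-2's 4x feed-forward width

    # Attention: fused Q/K/V projection plus output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: up-projection and down-projection, both with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    per_block = attn + mlp + norms

    # Token and learned position embeddings; the output head is assumed to be
    # tied to the token embedding, so it contributes nothing extra.
    embeddings = vocab_size * d_model + n_ctx * d_model
    final_norm = 2 * d_model
    return n_layer * per_block + embeddings + final_norm


# Everything except the token embedding, using the dimensions from the card.
fixed = gpt2_param_count(12, 12, 768, 1024, vocab_size=0)
print(f"parameters excluding token embeddings: {fixed:,}")

# HYPOTHETICAL vocabulary size, chosen only to illustrate the call.
total = gpt2_param_count(12, 12, 768, 1024, vocab_size=50_000)
print(f"total with a 50,000-token vocabulary: {total:,}")
```

Under these assumptions, everything except the token embedding comes to about 85.8M parameters, so the reported 118.68M would leave roughly 32.8M for the token embedding, which corresponds to a vocabulary of about 42.8k tokens. That figure is an inference from the sketch and is not stated in the commit.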