nisten committed on
Commit ce438e7 · verified · 1 Parent(s): 0db0ad9

Update README.md

Files changed (1)
  1. README.md +38 -16
README.md CHANGED
@@ -1,15 +1,37 @@
  ---
- base_model: []
  library_name: transformers
  tags:
  - mergekit
  - merge

  ---
- # lobotollama369

  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

  ## Merge Details
  ### Merge Method

@@ -18,7 +40,7 @@ This model was merged using the passthrough merge method.
  ### Models Merged

  The following models were included in the merge:
- * /scratch-4

  ### Configuration

@@ -30,41 +52,41 @@ merge_method: passthrough
  slices:
  - sources:
  - layer_range: [0, 29]
- model: /scratch-4
  - sources:
  - layer_range: [30, 35]
- model: /scratch-4
  - sources:
  - layer_range: [36, 40]
- model: /scratch-4
  - sources:
  - layer_range: [41, 45]
- model: /scratch-4
  - sources:
  - layer_range: [46, 49]
- model: /scratch-4
  - sources:
  - layer_range: [50, 54]
- model: /scratch-4
  - sources:
  - layer_range: [55, 59]
- model: /scratch-4
  - sources:
  - layer_range: [60, 64]
- model: /scratch-4
  - sources:
  - layer_range: [65, 69]
- model: /scratch-4
  - sources:
  - layer_range: [70, 74]
- model: /scratch-4
  - sources:
  - layer_range: [75, 79]
- model: /scratch-4
  - sources:
  - layer_range: [80, 84]
- model: /scratch-4
  - sources:
  - layer_range: [85, 126]
- model: /scratch-4
  ```
 
  ---
+ base_model: [meta-llama/Meta-Llama-3.1-405B]
  library_name: transformers
  tags:
  - mergekit
  - merge

  ---

+
+
+ # lobotollama-368b, a prune of [Meta-Llama-3.1-405B-Base](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)
  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

+ # Just so you meow, this did not turn out all that great in the perplexity benchmarks. It needs healing; you'll probably need 32xH100 to do a full finetune.
+ # The model was designed to fit in an M2 Mac Studio (192GB) at 4-bit.
+
+ ```verilog
+ perplexity: 167.37 seconds per pass - ETA 33.47 minutes - meta-405b-base - q8_0 - newest base was identical in bf16 and q8_0
+ [1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
+ Final estimate: PPL = 1.2162 +/- 0.02128
+
+ perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 minutes -- llama 405b - instruct - old BF16 -8head
+ [1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
+ Final estimate: PPL = 1.4666 +/- 0.03184
+
+ ./llama-perplexity -m /scratch-10/lobotollama-q8_0.gguf -f wiki.test.raw -t 96 --chunks 12 -b 1024
+ perplexity: 331.47 seconds per pass - ETA 33.13 minutes
+ [1]2.6744,[2]3.4041,[3]2.9683,[4]2.8669,[5]2.7924,[6]2.7590,[7]2.8274,[8]2.8306,[9]2.7943,[10]2.7910,[11]2.8164,[12]2.9396,
+ Final estimate: PPL = 2.9396 +/- 0.09497
+ ```
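The q8_0 GGUF benchmarked above can be produced with llama.cpp's conversion and quantization tools. A minimal sketch, assuming a local llama.cpp build and a merged HF-format checkpoint in ./lobotollama-368b (the paths are illustrative, and the converter script name varies across llama.cpp versions):

```sh
# Convert the merged HF checkpoint to a high-precision GGUF
# (the converter script has been renamed across llama.cpp versions).
python convert_hf_to_gguf.py ./lobotollama-368b --outtype bf16 --outfile lobotollama-bf16.gguf

# Quantize to q8_0 for the perplexity run above, or to Q4_K_M to fit the 192GB M2 Mac Studio target.
./llama-quantize lobotollama-bf16.gguf lobotollama-q8_0.gguf Q8_0
./llama-quantize lobotollama-bf16.gguf lobotollama-q4_k_m.gguf Q4_K_M

# Quick local smoke test of the 4-bit file.
./llama-cli -m lobotollama-q4_k_m.gguf -p "The capital of France is" -n 32
```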
+
+
+
  ## Merge Details
  ### Merge Method

  ### Models Merged

  The following models were included in the merge:
+ * /Meta-Llama-3.1-405B

  ### Configuration

  slices:
  - sources:
  - layer_range: [0, 29]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [30, 35]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [36, 40]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [41, 45]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [46, 49]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [50, 54]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [55, 59]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [60, 64]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [65, 69]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [70, 74]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [75, 79]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [80, 84]
+ model: /Meta-Llama-3.1-405B
  - sources:
  - layer_range: [85, 126]
+ model: /Meta-Llama-3.1-405B
  ```
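The YAML above is a standard mergekit passthrough configuration. A minimal sketch of reproducing the prune with the mergekit CLI, assuming mergekit is installed and the config is saved locally (file and output names are illustrative):

```sh
# Install mergekit, or install it from the repository linked in the README.
pip install mergekit

# Save the passthrough configuration above as prune-405b.yaml, then run the merge.
# --lazy-unpickle keeps peak memory down while slicing a 405B checkpoint.
mergekit-yaml prune-405b.yaml ./lobotollama-368b --lazy-unpickle
```

Since layer_range is end-exclusive in mergekit, the boundary layers skipped between consecutive slices (29, 35, 40, 45, 49, 54, 59, 64, 69, 74, 79, 84) are the 12 layers pruned, leaving 114 of the original 126.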