Update README.md
README.md CHANGED
@@ -1,15 +1,37 @@
---
-base_model: []
library_name: transformers
tags:
- mergekit
- merge

---
-# lobotollama369

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

@@ -18,7 +40,7 @@ This model was merged using the passthrough merge method.
### Models Merged

The following models were included in the merge:
-* /

### Configuration

@@ -30,41 +52,41 @@ merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 29]
-    model: /
- sources:
  - layer_range: [30, 35]
-    model: /
- sources:
  - layer_range: [36, 40]
-    model: /
- sources:
  - layer_range: [41, 45]
-    model: /
- sources:
  - layer_range: [46, 49]
-    model: /
- sources:
  - layer_range: [50, 54]
-    model: /
- sources:
  - layer_range: [55, 59]
-    model: /
- sources:
  - layer_range: [60, 64]
-    model: /
- sources:
  - layer_range: [65, 69]
-    model: /
- sources:
  - layer_range: [70, 74]
-    model: /
- sources:
  - layer_range: [75, 79]
-    model: /
- sources:
  - layer_range: [80, 84]
-    model: /
- sources:
  - layer_range: [85, 126]
-    model: /
```
---
+base_model: [meta-llama/Meta-Llama-3.1-405B]
library_name: transformers
tags:
- mergekit
- merge

---

+# lobotollama-368b: a prune of [Meta-Llama-3.1-405B-Base](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

+# Just so you meow, this did not turn out all that great in the perplexity benchmarks. It needs healing; you'll probably need 32xH100 to do a full finetune.
+# The model was designed to fit on an M2 Mac Studio (192 GB) at 4-bit.
+
+```text
+perplexity: 167.37 seconds per pass - ETA 33.47 minutes - meta-405b-base - q8_0 - newest base was identical in bf16 and q8_0
+[1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
+Final estimate: PPL = 1.2162 +/- 0.02128
+
+perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 minutes -- llama 405b - instruct - old BF16 -8head
+[1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
+Final estimate: PPL = 1.4666 +/- 0.03184
+
+./llama-perplexity -m /scratch-10/lobotollama-q8_0.gguf -f wiki.test.raw -t 96 --chunks 12 -b 1024
+perplexity: 331.47 seconds per pass - ETA 33.13 minutes
+[1]2.6744,[2]3.4041,[3]2.9683,[4]2.8669,[5]2.7924,[6]2.7590,[7]2.8274,[8]2.8306,[9]2.7943,[10]2.7910,[11]2.8164,[12]2.9396,
+Final estimate: PPL = 2.9396 +/- 0.09497
+```
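For the M2 Mac Studio target above, here is a rough sketch of producing and checking a 4-bit build with llama.cpp; the f16 GGUF path and the Q4_K_M choice are assumptions, not taken from this card.

```sh
# Hypothetical paths; assumes a llama.cpp build that provides llama-quantize and llama-perplexity.
# Quantize the merged GGUF down to ~4-bit (Q4_K_M) so it fits in 192 GB of unified memory.
./llama-quantize /scratch-10/lobotollama-f16.gguf /scratch-10/lobotollama-q4_k_m.gguf Q4_K_M 96

# Re-run the same wikitext perplexity pass as above against the 4-bit file.
./llama-perplexity -m /scratch-10/lobotollama-q4_k_m.gguf -f wiki.test.raw -t 96 --chunks 12 -b 1024
```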

## Merge Details
### Merge Method

### Models Merged

The following models were included in the merge:
+* /Meta-Llama-3.1-405B

### Configuration

slices:
- sources:
  - layer_range: [0, 29]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [30, 35]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [36, 40]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [41, 45]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [46, 49]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [50, 54]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [55, 59]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [60, 64]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [65, 69]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [70, 74]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [75, 79]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [80, 84]
+    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [85, 126]
+    model: /Meta-Llama-3.1-405B
```
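As a rough sketch of how a slice config like this is applied: assuming the full passthrough YAML above is saved as lobotollama.yaml (the filename and output directory below are placeholders) and the mergekit CLI is installed, the merge itself is a single command.

```sh
# Hypothetical filenames; mergekit-yaml reads the merge spec and writes the merged checkpoint.
# --lazy-unpickle is mergekit's low-memory loading path, which helps at 405B scale.
mergekit-yaml lobotollama.yaml ./lobotollama-368b --lazy-unpickle
```

Passthrough merging simply stacks the listed layer slices from the source model, so the layers that fall outside these ranges are what gets dropped to shrink 405B down to roughly 368B.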