Update README.md
README.md CHANGED
@@ -5,11 +5,14 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: other
+license_name: llama3
 ---
-#
+# Llama-3-11.5B-Instruct-attenuated
+The core idea came from @jukofyork; see this [issue](https://github.com/arcee-ai/mergekit/issues/198).
 
-
+As I understand it, the idea is to make the model think twice while leaping the same distance as the original. But why 0.7071067812?
+> The scale factor to use, e.g.: solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
 
 ## Merge Details
 ### Merge Method
@@ -83,6 +86,5 @@ slices:
   layer_range: [24, 32] # The last 8 layers of Block 3 are not duplicated
 
 merge_method: passthrough
-dtype:
-
+dtype: float16
 ```
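A hedged reading of that factor (my gloss, not stated on the card): duplicating a block means the residual stream receives two similar additive updates where the original made one. Treating the two as roughly independent steps whose squared lengths add, each copy is scaled by x so the expected total step matches the original:

```latex
% Two roughly independent steps of scale x should cover the same
% expected squared distance as one unscaled step:
x^2 + x^2 = 1 \quad\Longrightarrow\quad x = \frac{1}{\sqrt{2}} \approx 0.7071067812
```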
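For concreteness, a minimal sketch of one duplicated, attenuated block, assuming mergekit's passthrough `scale` parameter with per-tensor filters (the mechanism discussed in the linked issue); the model name, layer range, and split here are illustrative, not the card's exact recipe:

```yaml
# Sketch only: duplicate one block of layers and scale each copy's
# writes back into the residual stream by 1/sqrt(2) ≈ 0.7071067812.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct  # assumed base model
        layer_range: [8, 16]
        parameters:
          scale:
            - filter: o_proj     # attention output projection
              value: 0.7071067812
            - filter: down_proj  # MLP output projection
              value: 0.7071067812
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [8, 16]
        parameters:
          scale:
            - filter: o_proj
              value: 0.7071067812
            - filter: down_proj
              value: 0.7071067812
merge_method: passthrough
dtype: float16
```

Scaling only `o_proj` and `down_proj` attenuates what each copy adds to the residual stream while leaving the layer's internal computation untouched.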