kuotient committed on
Commit 1dc1e5a · verified · 1 Parent(s): 2b4ebcb

Update README.md

Files changed (1)
  1. README.md +7 -5
README.md CHANGED
@@ -5,11 +5,14 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: other
+license_name: llama3
 ---
-# meta-llama-11b-2
+# Llama-3-11.5B-Instruct-attenuated
+The core idea came from @jukofyork, see this [issue](https://github.com/arcee-ai/mergekit/issues/198).
 
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+As I understand it, the idea is to make the model think twice while covering the same distance as the original. But why 0.7071067812?
+> The scale factor to use, e.g. solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
 
 ## Merge Details
 ### Merge Method
@@ -83,6 +86,5 @@ slices:
   layer_range: [24, 32] # The last 8 layers of Block 3 are not duplicated
 
 merge_method: passthrough
-dtype: bfloat16
-
+dtype: float16
 ```
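
The quoted scale factor can be checked directly. The reasoning (as I read the linked issue) is that duplicating a block applies its residual update twice, so each copy is attenuated by a factor x chosen so that the two scaled steps together match one original step, i.e. x^2 = 1/2. A minimal sanity-check sketch in plain Python, with the framing in the comments being my interpretation rather than anything from mergekit itself:

```python
import math

# Each duplicated layer's contribution is scaled by x, where two
# applications should equal one original application: x^2 = 1/2.
scale = 1 / math.sqrt(2)
print(round(scale, 10))  # 0.7071067812, the factor quoted in the README

# Sanity check: two scaled contributions recover the original magnitude.
assert abs(2 * scale**2 - 1.0) < 1e-12
```

This matches the constant in the quoted hint: solving x^2 = 1/2 gives x = 1/sqrt(2) ≈ 0.7071067812.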