Update README.md
README.md CHANGED
@@ -5,11 +5,14 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: other
+license_name: llama3
 ---
-#
+# Llama-3-11.5B-Instruct-attenuated
+The core idea came from @jukofyork; see this [issue](https://github.com/arcee-ai/mergekit/issues/198).
 
-
+As I understand it, the idea is to make the model think twice while leaping the same distance as the original. But why 0.7071067812?
+> The scale factor to use, e.g.: solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
 
 ## Merge Details
 ### Merge Method
@@ -83,6 +86,5 @@ slices:
   layer_range: [24, 32] # The last 8 layers of Block 3 are not duplicated
 
 merge_method: passthrough
-dtype:
-
+dtype: float16
 ```
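A hedged reading of that factor (my gloss, not stated on the card): duplicating a block means the residual stream receives two similar additive updates where the original made one. Treating the two as roughly independent steps whose squared lengths add, each copy is scaled by x so the expected total step matches the original:

```latex
% Two roughly independent steps of scale x should cover the same
% expected squared distance as one unscaled step:
x^2 + x^2 = 1 \quad\Longrightarrow\quad x = \frac{1}{\sqrt{2}} \approx 0.7071067812
```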
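For concreteness, a minimal sketch of one duplicated, attenuated block, assuming mergekit's passthrough `scale` parameter with per-tensor filters (the mechanism discussed in the linked issue); the model name, layer range, and split here are illustrative, not the card's exact recipe:

```yaml
# Sketch only: duplicate one block of layers and scale each copy's
# writes back into the residual stream by 1/sqrt(2) ≈ 0.7071067812.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct  # assumed base model
        layer_range: [8, 16]
        parameters:
          scale:
            - filter: o_proj     # attention output projection
              value: 0.7071067812
            - filter: down_proj  # MLP output projection
              value: 0.7071067812
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [8, 16]
        parameters:
          scale:
            - filter: o_proj
              value: 0.7071067812
            - filter: down_proj
              value: 0.7071067812
merge_method: passthrough
dtype: float16
```

Scaling only `o_proj` and `down_proj` attenuates what each copy adds to the residual stream while leaving the layer's internal computation untouched.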