---
license: gemma
base_model:
- google/gemma-2-27b-it
---

FP8 version of google/gemma-2-27b-it, quantized with **compute sponsored by Arrow and Nvidia through Danish Data Science Community**.

Quantized using this script:

```python
from transformers import AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "google/gemma-2-27b-it"

# Load the model and tokenizer in their original precision.
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure simple post-training quantization (PTQ): FP8 dynamic
# quantization of every Linear layer, keeping lm_head in full precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the quantized model and tokenizer.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
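
For intuition, the `FP8_DYNAMIC` scheme rescales each tensor on the fly by its own maximum so values fit the FP8 E4M3 range. The sketch below is a simplified illustration of that idea (it is not the llmcompressor implementation, and it only clamps rather than rounding to FP8 precision):

```python
# Illustrative sketch of dynamic per-tensor FP8-style scaling.
# Not the llmcompressor internals: real FP8 also rounds to a
# 3-bit mantissa; here we only scale into range, clamp, and scale back.
FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def quantize_dequantize(values):
    """Scale a tensor by its own max into the FP8 range, clamp,
    then rescale, returning (restored_values, scale)."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    quantized = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale))
                 for v in values]
    return [q * scale for q in quantized], scale


weights = [0.5, -2.0, 3.75, -0.125]
restored, scale = quantize_dequantize(weights)
print(scale, restored)
```

Because the scale is recomputed per tensor at runtime, no calibration dataset is needed, which is why the script above calls `oneshot` without one.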