Update README.md
README.md (CHANGED)

````diff
@@ -18,12 +18,13 @@ This model is an int4 model with group_size 128 of [THUDM/chatglm2-6b](https://h
 ```python
 ##pip install auto-gptq[triton]
 ##pip install triton==2.2.0
-from transformers import
+from transformers import AutoModel, AutoTokenizer
 quantized_model_dir = "Intel/chatglm2-6b-int4-inc"
-model =
-
-
-
+model = AutoModel.from_pretrained(quantized_model_dir,
+                                  device_map="auto",
+                                  trust_remote_code=False,
+                                  )
+
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)
 print(tokenizer.decode(model.generate(**tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device),max_new_tokens=50)[0]))
 ```
````
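
For copy-paste convenience, the lines added by this hunk assemble into the snippet below. It is reproduced from the new revision of the README, with only comments and a `prompt`/`inputs` split of the final one-liner added here; argument choices such as `trust_remote_code=False` are the README's own and are not independently verified.

```python
##pip install auto-gptq[triton]
##pip install triton==2.2.0
from transformers import AutoModel, AutoTokenizer

# Load the INT4 (group_size 128) checkpoint and its tokenizer from the Hub.
quantized_model_dir = "Intel/chatglm2-6b-int4-inc"
model = AutoModel.from_pretrained(quantized_model_dir,
                                  device_map="auto",
                                  trust_remote_code=False,  # value as set in the updated README
                                  )
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

# Encode a short prompt, generate up to 50 new tokens, and print the decoded text.
prompt = "There is a girl who likes adventure,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```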

````diff
@@ -35,27 +36,11 @@ print(tokenizer.decode(model.generate(**tokenizer("There is a girl who likes adv
 Install [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness.git) from source, we used the git id 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d

 ```bash
-lm_eval --model hf --model_args pretrained="Intel/
+lm_eval --model hf --model_args pretrained="Intel/chatglm2-6b-int4-inc",autogptq=True,gptq_use_triton=True --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 32
 ```



-| Metric         | BF16   | INT4   |
-| -------------- | ------ | ------ |
-| Avg.           | 0.6647 | 0.6621 |
-| mmlu           | 0.5906 | 0.5872 |
-| lambada_openai | 0.7141 | 0.7141 |
-| hellaswag      | 0.6602 | 0.6557 |
-| winogrande     | 0.7395 | 0.7364 |
-| piqa           | 0.8052 | 0.8047 |
-| truthfulqa_mc1 | 0.5251 | 0.5153 |
-| openbookqa     | 0.3600 | 0.3420 |
-| boolq          | 0.8535 | 0.8541 |
-| rte            | 0.7040 | 0.7148 |
-| arc_easy       | 0.8161 | 0.8165 |
-| arc_challenge  | 0.5435 | 0.5435 |
-
-

 ### Reproduce the model

````
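
The hunk above pins lm-evaluation-harness to git id 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d but leaves the source install itself implicit. The sketch below is not from the README; it assumes the project's standard clone-and-editable-install workflow.

```bash
# Fetch the harness and check out the commit id referenced in the README.
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d

# Editable install so the `lm_eval` entry point used above is on PATH
# (assumption: the project's standard pip-based source install).
pip install -e .
```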