nm-research committed · verified
Commit 771478f · 1 Parent(s): 07dbc3a

Update README.md

Files changed (1):
  1. README.md +16 -8
README.md CHANGED
@@ -42,7 +42,7 @@ from transformers import AutoTokenizer
  from vllm import LLM, SamplingParams

  max_model_len, tp_size = 4096, 1
- model_name = "neuralmagic-ent/granite-3.1-8b-base-FP8-dynamic"
+ model_name = "neuralmagic/granite-3.1-8b-base-FP8-dynamic"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True)
  sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])
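The hunk above only renames the checkpoint; for context, a minimal end-to-end sketch of the deployment snippet this section of the README documents, assuming vLLM's standard `generate` API (the prompt is illustrative, not taken from the README):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 4096, 1
model_name = "neuralmagic/granite-3.1-8b-base-FP8-dynamic"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size,
          max_model_len=max_model_len, trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256,
                                 stop_token_ids=[tokenizer.eos_token_id])

# Granite 3.1 base is a completion model, so we pass plain text
# rather than applying a chat template.
outputs = llm.generate(["The Granite model family is"], sampling_params)
print(outputs[0].outputs[0].text)
```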
@@ -65,6 +65,9 @@ vLLM also supports OpenAI-compatible serving. See the [documentation](https://do

  This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.

+ <details>
+ <summary>Model Creation Code</summary>
+
  ```bash
  python quantize.py --model_id ibm-granite/granite-3.1-8b-base --save_path "output_dir/"
  ```
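`quantize.py` itself is only partially visible in this diff (its tail appears in the next hunk); as a rough guide, a data-free FP8-dynamic pass with llm-compressor typically looks like the sketch below. The `FP8_DYNAMIC` scheme and `lm_head` exclusion are assumptions based on the library's published examples, not code recovered from this commit:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "ibm-granite/granite-3.1-8b-base"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP8 weights quantized per channel, activations quantized dynamically
# per token at runtime; lm_head is commonly kept in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC",
                              ignore=["lm_head"])

# FP8_DYNAMIC is data-free, so no calibration dataset is required.
oneshot(model=model, recipe=recipe)

model.save_pretrained("output_dir/", save_compressed=True)
tokenizer.save_pretrained("output_dir/")
```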
@@ -109,16 +112,20 @@ def main():
  if __name__ == "__main__":
  main()
  ```
+ </details>

  ## Evaluation

- The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard) and on [HumanEval](https://github.com/neuralmagic/evalplus), using the following commands:
+ The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard), OpenLLM Leaderboard [V2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/), and on [HumanEval](https://github.com/neuralmagic/evalplus), using the following commands:

+ <details>
+ <summary>Evaluation Commands</summary>
+
  OpenLLM Leaderboard V1:
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic-ent/granite-3.1-8b-base-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
+ --model_args pretrained="neuralmagic/granite-3.1-8b-base-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks openllm \
  --write_out \
  --batch_size auto \
@@ -130,7 +137,7 @@ lm_eval \
  ##### Generation
  ```
  python3 codegen/generate.py \
- --model neuralmagic-ent/granite-3.1-8b-base-FP8-dynamic \
+ --model neuralmagic/granite-3.1-8b-base-FP8-dynamic \
  --bs 16 \
  --temperature 0.2 \
  --n_samples 50 \
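The generation step above draws 50 samples per task at temperature 0.2; `evalplus.evaluate` then reports pass@1 using the unbiased estimator from the HumanEval paper. A sketch of that estimator (the standard formula, not code from the evalplus repository):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k),
    where n samples were drawn and c of them passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With n_samples=50 as in the command above, 22 passing samples
# give pass@1 = 22/50 = 0.44, the scale of the scores reported below.
print(f"{pass_at_k(50, 22, 1):.2f}")
```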
@@ -140,20 +147,21 @@ python3 codegen/generate.py \
  ##### Sanitization
  ```
  python3 evalplus/sanitize.py \
- humaneval/neuralmagic-ent--granite-3.1-8b-base-FP8-dynamic_vllm_temp_0.2
+ humaneval/neuralmagic--granite-3.1-8b-base-FP8-dynamic_vllm_temp_0.2
  ```
  ##### Evaluation
  ```
  evalplus.evaluate \
  --dataset humaneval \
- --samples humaneval/neuralmagic-ent--granite-3.1-8b-base-FP8-dynamic_vllm_temp_0.2-sanitized
+ --samples humaneval/neuralmagic--granite-3.1-8b-base-FP8-dynamic_vllm_temp_0.2-sanitized
  ```
+ </details>

  ### Accuracy

  #### OpenLLM Leaderboard V1 evaluation scores

- | Metric | ibm-granite/granite-3.1-8b-base | neuralmagic-ent/granite-3.1-8b-base-FP8-dynamic |
+ | Metric | ibm-granite/granite-3.1-8b-base | neuralmagic/granite-3.1-8b-base-FP8-dynamic |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
  | ARC-Challenge (Acc-Norm, 25-shot) | 64.68 | 64.16 |
  | GSM8K (Strict-Match, 5-shot) | 60.88 | 58.45 |
@@ -165,7 +173,7 @@ evalplus.evaluate \
  | **Recovery** | **100.00** | **99.26** |

  #### HumanEval pass@1 scores
- | Metric | ibm-granite/granite-3.1-8b-base | neuralmagic-ent/granite-3.1-8b-base-FP8-dynamic |
+ | Metric | ibm-granite/granite-3.1-8b-base | neuralmagic/granite-3.1-8b-base-FP8-dynamic |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
  | HumanEval Pass@1 | 44.10 | 44.8 |
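Recovery in the tables above is the quantized model's score expressed as a percentage of the baseline's. A quick check against the rows visible in these hunks (illustrative only; the aggregate 99.26 in the V1 table is computed over the full benchmark list, most of which falls outside the hunks):

```python
# (baseline, FP8-dynamic) score pairs from the visible table rows.
scores = {
    "ARC-Challenge": (64.68, 64.16),
    "GSM8K": (60.88, 58.45),
    "HumanEval pass@1": (44.10, 44.80),
}
for task, (base, fp8) in scores.items():
    # Per-task recovery; quantization can land slightly above 100%.
    print(f"{task}: {100 * fp8 / base:.2f}% recovery")
```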
 
 