meituan/DeepSeek-R1-Block-INT8

Text Generation

8-bit precision


yuanzu committed 29 days ago

Commit 18e4208 · verified · 1 Parent(s): d4440e1

Update README.md

Files changed (1)

README.md +2 -2


@@ -16,8 +16,8 @@ In benchmarking, we observe **no accuracy loss** and up to **30\%** performance
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):
 | Model  | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) | Output Throughput(bs=1) |
 |--------|--------|-------------------|----------------|------------------------------|--------------------------|
-| BF16 R1 | (A100\*16)x2 | 95.8              | 87.1           | 4450.02 (+33%)                | 44.18 (+18%)             |
-| INT8 R1 | A100\*32  | 95.5              | 87.1           | 3342.29                       | 37.20                     |
+| INT8 R1 | (A100\*16)x2 | 95.8              | 87.1           | 4450.02 (+33%)                | 44.18 (+18%)             |
+| BF16 R1 | A100\*32  | 95.5              | 87.1           | 3342.29                       | 37.20                     |
 ## 2. Quantization Process