The acceleration effects of LLMs with different sparsity are displayed as follows:
| **ProSparse-13B**\* | 87.97 | 91.02 | 77.93 | 8.67 | 55.29 | 2.38 | 67.50 | 1.68 |
| **ProSparse-13B** | 88.80 | 91.11 | 78.28 | - | 53.78 | 2.44 | 66.73 | 1.70 |
**Notes**: Fixed \\(L_1\\) suffers from severe performance degradation. ProSparse with Activation Threshold Shifting is not supported by PowerInfer. "Time" means the average wall-clock time (us) taken by each step with our sparse GPU operators, and "Speedup" is the speedup ratio relative to the setting without these operators. For reference, the average number of tokens generated by [llama.cpp](https://github.com/ggerganov/llama.cpp) per second is about **3.67 for 7B and 1.92 for 13B**. The average times for steps (2) and (3) without sparse GPU operators are about **90.55 and 82.92 (us) for 7B, and 131.36 and 113.68 (us) for 13B** respectively, under all sparsity settings.
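As a quick sanity check, the "Speedup" columns in the table can be reproduced from the baseline times quoted above: each speedup is simply the average step time without the sparse GPU operators divided by the time with them. The sketch below uses the 13B baselines and the two 13B rows from the table (the step numbering follows the notes):

```python
# Baseline average step times (us) for 13B *without* sparse GPU operators,
# as quoted in the notes above.
baseline_step2_us = 131.36
baseline_step3_us = 113.68

# (name, step-2 time with operators, step-3 time with operators) from the table.
rows = [
    ("ProSparse-13B*", 55.29, 67.50),
    ("ProSparse-13B", 53.78, 66.73),
]

for name, t2, t3 in rows:
    # Speedup = baseline time / time with sparse operators.
    print(f"{name}: step (2) {baseline_step2_us / t2:.2f}x, "
          f"step (3) {baseline_step3_us / t3:.2f}x")
```

Running this recovers the table's 2.38x/1.68x and 2.44x/1.70x figures.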
### License Disclaimer