Raincleared committed (verified)
Commit: c94db0e · 1 parent: d298f48

Upload README.md with huggingface_hub

Files changed (1): README.md (+2 −2)
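
The commit message says the file was pushed with the `huggingface_hub` Python client. For readers unfamiliar with that flow, here is a minimal sketch of such an upload; the repo id is illustrative, not taken from this commit:

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated, e.g. via `huggingface-cli login`
api.upload_file(
    path_or_fileobj="README.md",  # local file to push
    path_in_repo="README.md",     # destination path inside the repo
    repo_id="user/model-repo",    # illustrative placeholder, not this repo's actual id
    commit_message="Upload README.md with huggingface_hub",
)
```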
README.md CHANGED
```diff
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ where \\(\mathbf{s}\\), \\(\mathbf{x}\\), \\(\mathbf{x}_1\\), and \\(\odot\\) de
 
 The acceleration effects of LLMs with different sparsity are displayed as follows. ProSparse, which reaches a high sparsity without performance degradation, can gain the most benefits among all the settings concerned. Refer to Section 4.3 of [paper](TODO) for more details.
 
-| Setting | Average<br>Sparsity | Activation<br>Recall | Predicted<br>Sparsity | PowerInfer<br>Speed | `S2`<br>Time | `S2`<br>Speedup | `S3`<br/>Time | `S3`<br/>Speedup |
+| Setting | Average<br>Sparsity | Activation<br>Recall | Predicted<br>Sparsity | PowerInfer<br>Speed | `S2`<br>Time (\\(\downarrow\\)) | `S2`<br>Speedup | `S3`<br/>Time (\\(\downarrow\\)) | `S3`<br/>Speedup |
 | :-------------------: | :-----------------: | :------------------: | :-------------------: | :-----------------: | :--------------: | :-----------------: | :---------------: | :------------------: |
 | ReluLLaMA-7B | 66.98 | 90.89 | 58.95 | 11.37 | 67.12 | 1.35 | 63.00 | 1.32 |
 | Vanilla ReLU-7B | 66.04 | 87.72 | 72.57 | 12.04 | 67.85 | 1.33 | 63.28 | 1.31 |
@@ -119,7 +119,7 @@ The acceleration effects of LLMs with different sparsity are displayed as follow
 | **ProSparse-13B**\* | 87.97 | 91.02 | 77.93 | 8.67 | 55.29 | 2.38 | 67.50 | 1.68 |
 | **ProSparse-13B** | 88.80 | 91.11 | 78.28 | - | 53.78 | 2.44 | 66.73 | 1.70 |
 
-**Notes**: Fixed \\(L_1\\) suffers from severe performance degradation. ProSparse with Activation Threshold Shifting is not supported by PowerInfer. "Time" means the average wall-clock time (us) cost by each step with our sparse GPU operators, and "Speedup" is the speedup ratio to the setting without operators. The average time for step (2) and (3) without sparse GPU operators is about **90.55 and 82.92 (us) for 7B, 131.36 and 113.68 (us) for 13B** respectively under all sparsity.
+**Notes**: Fixed \\(L_1\\) suffers from severe performance degradation. ProSparse with Activation Threshold Shifting is not supported by PowerInfer. "Time" means the average wall-clock time (us) cost by each step with our sparse GPU operators, and "Speedup" is the speedup ratio to the setting without operators. For reference, the average number of tokens generated by llama.cpp per second is about 3.67 for 7B and 1.92 for 13B. The average time for step (2) and (3) without sparse GPU operators is about **90.55 and 82.92 (us) for 7B, 131.36 and 113.68 (us) for 13B** respectively under all sparsity.
 
 ### License Disclaimer
```
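
As a quick sanity check of the "Speedup" column defined in the notes (baseline step time divided by step time with the sparse GPU operators), a small Python snippet using the 7B figures quoted in the diff above:

```python
# Per-step baseline times without sparse GPU operators (7B), from the notes above.
baseline_us = {"S2": 90.55, "S3": 82.92}

# Per-step times with sparse operators for ReluLLaMA-7B, from the table above.
relullama_7b_us = {"S2": 67.12, "S3": 63.00}

for step, t in relullama_7b_us.items():
    speedup = baseline_us[step] / t
    print(f"{step}: {speedup:.2f}x")  # S2: 1.35x, S3: 1.32x, matching the table
```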