manaestras asherszhang committed
Commit 8a81bdc · verified · 1 Parent(s): 714d4e3

update doc (#1)


- update doc (b091365afa0b8fc9cf4ad48c8045bce2e3f00477)


Co-authored-by: asher <[email protected]>

Files changed (1): README.md (+8, −27)
README.md CHANGED

@@ -147,13 +147,16 @@ We provide a pre-built Docker image containing vLLM 0.8.5 with full support for
  https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags
 
  ```
+ # docker hub:
  docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
+
+ # china mirror
+ docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm
  ```
 
  - Download Model file:
    - Huggingface: downloaded automatically by vLLM.
-   - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct`
+   - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4`
 
  - Start the API server:

@@ -165,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
  -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
- --tensor-parallel-size 4 --model tencent/Hunyuan-A13B-Instruct --trust-remote-code
+ --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
  ```

@@ -174,36 +177,14 @@ model downloaded by modelscope:
  docker run --privileged --user root --net=host --ipc=host \
  -v ~/.cache/modelscope:/root/.cache/modelscope \
  --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
- -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 4 --port 8000 \
- --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct/ --trust_remote_code
+ -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
+ --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
  ```
 
  ### SGLang
 
- #### Docker Image
-
- We also provide a pre-built Docker image based on the latest version of SGLang.
-
- To get started:
-
- - Pull the Docker image
-
- ```
- docker pull tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7
- ```
-
- - Start the API server:
-
- ```
- docker run --gpus all \
-   --shm-size 32g \
-   -p 30000:30000 \
-   --ipc=host \
-   tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7 \
-   -m sglang.launch_server --model-path hunyuan/huanyuan_A13B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000
- ```
+ Support for INT4 quantization on SGLang is in progress and will be available in a future update.
 
  ## Contact Us
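Once the `docker run` commands above have started the container, vLLM serves an OpenAI-compatible REST API on the configured port. A minimal sketch of a chat request against it, assuming the server is reachable on `localhost:8000` and the model name matches the `--model` value passed at startup (`build_chat_request` is a hypothetical helper name, not part of the README):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       host: str = "localhost", port: int = 8000):
    """Build an OpenAI-compatible /v1/chat/completions request for a vLLM server."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload

# Example (requires the server from the docker run command to be running):
req, payload = build_chat_request(
    "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4", "Why is the sky blue?"
)
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# Uncomment with a live server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same endpoint also accepts any standard OpenAI client pointed at `http://localhost:8000/v1`.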