update doc
README.md CHANGED
@@ -147,13 +147,16 @@ We provide a pre-built Docker image containing vLLM 0.8.5 with full support for
 https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags
 
 ```
+# docker hub:
 docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
 
+# china mirror
+docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm
 ```
 
 - Download model files:
   - Hugging Face: downloaded automatically by vLLM.
-  - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct`
+  - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4`
 
 
 - Start the API server:
@@ -165,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
     --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
     -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
-    --tensor-parallel-size
+    --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
 
 ```
@@ -174,36 +177,14 @@ model downloaded by modelscope:
 docker run --privileged --user root --net=host --ipc=host \
     -v ~/.cache/modelscope:/root/.cache/modelscope \
     --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
-    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size
-    --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct/ --trust_remote_code
+    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
+    --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
 ```
 
 
 ### SGLang
 
-We also provide a pre-built Docker image based on the latest version of SGLang.
-
-To get started:
-
-- Pull the Docker image
-
-```
-docker pull tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7
-```
-
-- Start the API server:
-
-```
-docker run --gpus all \
-    --shm-size 32g \
-    -p 30000:30000 \
-    --ipc=host \
-    tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7 \
-    -m sglang.launch_server --model-path hunyuan/huanyuan_A13B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000
-```
-
+Support for INT4 quantization on SGLang is in progress and will be available in a future update.
 
 ## Contact Us
 
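The `docker run` commands in the diff start vLLM's OpenAI-compatible API server on port 8000. As a quick smoke test once the container is up, a chat-completions request can be issued from any HTTP client; the sketch below uses only the Python standard library. The helper names (`build_chat_request`, `chat`) are ours, not part of vLLM, and it assumes the server is reachable on localhost with the GPTQ-Int4 model name used above.

```python
"""Minimal client sketch for the OpenAI-compatible endpoint served by vLLM."""
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # port from --port 8000 above
MODEL = "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4"      # must match the --model argument


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a /v1/chat/completions request object without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example, with the container from the diff running:
#   print(chat("Why is the sky blue?"))
```

The same request shape works for either server variant (docker hub or ModelScope model path); only the `MODEL` string needs to match whatever was passed to `--model`.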