Change example run commands to use new repo name
README.md CHANGED
@@ -171,7 +171,7 @@ docker run \
     --volume /etc/localtime:/etc/localtime:ro \
     -d docker.io/lmsysorg/sglang:v0.4.2-cu124-srt \
     python3 -m sglang.launch_server \
-    --model-path root-signals/
+    --model-path root-signals/RootSignals-Judge-Llama-70B \
     --host 0.0.0.0 \
     --port 8000 \
     --mem-fraction-static 0.89 \
@@ -180,7 +180,7 @@ docker run \
     --disable-cuda-graph
 ```
 
-We validated the model on arm64 with [vLLM](https://github.com/vllm-project/vllm) on Nvidia GH200 as well with max outputs up to
+We validated the model on arm64 with [vLLM](https://github.com/vllm-project/vllm) on Nvidia GH200 as well with max outputs up to 64k tokens:
 ```
 docker run \
     --gpus all \
@@ -189,10 +189,11 @@ docker run \
     -v huggingface:/root/.cache/huggingface \
     --volume /etc/localtime:/etc/localtime:ro \
     -d drikster80/vllm-gh200-openai:v0.6.4.post1 \
-    --model root-signals/
-    --gpu-memory-utilization 0.
-    --max-model-len
+    --model root-signals/RootSignals-Judge-Llama-70B \
+    --gpu-memory-utilization 0.95 \
+    --max-model-len 64k \
     --block_size 16 \
+    --enable_prefix_caching
 ```
 
 # 4. Model Details
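Both launch commands expose an OpenAI-compatible HTTP API, so the rename can be verified against a running container. A minimal smoke-test sketch, assuming the elided parts of each `docker run` publish port 8000 on localhost:

```
# health check, then confirm the served model ID matches the new repo name
curl -s http://localhost:8000/health
curl -s http://localhost:8000/v1/models
```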
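An end-to-end request works the same way against either server; the `model` field must match the new repo name exactly, and the prompt below is only a placeholder:

```
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "root-signals/RootSignals-Judge-Llama-70B",
        "messages": [{"role": "user", "content": "Respond with a single word: ready"}],
        "max_tokens": 8
      }'
```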