---
license: apache-2.0
---
## Usage

We use FastChat with a vLLM worker to host the model. Run the following commands in separate terminals (e.g., in tmux):
```bash
# OpenAI-compatible API server
LOGDIR="" python3 -m fastchat.serve.openai_api_server \
    --host 0.0.0.0 --port 8080 \
    --controller-address http://localhost:21000
```

```bash
# FastChat controller
LOGDIR="" python3 -m fastchat.serve.controller \
    --host 0.0.0.0 --port 21000
```
```bash
# vLLM worker serving the model
LOGDIR="" RAY_LOG_TO_STDERR=1 \
    python3 -m fastchat.serve.vllm_worker \
    --model-path ./VirtualCompiler \
    --num-gpus 8 \
    --controller http://localhost:21000 \
    --max-num-batched-tokens 40960 \
    --disable-log-requests \
    --host 0.0.0.0 --port 22000 \
    --worker-address http://localhost:22000 \
    --model-names "VirtualCompiler"
```
Then, with the model hosted, use `do_request.py` to make a request to the model:
```console
~/C/VirtualCompiler (main)> python3 do_request.py
test    rdx, rdx
setz    al
movzx   eax, al
neg     eax
retn
```
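For reference, a request script in this setup can be a minimal sketch along these lines: it posts a prompt to FastChat's OpenAI-compatible chat-completions endpoint on port 8080 and prints the model's reply. This is an assumption about what `do_request.py` does, not its actual contents; the C snippet in the example prompt is hypothetical.

```python
# Minimal request sketch against the FastChat OpenAI-compatible server
# started above (localhost:8080). Assumed behavior of do_request.py,
# not the repo's actual script.
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(source: str) -> dict:
    """Build an OpenAI-style chat request for the hosted model."""
    return {
        "model": "VirtualCompiler",  # must match --model-names above
        "messages": [{"role": "user", "content": source}],
        "temperature": 0.0,
    }


def do_request(source: str) -> str:
    """POST the prompt and return the model's text response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(source)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Hypothetical example prompt; replace with your own input.
    print(do_request("int f(long x) { return -(x == 0); }"))
```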