Merge pull request #3 from TRI-ML/master
Files changed:
- .gitignore +2 -0
- README.md +34 -4
- interactive_demo.py +3 -3
- serve/gradio_web_server.py +5 -6
.gitignore
CHANGED

@@ -103,6 +103,8 @@ celerybeat.pid
 
 # Logs
 serve_images/
+*conv.json
+*controller.log*
 
 # Environments
 .env
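The two added patterns are shell-style globs: `*conv.json` presumably catches the conversation dumps the web server writes, and `*controller.log*` the controller's (possibly rotated) log files. A quick sketch of the matching semantics, using Python's `fnmatch` as a stand-in for gitignore's glob rules (close enough for these simple, slash-free patterns); the file names are made up for the example:

```python
# Illustrative check of what the two new .gitignore globs would ignore.
# fnmatch approximates gitignore matching for slash-free patterns like these.
from fnmatch import fnmatch

PATTERNS = ["*conv.json", "*controller.log*"]
NAMES = ["2024-01-01-conv.json", "controller.log", "controller.log.1", "conv.yaml"]

for name in NAMES:
    matched = [p for p in PATTERNS if fnmatch(name, p)]
    print(f"{name}: {'ignored via ' + matched[0] if matched else 'tracked'}")
```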
README.md
CHANGED

@@ -7,7 +7,8 @@ app_file: serve/gradio_web_server.py
 
 # VLM Demo
 
-> *VLM Demo*: Lightweight repo for chatting with
+> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
+[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).
 
 ---
 
@@ -30,15 +31,21 @@ installed in the current environment. Installation instructions can be found
 The main script to run is `interactive_demo.py`, while the implementation of
 the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
 (`serve/gradio_web_server.py`) are within `serve`. All of this code is heavily
-adapted from the [LLaVA Github Repo
+adapted from the [LLaVA Github Repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
 More details on how this code was modified from the original LLaVA repo are provided in the
 relevant source files.
 
-To run the demo, run the following commands:
+To run the demo, first run the following commands in separate terminals:
 
 + Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
 + Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
-
+
+To run the interactive demo, you can specify a model to chat with via a `model_dir` or `model_id` as follows:
+
++ `python -m interactive_demo --port 40000 --model_id <MODEL_ID>` OR
++ `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>`
+
+If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals.
 
 When running the demo, the following parameters are adjustable:
 + Temperature
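A hunk like this implies three long-running processes. As a convenience they can be started from one script; below is a minimal sketch (a hypothetical `launch_demo.py`, not part of this PR) that simply shells out to the exact commands listed above:

```python
# launch_demo.py -- hypothetical helper, not part of this PR; it runs the
# README's commands as subprocesses so everything starts from one script.
import subprocess
import sys
import time

COMMANDS = [
    # Gradio controller (started first so the others can register with it)
    [sys.executable, "-m", "serve.controller", "--host", "0.0.0.0", "--port", "10000"],
    # Gradio web server, pointed at the controller
    [sys.executable, "-m", "serve.gradio_web_server",
     "--controller", "http://localhost:10000", "--model-list-mode", "reload", "--share"],
    # One interactive demo worker; swap in your own --model_id / --model_dir
    [sys.executable, "-m", "interactive_demo", "--port", "40000", "--model_id", "<MODEL_ID>"],
]

procs = []
try:
    for cmd in COMMANDS:
        procs.append(subprocess.Popen(cmd))
        time.sleep(5)  # crude head start so the controller is up before its clients
    for proc in procs:
        proc.wait()
except KeyboardInterrupt:
    for proc in procs:
        proc.terminate()
```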
@@ -55,6 +62,29 @@ prompt.
 + True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
 prompt.
 
+## Example
+
+To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following scripts in separate terminals.
+
+Launch gradio controller:
+
+`python -m serve.controller --host 0.0.0.0 --port 10000`
+
+Launch web server:
+
+`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
+
+Now we can launch an interactive demo corresponding to each of the models we want to chat with. For Prism models, you
+only need to specify a `model_id`, while for LLaVA and InstructBLIP, you need to additionally specify a `model_family`
+and `model_dir`. Note that for each model, a different port must be specified.
+
+Launch interactive demo for Prism 7B Model:
+
+`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`
+
+Launch interactive demo for LLaVA 1.5 7B Model:
+
+`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
 
 ## Contributing
 
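Once the controller and both workers from the example above are up, you can sanity-check what registered. This is a speculative sketch: it assumes the controller keeps the HTTP endpoints of the LLaVA controller it is adapted from (`/refresh_all_workers`, `/list_models`), which this PR does not itself show:

```python
# check_controller.py -- speculative sketch; assumes the controller inherits
# LLaVA's /refresh_all_workers and /list_models endpoints (not shown in this PR).
import requests

CONTROLLER_URL = "http://localhost:10000"

# Ask the controller to re-poll its workers, then list the registered models.
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")
resp = requests.post(f"{CONTROLLER_URL}/list_models")
print("Registered models:", resp.json()["models"])
# Expected to include 'prism-dinosiglip+7b' and 'llava-v1.5-7b' once the two
# interactive_demo workers from the example above have registered.
```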
interactive_demo.py
CHANGED

@@ -1,7 +1,7 @@
 """
 interactive_demo.py
 
-Entry point for all VLM-
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is heavily adapted from the script used to serve models in the LLaVa repo:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/model_worker.py. It is
@@ -30,8 +30,8 @@ from llava.mm_utils import load_image_from_base64
 from llava.utils import server_error_msg
 from torchvision.transforms import Compose
 
-from
-from
+from vlm_eval.models import load_vlm
+from vlm_eval.overwatch import initialize_overwatch
 from serve import INTERACTION_MODES_MAP, MODEL_ID_TO_NAME
 
 GB = 1 << 30
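For context on the unchanged `GB = 1 << 30` line above: shifting 1 left by 30 bits gives 2^30 bytes, one gibibyte, which worker scripts like this typically use to report GPU memory in human-readable units. An illustrative snippet, not code from the PR:

```python
# Illustrative only -- how a GB = 1 << 30 constant is typically used for
# memory reporting in a model worker; this is not code from the PR.
import torch

GB = 1 << 30  # 2**30 bytes = one gibibyte

def cuda_memory_gb(device: int = 0) -> float:
    """Currently allocated CUDA memory on `device`, in GB."""
    return torch.cuda.memory_allocated(device) / GB

if torch.cuda.is_available():
    print(f"allocated: {cuda_memory_gb():.2f} GB")
```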
serve/gradio_web_server.py
CHANGED

@@ -1,7 +1,7 @@
 """
 gradio_web_server.py
 
-Entry point for all VLM-
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is copied from the script used to define the gradio web server in the LLaVa codebase:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/gradio_web_server.py with only very minor
@@ -244,9 +244,9 @@ def http_bot(state, model_selector, interaction_mode, temperature, max_new_token
 
 title_markdown = """
 # Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
-[[
-[[
-| 📚 [[Paper](
+[[Training Code](github.com/TRI-ML/prismatic-vlms)]
+[[Evaluation Code](github.com/TRI-ML/vlm-evaluation)]
+| 📚 [[Paper](https://arxiv.org/abs/2402.07865)]
 """
 
 tos_markdown = """
@@ -254,8 +254,7 @@ tos_markdown = """
 By using this service, users are required to agree to the following terms:
 The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may
 generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The
-service may collect user dialogue data for future research.
-inappropriate answer! We will collect those to keep improving our moderator. For an optimal experience,
+service may collect user dialogue data for future research. For an optimal experience,
 please use desktop computers for this demo, as mobile devices may compromise its quality. This website
 is heavily inspired by the website released by [LLaVA](https://github.com/haotian-liu/LLaVA).
 """