Merge pull request #3 from TRI-ML/master
Files changed:
- .gitignore +2 -0
- README.md +34 -4
- interactive_demo.py +3 -3
- serve/gradio_web_server.py +5 -6
.gitignore
CHANGED

@@ -103,6 +103,8 @@ celerybeat.pid
 
 # Logs
 serve_images/
+*conv.json
+*controller.log*
 
 # Environments
 .env
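The two added patterns are shell-style globs: `*conv.json` presumably catches the conversation dumps the web server writes, and `*controller.log*` the controller's (possibly rotated) log files. A quick sketch of the matching semantics, using Python's `fnmatch` as a stand-in for gitignore's glob rules (close enough for these simple, slash-free patterns); the file names are made up for the example:

```python
# Illustrative check of what the two new .gitignore globs would ignore.
# fnmatch approximates gitignore matching for slash-free patterns like these.
from fnmatch import fnmatch

PATTERNS = ["*conv.json", "*controller.log*"]
NAMES = ["2024-01-01-conv.json", "controller.log", "controller.log.1", "conv.yaml"]

for name in NAMES:
    matched = [p for p in PATTERNS if fnmatch(name, p)]
    print(f"{name}: {'ignored via ' + matched[0] if matched else 'tracked'}")
```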
README.md
CHANGED

@@ -7,7 +7,8 @@ app_file: serve/gradio_web_server.py
 
 # VLM Demo
 
-> *VLM Demo*: Lightweight repo for chatting with
+> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
+[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).
 
 ---
 
@@ -30,15 +31,21 @@ installed in the current environment. Installation instructions can be found
 The main script to run is `interactive_demo.py`, while the implementation of
 the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
 (`serve/gradio_web_server.py`) are within `serve`. All of this code is heavily
-adapted from the [LLaVA Github Repo
+adapted from the [LLaVA Github Repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
 More details on how this code was modified from the original LLaVA repo are provided in the
 relevant source files.
 
-To run the demo, run the following commands:
+To run the demo, first run the following commands in separate terminals:
 
 + Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
 + Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
-
+
+To run the interactive demo, you can specify a model to chat with via a `model_dir` or `model_id` as follows:
+
++ `python -m interactive_demo --port 40000 --model_id <MODEL_ID>` OR
++ `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>`
+
+If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals.
 
 When running the demo, the following parameters are adjustable:
 + Temperature
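A hunk like this implies three long-running processes. As a convenience they can be started from one script; below is a minimal sketch (a hypothetical `launch_demo.py`, not part of this PR) that simply shells out to the exact commands listed above:

```python
# launch_demo.py -- hypothetical helper, not part of this PR; it runs the
# README's commands as subprocesses so everything starts from one script.
import subprocess
import sys
import time

COMMANDS = [
    # Gradio controller (started first so the others can register with it)
    [sys.executable, "-m", "serve.controller", "--host", "0.0.0.0", "--port", "10000"],
    # Gradio web server, pointed at the controller
    [sys.executable, "-m", "serve.gradio_web_server",
     "--controller", "http://localhost:10000", "--model-list-mode", "reload", "--share"],
    # One interactive demo worker; swap in your own --model_id / --model_dir
    [sys.executable, "-m", "interactive_demo", "--port", "40000", "--model_id", "<MODEL_ID>"],
]

procs = []
try:
    for cmd in COMMANDS:
        procs.append(subprocess.Popen(cmd))
        time.sleep(5)  # crude head start so the controller is up before its clients
    for proc in procs:
        proc.wait()
except KeyboardInterrupt:
    for proc in procs:
        proc.terminate()
```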
@@ -55,6 +62,29 @@ prompt.
 + True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
 prompt.
 
+## Example
+
+To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following scripts in separate terminals.
+
+Launch gradio controller:
+
+`python -m serve.controller --host 0.0.0.0 --port 10000`
+
+Launch web server:
+
+`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
+
+Now we can launch an interactive demo corresponding to each of the models we want to chat with. For Prism models, you
+only need to specify a `model_id`, while for LLaVA and InstructBLIP, you need to additionally specify a `model_family`
+and `model_dir`. Note that for each model, a different port must be specified.
+
+Launch interactive demo for Prism 7B Model:
+
+`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`
+
+Launch interactive demo for LLaVA 1.5 7B Model:
+
+`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
 
 ## Contributing
 
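Once the controller and both workers from the example above are up, you can sanity-check what registered. This is a speculative sketch: it assumes the controller keeps the HTTP endpoints of the LLaVA controller it is adapted from (`/refresh_all_workers`, `/list_models`), which this PR does not itself show:

```python
# check_controller.py -- speculative sketch; assumes the controller inherits
# LLaVA's /refresh_all_workers and /list_models endpoints (not shown in this PR).
import requests

CONTROLLER_URL = "http://localhost:10000"

# Ask the controller to re-poll its workers, then list the registered models.
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")
resp = requests.post(f"{CONTROLLER_URL}/list_models")
print("Registered models:", resp.json()["models"])
# Expected to include 'prism-dinosiglip+7b' and 'llava-v1.5-7b' once the two
# interactive_demo workers from the example above have registered.
```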
interactive_demo.py
CHANGED

@@ -1,7 +1,7 @@
 """
 interactive_demo.py
 
-Entry point for all VLM-
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is heavily adapted from the script used to serve models in the LLaVa repo:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/model_worker.py. It is
@@ -30,8 +30,8 @@ from llava.mm_utils import load_image_from_base64
 from llava.utils import server_error_msg
 from torchvision.transforms import Compose
 
-from
-from
+from vlm_eval.models import load_vlm
+from vlm_eval.overwatch import initialize_overwatch
 from serve import INTERACTION_MODES_MAP, MODEL_ID_TO_NAME
 
 GB = 1 << 30
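For context on the unchanged `GB = 1 << 30` line above: shifting 1 left by 30 bits gives 2^30 bytes, one gibibyte, which worker scripts like this typically use to report GPU memory in human-readable units. An illustrative snippet, not code from the PR:

```python
# Illustrative only -- how a GB = 1 << 30 constant is typically used for
# memory reporting in a model worker; this is not code from the PR.
import torch

GB = 1 << 30  # 2**30 bytes = one gibibyte

def cuda_memory_gb(device: int = 0) -> float:
    """Currently allocated CUDA memory on `device`, in GB."""
    return torch.cuda.memory_allocated(device) / GB

if torch.cuda.is_available():
    print(f"allocated: {cuda_memory_gb():.2f} GB")
```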
serve/gradio_web_server.py
CHANGED

@@ -1,7 +1,7 @@
 """
 gradio_web_server.py
 
-Entry point for all VLM-
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is copied from the script used to define the gradio web server in the LLaVa codebase:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/gradio_web_server.py with only very minor
@@ -244,9 +244,9 @@ def http_bot(state, model_selector, interaction_mode, temperature, max_new_token
 
 title_markdown = """
 # Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
-[[
-[[
-| 📚 [[Paper](
+[[Training Code](github.com/TRI-ML/prismatic-vlms)]
+[[Evaluation Code](github.com/TRI-ML/vlm-evaluation)]
+| 📚 [[Paper](https://arxiv.org/abs/2402.07865)]
 """
 
 tos_markdown = """
@@ -254,8 +254,7 @@ tos_markdown = """
 By using this service, users are required to agree to the following terms:
 The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may
 generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The
-service may collect user dialogue data for future research.
-inappropriate answer! We will collect those to keep improving our moderator. For an optimal experience,
+service may collect user dialogue data for future research. For an optimal experience,
 please use desktop computers for this demo, as mobile devices may compromise its quality. This website
 is heavily inspired by the website released by [LLaVA](https://github.com/haotian-liu/LLaVA).
 """