[Paper](https://arxiv.org/abs/2410.17241) | [Home](https://github.com/ai4colonoscopy/IntelliScope)

> These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora-stg2](https://drive.google.com/file/d/1xAAaVKu16czWO_jgnf-2jCgj2hf14BwM/view?usp=sharing): the vision encoder (SigLIP) and the language model (Phi-1.5), together with the other weights fine-tuned on our ColonINST.
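If you are curious how a LoRA checkpoint like the one above is typically folded into base weights (not something you need to do here, since this page already hosts the merged result), the sketch below uses the generic 🤗 PEFT merge API. The paths are placeholders and the base model is shown as bare Phi-1.5 only for brevity; the authors' actual merging script may differ.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Illustrative only: in practice the base would be the full multimodal ColonGPT
# (vision encoder + connector + Phi-1.5), not the bare language model.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")

# Hypothetical local path to the stage-2 LoRA checkpoint linked above.
lora = PeftModel.from_pretrained(base, "checkpoints/ColonGPT-v1-phi1.5-siglip-lora-stg2")

merged = lora.merge_and_unload()              # fold the LoRA deltas into the base weights
merged.save_pretrained("ColonGPT-v1-merged")  # save a standalone, adapter-free model
```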

Our ColonGPT is a standard multimodal language model that contains four basic components: a language tokenizer, a visual encoder (🤗 [SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model (🤗 [Phi-1.5](https://huggingface.co/microsoft/phi-1_5)). On this Hugging Face page, we provide a quick start for the convenience of new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/ai4colonoscopy/IntelliScope), where you'll find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.
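To make the data flow between these four components concrete, here is a purely illustrative PyTorch sketch with tiny random-weight stand-ins (not ColonGPT's real modules or dimensions): the visual encoder produces patch features, the connector projects them into the language model's embedding space, and the projected image tokens are concatenated with the text embeddings before decoding.

```python
import torch
import torch.nn as nn

class ToyColonGPTStyle(nn.Module):
    """Illustrative wiring only: every submodule is a tiny random-weight stand-in,
    not ColonGPT's real implementation or real dimensions."""

    def __init__(self, patch_dim=588, vis_dim=64, lm_dim=128, vocab=1000):
        super().__init__()
        self.visual_encoder = nn.Linear(patch_dim, vis_dim)      # stand-in for SigLIP-SO
        self.connector = nn.Sequential(                          # multimodal connector: project image
            nn.Linear(vis_dim, lm_dim),                          # features into the LM embedding space
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )
        self.embed_tokens = nn.Embedding(vocab, lm_dim)          # stand-in for the tokenizer embeddings
        self.language_model = nn.TransformerEncoder(             # stand-in for the Phi-1.5 decoder stack
            nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True), num_layers=2
        )
        self.lm_head = nn.Linear(lm_dim, vocab)

    def forward(self, image_patches, input_ids):
        vis_tokens = self.connector(self.visual_encoder(image_patches))  # (B, N_img, lm_dim)
        txt_tokens = self.embed_tokens(input_ids)                        # (B, N_txt, lm_dim)
        fused = torch.cat([vis_tokens, txt_tokens], dim=1)               # image tokens prepended to text
        return self.lm_head(self.language_model(fused))                  # next-token logits

# Smoke test with random data: 16 image patches + 8 text tokens -> 24 fused positions.
toy = ToyColonGPTStyle()
logits = toy(torch.randn(1, 16, 588), torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 1000])
```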
# Quick start

Here is a code snippet that shows how to quickly try out our ColonGPT model with 🤗 transformers. The model focuses on three downstream tasks: image classification (CLS), referring expression generation (REG), and referring expression comprehension (REC). If you need a caption generator, please refer to [ColonGPT-V1-stg1](https://huggingface.co/ai4colonoscopy/ColonGPT-v1-stg1). For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start script; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

- Before running the snippet, you only need to install the following minimum dependencies.
```shell
...
```
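Loading the merged weights is done with plain 🤗 transformers in the full snippet; below is a minimal sketch of that step, assuming the weights ship with custom modeling code (the repository id, dtype choice, and `trust_remote_code` usage are assumptions rather than lines copied from the original snippet). The `model`, `tokenizer`, and `device` names match those used in the fragment that follows.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical repository id -- substitute the model id shown at the top of this page.
model_id = "ai4colonoscopy/ColonGPT-v1"

# trust_remote_code=True lets transformers import the modeling files bundled with the
# merged weights (assumed here, since ColonGPT is not a built-in architecture).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```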
        return True   # tail of the snippet's stopping-criteria helper
    return False      # (its class definition precedes this point in the full snippet)

prompt = "Categorize the object."  # CLS-task instruction; swap in your own prompt here
text = f"USER: <image>\n{prompt} ASSISTANT:"  # chat template with an image placeholder
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]  # tokenize the text around <image>
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)  # -200 is the image-token placeholder id

image = Image.open('cache/examples/example2.png')  # replace with your own colonoscopy image
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)  # preprocess the image with the model's bundled helper

stop_str = "<|endoftext|>"  # Phi-style end-of-text marker, used here as the stopping keyword
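After preprocessing, answers are produced with `model.generate`. The sketch below shows one typical way LLaVA-style models are driven with the `input_ids`, `image_tensor`, and `stop_str` prepared above; the `images=` keyword and the keyword-stopping helper are assumptions about this particular model's interface, so refer to the full snippet in the source repository for the exact call.

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class KeywordStopper(StoppingCriteria):
    """Illustrative stand-in: stop once the decoded continuation contains stop_str."""
    def __init__(self, stop_str, tokenizer, prompt_len):
        self.stop_str, self.tokenizer, self.prompt_len = stop_str, tokenizer, prompt_len

    def __call__(self, output_ids, scores, **kwargs):
        tail = self.tokenizer.decode(output_ids[0, self.prompt_len:], skip_special_tokens=False)
        return self.stop_str in tail

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,                       # assumed keyword for passing the preprocessed image
        do_sample=False,
        max_new_tokens=512,
        use_cache=True,
        stopping_criteria=StoppingCriteriaList(
            [KeywordStopper(stop_str, tokenizer, input_ids.shape[1])]
        ),
    )

# Strip the prompt tokens to keep only the generated answer.
answer = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True).strip()
print(answer)
```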