[Paper](https://arxiv.org/abs/2410.17241) | [Home](https://github.com/ai4colonoscopy/IntelliScope)

> These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora-stg2](https://drive.google.com/file/d/1xAAaVKu16czWO_jgnf-2jCgj2hf14BwM/view?usp=sharing): the vision encoder (SigLIP) and the language model (Phi-1.5), together with the other weights fine-tuned on our ColonINST.
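If you are curious how a LoRA checkpoint like the one above is typically folded into base weights (not something you need to do here, since this page already hosts the merged result), the sketch below uses the generic 🤗 PEFT merge API. The paths are placeholders and the base model is shown as bare Phi-1.5 only for brevity; the authors' actual merging script may differ.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Illustrative only: in practice the base would be the full multimodal ColonGPT
# (vision encoder + connector + Phi-1.5), not the bare language model.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")

# Hypothetical local path to the stage-2 LoRA checkpoint linked above.
lora = PeftModel.from_pretrained(base, "checkpoints/ColonGPT-v1-phi1.5-siglip-lora-stg2")

merged = lora.merge_and_unload()              # fold the LoRA deltas into the base weights
merged.save_pretrained("ColonGPT-v1-merged")  # save a standalone, adapter-free model
```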

Our ColonGPT is a standard multimodal language model that contains four basic components: a language tokenizer, a visual encoder (🤗 [SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model (🤗 [Phi-1.5](https://huggingface.co/microsoft/phi-1_5)). On this Hugging Face page, we provide a quick start for the convenience of new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/ai4colonoscopy/IntelliScope), where you'll find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.
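To make the data flow between these four components concrete, here is a purely illustrative PyTorch sketch with tiny random-weight stand-ins (not ColonGPT's real modules or dimensions): the visual encoder produces patch features, the connector projects them into the language model's embedding space, and the projected image tokens are concatenated with the text embeddings before decoding.

```python
import torch
import torch.nn as nn

class ToyColonGPTStyle(nn.Module):
    """Illustrative wiring only: every submodule is a tiny random-weight stand-in,
    not ColonGPT's real implementation or real dimensions."""

    def __init__(self, patch_dim=588, vis_dim=64, lm_dim=128, vocab=1000):
        super().__init__()
        self.visual_encoder = nn.Linear(patch_dim, vis_dim)      # stand-in for SigLIP-SO
        self.connector = nn.Sequential(                          # multimodal connector: project image
            nn.Linear(vis_dim, lm_dim),                          # features into the LM embedding space
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )
        self.embed_tokens = nn.Embedding(vocab, lm_dim)          # stand-in for the tokenizer embeddings
        self.language_model = nn.TransformerEncoder(             # stand-in for the Phi-1.5 decoder stack
            nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True), num_layers=2
        )
        self.lm_head = nn.Linear(lm_dim, vocab)

    def forward(self, image_patches, input_ids):
        vis_tokens = self.connector(self.visual_encoder(image_patches))  # (B, N_img, lm_dim)
        txt_tokens = self.embed_tokens(input_ids)                        # (B, N_txt, lm_dim)
        fused = torch.cat([vis_tokens, txt_tokens], dim=1)               # image tokens prepended to text
        return self.lm_head(self.language_model(fused))                  # next-token logits

# Smoke test with random data: 16 image patches + 8 text tokens -> 24 fused positions.
toy = ToyColonGPTStyle()
logits = toy(torch.randn(1, 16, 588), torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 1000])
```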
# Quick start

Here is a code snippet that shows how to quickly try out our ColonGPT model with 🤗 transformers. The model focuses on three downstream tasks: image classification (CLS), referring expression generation (REG), and referring expression comprehension (REC). If you need a caption generator, please refer to [ColonGPT-V1-stg1](https://huggingface.co/ai4colonoscopy/ColonGPT-v1-stg1). For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start script; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

- Before running the snippet, you only need to install the following minimum dependencies.
```shell
...
```
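Loading the merged weights is done with plain 🤗 transformers in the full snippet; below is a minimal sketch of that step, assuming the weights ship with custom modeling code (the repository id, dtype choice, and `trust_remote_code` usage are assumptions rather than lines copied from the original snippet). The `model`, `tokenizer`, and `device` names match those used in the fragment that follows.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical repository id -- substitute the model id shown at the top of this page.
model_id = "ai4colonoscopy/ColonGPT-v1"

# trust_remote_code=True lets transformers import the modeling files bundled with the
# merged weights (assumed here, since ColonGPT is not a built-in architecture).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```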
        return True   # tail of the snippet's stopping-criteria helper
    return False      # (its class definition precedes this point in the full snippet)

prompt = "Categorize the object."  # CLS-task instruction; swap in your own prompt here
text = f"USER: <image>\n{prompt} ASSISTANT:"  # chat template with an image placeholder
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]  # tokenize the text around <image>
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)  # -200 is the image-token placeholder id

image = Image.open('cache/examples/example2.png')  # replace with your own colonoscopy image
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)  # preprocess the image with the model's bundled helper

stop_str = "<|endoftext|>"  # Phi-style end-of-text marker, used here as the stopping keyword
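After preprocessing, answers are produced with `model.generate`. The sketch below shows one typical way LLaVA-style models are driven with the `input_ids`, `image_tensor`, and `stop_str` prepared above; the `images=` keyword and the keyword-stopping helper are assumptions about this particular model's interface, so refer to the full snippet in the source repository for the exact call.

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class KeywordStopper(StoppingCriteria):
    """Illustrative stand-in: stop once the decoded continuation contains stop_str."""
    def __init__(self, stop_str, tokenizer, prompt_len):
        self.stop_str, self.tokenizer, self.prompt_len = stop_str, tokenizer, prompt_len

    def __call__(self, output_ids, scores, **kwargs):
        tail = self.tokenizer.decode(output_ids[0, self.prompt_len:], skip_special_tokens=False)
        return self.stop_str in tail

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,                       # assumed keyword for passing the preprocessed image
        do_sample=False,
        max_new_tokens=512,
        use_cache=True,
        stopping_criteria=StoppingCriteriaList(
            [KeywordStopper(stop_str, tokenizer, input_ids.shape[1])]
        ),
    )

# Strip the prompt tokens to keep only the generated answer.
answer = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True).strip()
print(answer)
```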