MagiBoss/Blip2-Typhoon1.5-COCO

MagiBoss committed on Oct 18, 2024

Commit 2a312ec · 1 Parent(s): 9adb6b4

Update README.md

Files changed (1)

README.md +8 -3

@@ -4,6 +4,11 @@ license: mit
 language:
 - th
 pipeline_tag: image-to-text
+datasets:
+- MagiBoss/COCO-Image-Captioning
+base_model:
+- Salesforce/blip2-opt-2.7b-coco
+- scb10x/llama-3-typhoon-v1.5-8b
 ---
 # Blip2-Typhoon1.5-COCO
@@ -13,7 +18,7 @@ pipeline_tag: image-to-text
 Blip2-Typhoon1.5-COCO is a powerful image captioning model designed to generate descriptive captions for images. This model leverages the strengths of both the BLIP2 and Typhoon architectures to provide high-quality, contextually accurate descriptions. The base models used are:
 - **Encoder**: [Salesforce/blip2-opt-2.7b-coco](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco)
-- **Decoder**: [scb10x/llama-3-typhoon-v1.5x-8b](https://huggingface.co/scb10x/llama-3-typhoon-v1.5x-8b)
+- **Decoder**: [scb10x/llama-3-typhoon-v1.5-8b](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b)
 The BLIP2 encoder extracts visual features from images, while the Typhoon decoder generates natural language descriptions based on these features.
@@ -25,7 +30,7 @@ This model was trained on the COCO 2017 dataset, a widely-used benchmark dataset
 - **Datasets**: COCO 2017
 - **Encoder**: Salesforce/blip2-opt-2.7b-coco
-- **Decoder**: scb10x/llama-3-typhoon-v1.5x-8b
+- **Decoder**: scb10x/llama-3-typhoon-v1.5-8b
 - **Training Framework**: [Hugging Face Transformers](https://huggingface.co/transformers/)
 - **Hardware**: High-performance GPUs for efficient training
@@ -72,4 +77,4 @@ If you use this model in your research, please cite:
   publisher = {Hugging Face},
   note = {https://huggingface.co/MagiBoss/Blip2-Typhoon1.5-COCO}
 }
-```
+```
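
The change above renames the decoder from `v1.5x` to `v1.5` in two places and adds the same ID under `base_model`. A minimal sketch of how that consistency could be checked is shown below. It is not part of the commit: the helper names `front_matter_list` and `decoder_ids` are hypothetical, the front-matter parsing is deliberately simplistic (flat `key:` lists only, no YAML library), and `README_EXCERPT` contains only the lines visible in this diff.

```python
import re


def front_matter_list(text, key):
    """Return the items of a flat `key:` list in the front matter (hypothetical helper)."""
    items, in_block = [], False
    for line in text.splitlines():
        if line.strip() == f"{key}:":
            in_block = True
        elif in_block and line.startswith("- "):
            items.append(line[2:].strip())
        elif in_block:
            break  # first non-item line ends the list
    return items


def decoder_ids(text):
    """Collect the model IDs named on '- **Decoder**:' lines (hypothetical helper)."""
    ids = set()
    for line in text.splitlines():
        if line.startswith("- **Decoder**:"):
            value = line.split(":", 1)[1].strip()
            # Markdown link form is [id](url); keep only the id.
            match = re.match(r"\[([^\]]+)\]", value)
            ids.add(match.group(1) if match else value)
    return ids


# Excerpt assembled from the added and context lines of the diff above.
README_EXCERPT = """\
---
language:
- th
pipeline_tag: image-to-text
datasets:
- MagiBoss/COCO-Image-Captioning
base_model:
- Salesforce/blip2-opt-2.7b-coco
- scb10x/llama-3-typhoon-v1.5-8b
---
- **Decoder**: [scb10x/llama-3-typhoon-v1.5-8b](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b)
- **Decoder**: scb10x/llama-3-typhoon-v1.5-8b
"""

base_models = front_matter_list(README_EXCERPT, "base_model")
mismatched = decoder_ids(README_EXCERPT) - set(base_models)
print(mismatched or "decoder references match base_model")
```

Run against the pre-commit text, the same check would flag `scb10x/llama-3-typhoon-v1.5x-8b`, since that ID does not appear in the `base_model` list added here.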