Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 ---
 # AnyModal/Image-Captioning-Llama-3.2-1B
 
-**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining
+**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining visual feature extraction and language modeling techniques to generate descriptive captions for natural images. Built within the [AnyModal](https://github.com/ritabratamaiti/AnyModal) framework, this model integrates a Vision Transformer (ViT) encoder with the Llama 3.2-1B language model, fine-tuned on the Flickr30k dataset. The model demonstrates a promising approach to bridging visual and textual modalities.
 
 ---
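The README paragraph describes integrating a ViT encoder with the Llama 3.2-1B language model. A common way to bridge the two modalities is to project ViT patch features into the language model's embedding space and prepend them as a prefix. The sketch below illustrates that pattern only; it is not the AnyModal implementation, and the dimensions (768 for a ViT-Base width, 2048 for Llama 3.2-1B's hidden size) are illustrative assumptions.

```python
# Schematic sketch (assumed pattern, not the AnyModal code) of a
# vision-to-language bridge: ViT patch features are linearly projected
# into the LLM's hidden size and prepended to caption token embeddings.
import torch
import torch.nn as nn


class VisionLanguageBridge(nn.Module):
    def __init__(self, vit_dim=768, llm_dim=2048):
        super().__init__()
        # Maps ViT patch embeddings into the language model's hidden size.
        self.projector = nn.Linear(vit_dim, llm_dim)

    def forward(self, vit_features, text_embeddings):
        # vit_features: (batch, num_patches, vit_dim)
        # text_embeddings: (batch, seq_len, llm_dim)
        image_tokens = self.projector(vit_features)
        # Prepend projected image tokens so the LLM attends to them
        # as a visual prefix while generating the caption.
        return torch.cat([image_tokens, text_embeddings], dim=1)


bridge = VisionLanguageBridge()
img = torch.randn(1, 196, 768)   # 14x14 patches from a 224px ViT-Base
txt = torch.randn(1, 10, 2048)   # embeddings of 10 caption tokens
fused = bridge(img, txt)
print(fused.shape)  # torch.Size([1, 206, 2048])
```

In practice the projector is trained (often with the ViT frozen and the LLM fine-tuned or adapted) so the language model learns to condition its caption on the image prefix.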