Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 ---
 # AnyModal/Image-Captioning-Llama-3.2-1B
 
-**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining
+**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining visual feature extraction and language modeling techniques to generate descriptive captions for natural images. Built within the [AnyModal](https://github.com/ritabratamaiti/AnyModal) framework, this model integrates a Vision Transformer (ViT) encoder with the Llama 3.2-1B language model, fine-tuned on the Flickr30k dataset. The model demonstrates a promising approach to bridging visual and textual modalities.
 
 ---
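The README paragraph describes integrating a ViT encoder with the Llama 3.2-1B language model. A common way to bridge the two modalities is to project ViT patch features into the language model's embedding space and prepend them as a prefix. The sketch below illustrates that pattern only; it is not the AnyModal implementation, and the dimensions (768 for a ViT-Base width, 2048 for Llama 3.2-1B's hidden size) are illustrative assumptions.

```python
# Schematic sketch (assumed pattern, not the AnyModal code) of a
# vision-to-language bridge: ViT patch features are linearly projected
# into the LLM's hidden size and prepended to caption token embeddings.
import torch
import torch.nn as nn


class VisionLanguageBridge(nn.Module):
    def __init__(self, vit_dim=768, llm_dim=2048):
        super().__init__()
        # Maps ViT patch embeddings into the language model's hidden size.
        self.projector = nn.Linear(vit_dim, llm_dim)

    def forward(self, vit_features, text_embeddings):
        # vit_features: (batch, num_patches, vit_dim)
        # text_embeddings: (batch, seq_len, llm_dim)
        image_tokens = self.projector(vit_features)
        # Prepend projected image tokens so the LLM attends to them
        # as a visual prefix while generating the caption.
        return torch.cat([image_tokens, text_embeddings], dim=1)


bridge = VisionLanguageBridge()
img = torch.randn(1, 196, 768)   # 14x14 patches from a 224px ViT-Base
txt = torch.randn(1, 10, 2048)   # embeddings of 10 caption tokens
fused = bridge(img, txt)
print(fused.shape)  # torch.Size([1, 206, 2048])
```

In practice the projector is trained (often with the ViT frozen and the LLM fine-tuned or adapted) so the language model learns to condition its caption on the image prefix.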