ritabratamaiti committed on
Commit b3b84f0 · verified · 1 parent: 1ca1d29

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 ---
 # AnyModal/Image-Captioning-Llama-3.2-1B
 
-**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining advanced visual feature extraction and language modeling techniques to generate descriptive captions for natural images. Built within the [AnyModal](https://github.com/ritabratamaiti/AnyModal) framework, this model integrates a Vision Transformer (ViT) encoder with the Llama 3.2-1B language model, fine-tuned on the Flickr30k dataset. The model demonstrates a promising approach to bridging visual and textual modalities.
+**AnyModal/Image-Captioning-Llama-3.2-1B** explores the potential of combining visual feature extraction and language modeling techniques to generate descriptive captions for natural images. Built within the [AnyModal](https://github.com/ritabratamaiti/AnyModal) framework, this model integrates a Vision Transformer (ViT) encoder with the Llama 3.2-1B language model, fine-tuned on the Flickr30k dataset. The model demonstrates a promising approach to bridging visual and textual modalities.
 
 ---
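
The README paragraph describes wiring a ViT encoder into the Llama 3.2-1B language model. A common way to do this (and a minimal sketch of the idea, not AnyModal's actual code) is a learned projection layer that maps ViT patch features into the LLM's embedding space, so image "tokens" can be prepended to the caption token embeddings. The dimensions below are assumptions: 768 for ViT-base patch features, 2048 for Llama 3.2-1B's hidden size, and 196 patches for a 224×224 image with 16×16 patches.

```python
import torch
import torch.nn as nn

# Assumed dimensions (not taken from the model card):
# ViT-base patch features are 768-d; Llama 3.2-1B's hidden size is 2048;
# a 224x224 image with 16x16 patches yields 196 patch tokens.
VIT_DIM, LLM_DIM, NUM_PATCHES = 768, 2048, 196

class VisionProjector(nn.Module):
    """Maps ViT patch embeddings into the LLM's embedding space so they
    can be prepended to the caption token embeddings."""
    def __init__(self, vit_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_feats)

# Stand-in tensors for one image's patch features and a 10-token caption prefix;
# in a real pipeline these would come from the ViT encoder and the LLM tokenizer.
patch_feats = torch.randn(1, NUM_PATCHES, VIT_DIM)
text_embeds = torch.randn(1, 10, LLM_DIM)

projector = VisionProjector(VIT_DIM, LLM_DIM)
vision_embeds = projector(patch_feats)

# The fused sequence (image tokens followed by text tokens) is what the
# language model would decode captions from.
fused = torch.cat([vision_embeds, text_embeds], dim=1)
print(fused.shape)  # torch.Size([1, 206, 2048])
```

During fine-tuning on a captioning dataset such as Flickr30k, the projector (and optionally the LLM) is trained with the usual next-token loss on the caption portion of the fused sequence.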