---
inference: false
datasets:
- liuhaotian/LLaVA-CC3M-Pretrain-595K
---

# llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector after LLaVA stage 1 (feature-alignment pre-training); you can use it to instruction-tune your own multimodal models.
Please follow my reproduced implementation [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3/) for more details on fine-tuning a LLaVA model with Llama-3 as the foundation LLM.
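
As a minimal sketch of how the checkpoint can be consumed: the stage-1 artifact is assumed to ship the connector weights as `mm_projector.bin` (the usual LLaVA layout), which the LLaVA training scripts accept via the `--pretrain_mm_mlp_adapter` flag. The snippet below only inspects the state dict; consult the repository above for the full fine-tuning pipeline.

```python
import torch

# Assumed filename: LLaVA stage-1 runs conventionally save the
# connector weights as mm_projector.bin alongside the config.
state_dict = torch.load("mm_projector.bin", map_location="cpu")

# List the projector tensors to confirm the MLP shapes before
# passing the file to the fine-tuning script via
# --pretrain_mm_mlp_adapter (flag name from the LLaVA codebase).
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```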


## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

## Architecture
- LLM: llama-3-8b (Frozen)
- Vision-Language Adapter: MLP
- Vision Encoder: CLIP-ViT-L-336px (Frozen)
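
For reference, here is a minimal sketch of the two-layer GELU MLP connector used by LLaVA-1.5 (`mlp2x_gelu`). The hidden sizes are assumptions based on CLIP-ViT-L (1024) and Llama-3-8B (4096), not values read from this checkpoint:

```python
import torch.nn as nn

# Assumed dimensions: CLIP-ViT-L-336px vision features (1024) are
# projected into the Llama-3-8B embedding space (4096). Only this
# projector is trained in stage 1; the LLM and vision encoder stay frozen.
vision_hidden_size = 1024
llm_hidden_size = 4096

mm_projector = nn.Sequential(
    nn.Linear(vision_hidden_size, llm_hidden_size),
    nn.GELU(),
    nn.Linear(llm_hidden_size, llm_hidden_size),
)
```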