SOLO Model Card

Model details

Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as input, without relying on a separate pre-trained vision encoder.
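To make "raw image patches (in pixels)" concrete, here is a minimal sketch of how an image could be cut into flattened pixel patches and mapped to embedding vectors by a single linear layer. It only illustrates the general idea and is not SOLO's actual preprocessing: the function names `patchify` and `linear_embed`, the patch size, the division by 255, and the toy weights are all assumptions made for this example.

```python
from typing import List

# An image as nested lists: height x width x channels, pixel values in 0..255.
Image = List[List[List[int]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an image into non-overlapping patches and flatten each one.

    Hypothetical helper: scaling pixels to [0, 1] is an assumption,
    not necessarily what SOLO does.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = [
                channel / 255.0
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel in pixel
            ]
            patches.append(flat)
    return patches


def linear_embed(patch: List[float],
                 weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Project one flattened patch to an embedding with a plain linear layer."""
    return [
        sum(w * x for w, x in zip(row, patch)) + b
        for row, b in zip(weights, bias)
    ]


# Toy 4x4 RGB image; a patch size of 2 is chosen only to keep the example small.
image = [[[(r * 4 + c) % 256] * 3 for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)

# Toy projection to a 3-dimensional "hidden size" (all-ones weights, zero bias).
weights = [[1.0] * len(patches[0]) for _ in range(3)]
bias = [0.0, 0.0, 0.0]
embeddings = [linear_embed(p, weights, bias) for p in patches]

print(len(patches), len(patches[0]), len(embeddings[0]))
```

In a design like this, the resulting patch embeddings would be placed in the same input sequence as the text token embeddings, so one Transformer processes both modalities.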

Model date: SOLO-7B was trained in June 2024.

Paper or resources for more information: Paper & GitHub

Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues

Inference with Hugging Face

Please check this script for an example of performing inference with the model.

Model size: 7.26B params (Safetensors)

Tensor type: BF16
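
As a rough guide to what the reported parameter count and tensor type imply, the sketch below estimates the memory needed to hold the weights alone. It assumes 2 bytes per BF16 value and ignores activations, the key-value cache, and framework overhead, so real usage during inference will be higher. The helper name `weight_memory_bytes` is made up for this example.

```python
NUM_PARAMS = 7.26e9   # parameter count reported on the model card
BYTES_PER_BF16 = 2    # BF16 stores each value in 16 bits


def weight_memory_bytes(num_params: float, bytes_per_param: int) -> float:
    """Return the bytes needed to store the weights only (hypothetical helper)."""
    return num_params * bytes_per_param


weight_bytes = weight_memory_bytes(NUM_PARAMS, BYTES_PER_BF16)
print(f"{weight_bytes / 1e9:.2f} GB")     # decimal gigabytes
print(f"{weight_bytes / 2**30:.2f} GiB")  # binary gibibytes
```

This works out to roughly 14.5 GB for the weights.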