SOLO Model Card
Model details
Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. It accepts both raw image patches (in pixels) and text as inputs, without a separate pre-trained vision encoder.
Model date: SOLO-7B was trained in June 2024.
Paper or resources for more information: Paper & GitHub
Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues
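To illustrate what "raw image patches (in pixels)" under Model type can mean in practice, here is a minimal NumPy sketch that splits an image into non-overlapping patches and maps each flattened patch to a token-sized vector with one linear projection. This is not SOLO's actual code: the function names `patchify` and `embed_patches`, the patch size, the hidden dimension, and the use of a plain linear layer are all assumptions chosen for illustration.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping pixel patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row (left to right, then top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Project flattened patches to the model width with one linear layer (assumed)."""
    return patches @ weight + bias


# Toy sizes, chosen only to keep the example small (hypothetical values).
PATCH_SIZE = 4
HIDDEN_DIM = 8

rng = np.random.default_rng(0)
image = rng.random((8, 12, 3), dtype=np.float32)  # H=8, W=12, RGB

patches = patchify(image, PATCH_SIZE)  # 2 x 3 = 6 patches of 4*4*3 = 48 values
weight = rng.standard_normal((PATCH_SIZE * PATCH_SIZE * 3, HIDDEN_DIM)).astype(np.float32)
bias = np.zeros(HIDDEN_DIM, dtype=np.float32)

image_tokens = embed_patches(patches, weight, bias)  # one vector per patch
```

In a single-Transformer design, vectors like `image_tokens` would sit in the same input sequence as the text token embeddings, which is why no separate vision encoder is needed. How SOLO orders, delimits, or normalizes these patches is not stated in this card; see the paper and repository for the real implementation.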
Inference with Hugging Face
Please check this script for an example of performing inference on the model.