pipeline_tag: video-text-to-text

This repository contains the model of the paper [VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction](https://huggingface.co/papers/2501.01957).

Code: https://github.com/VITA-MLLM/VITA