---
base_model:
- MAmmoTH-VL/MAmmoTH-VL-8B
datasets:
- TIGER-Lab/VisualWebInstruct
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
---
|
# Introduction

MAmmoTH-VL2 is the model obtained by fine-tuning MAmmoTH-VL-8B on VisualWebInstruct, a multimodal instruction dataset scaled up through web search.
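
# Usage

The snippet below is a minimal inference sketch using the generic image-text-to-text interface in recent versions of transformers, matching the metadata above. The repo id `TIGER-Lab/MAmmoTH-VL2`, the image URL, and the prompt are illustrative assumptions; if `AutoModelForImageTextToText` does not resolve for this checkpoint, consult the repository files for the exact model class.

```python
# Minimal inference sketch. Assumptions: the checkpoint is published as
# "TIGER-Lab/MAmmoTH-VL2" and loads through the generic
# image-text-to-text auto classes in recent transformers releases.
import requests
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "TIGER-Lab/MAmmoTH-VL2"  # assumed repo id for this card
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# Any RGB image works here; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/problem.png", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Solve the problem in the image step by step."},
        ],
    }
]
# Render the chat template into a prompt string, then tokenize with the image.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0], skip_special_tokens=True))
```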
|
# Links

[Github](https://github.com/TIGER-AI-Lab/VisualWebInstruct) | [Paper](https://arxiv.org/abs/2503.10582) | [Website](https://tiger-ai-lab.github.io/VisualWebInstruct/)
|
# Citation

```
@article{visualwebinstruct,
  title={VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search},
  author={Jia, Yiming and Li, Jiachen and Yue, Xiang and Li, Bo and Nie, Ping and Zou, Kai and Chen, Wenhu},
  journal={arXiv preprint arXiv:2503.10582},
  year={2025}
}
```