---
base_model:
- MAmmoTH-VL/MAmmoTH-VL-8B
datasets:
- TIGER-Lab/VisualWebInstruct
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Introduction

MAmmoTH-VL2 is the model trained on the VisualWebInstruct dataset.

# Links

[Github](https://github.com/TIGER-AI-Lab/VisualWebInstruct) | [Paper](https://arxiv.org/abs/2503.10582) | [Website](https://tiger-ai-lab.github.io/VisualWebInstruct/)

# Citation

```
@article{visualwebinstruct,
  title={VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search},
  author={Jia, Yiming and Li, Jiachen and Yue, Xiang and Li, Bo and Nie, Ping and Zou, Kai and Chen, Wenhu},
  journal={arXiv preprint arXiv:2503.10582},
  year={2025}
}
```