---
license: apache-2.0
datasets:
  - TIGER-Lab/VisualWebInstruct
language:
  - en
base_model:
  - MAmmoTH-VL/MAmmoTH-VL-8B
pipeline_tag: question-answering
library_name: transformers
---

Introduction

MAmmoTH-VL2 is a vision-language model built on MAmmoTH-VL-8B and trained on the VisualWebInstruct dataset.

Links

GitHub | Paper | Website

Citation

@article{visualwebinstruct,
    title={VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search},
    author={Jia, Yiming and Li, Jiachen and Yue, Xiang and Li, Bo and Nie, Ping and Zou, Kai and Chen, Wenhu},
    journal={arXiv preprint arXiv:2503.10582},
    year={2025}
}