---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---
# Mono-InternVL-2B
This repository contains the instruction-tuned Mono-InternVL-2B model, which activates 1.8B of its 3B total parameters. It is built upon internlm2-chat-1_8b.

Please refer to our paper, project page, and GitHub repository for an introduction and usage instructions.
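
Below is a minimal loading sketch using the standard `transformers` remote-code interface. The repository ID and dtype settings shown here are assumptions for illustration; the exact inference/chat API (image preprocessing, prompting) should be taken from the GitHub repository above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repository ID -- substitute the actual Hugging Face path of this model.
path = "OpenGVLab/Mono-InternVL-2B"

# The `custom_code` tag means the modeling code ships with the repository,
# so trust_remote_code=True is required when loading.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# See the GitHub repository for the image preprocessing and chat interface
# used to run multimodal inference with this model.
```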
## Citation
If you find this project useful in your research, please consider citing:
```bibtex
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```