---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- internlm/internlm2-chat-1_8b
base_model_relation: merge
language:
- multilingual
tags:
- internvl
- vision
- ocr
- custom_code
- moe
---

# Mono-InternVL-2B

This repository contains the instruction-tuned Mono-InternVL-2B model, which has 1.8B activated parameters (3B in total). It is built upon [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b).

Please refer to our [paper](https://huggingface.co/papers/2410.08202), [project page](https://internvl.github.io/blog/2024-10-10-Mono-InternVL/), and [GitHub repository](https://github.com/OpenGVLab/mono-internvl) for an introduction and usage instructions.
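
As a rough orientation before reading the linked instructions, loading the checkpoint might look like the sketch below. It is a minimal sketch under stated assumptions, not the authors' documented procedure: the helper name `load_mono_internvl` is hypothetical, `trust_remote_code=True` is assumed to be needed because the card carries the `custom_code` tag, and bfloat16 on a CUDA device is an assumed default. The inference interface itself is not described on this card, so the sketch stops at loading; see the GitHub repository for actual usage.

```python
MODEL_ID = "OpenGVLab/Mono-InternVL-2B"  # Hub repository id of this model card


def load_mono_internvl(model_id: str = MODEL_ID, device: str = "cuda"):
    """Hypothetical helper: load the model and tokenizer from the Hub."""
    # Heavy dependencies are imported lazily so that importing this module
    # does not require torch or transformers to be installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the `custom_code` tag means the model class is defined in
    # the repository, so remote code must be trusted explicitly.
    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 suits the target GPU
        trust_remote_code=True,
    )
    model = model.eval().to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

Calling `load_mono_internvl()` downloads the weights on first use; pass `device="cpu"` to try it without a GPU, at the cost of speed.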



## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{luo2024mono,
  title={Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training},
  author={Luo, Gen and Yang, Xue and Dou, Wenhan and Wang, Zhaokai and Liu, Jiawen and Dai, Jifeng and Qiao, Yu and Zhu, Xizhou},
  journal={arXiv preprint arXiv:2410.08202},
  year={2024}
}
```