Update README.md

README.md (CHANGED)

@@ -173,7 +173,7 @@ def load_image(image_file, input_size=448, max_num=12):
     pixel_values = torch.stack(pixel_values)
     return pixel_values

-
+
 path = 'OpenGVLab/Mono-InternVL-2B'
 model = AutoModel.from_pretrained(
     path,
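
For reference, the README snippet this hunk touches loads the model via `AutoModel.from_pretrained`. The diff only shows `path` and the opening of that call, so the sketch below fills in the rest with the standard Hugging Face loading pattern; every keyword argument and the tokenizer line are assumptions, not part of this change.

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/Mono-InternVL-2B'

# Only `path` and the call itself appear in the diff; the keyword arguments
# below are assumed from the usual Hugging Face remote-code loading pattern.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # assumed precision
    low_cpu_mem_usage=True,       # assumed
    trust_remote_code=True,       # assumed; custom model classes usually need this
).eval().cuda()

# Assumed companion tokenizer; not shown in the diff.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```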

@@ -225,6 +225,21 @@ If you find this project useful in your research, please consider citing:
   journal={arXiv preprint arXiv:2410.TODO},
   year={2024}
 }
+
+@article{chen2024far,
+  title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+  journal={arXiv preprint arXiv:2404.16821},
+  year={2024}
+}
+
+@inproceedings{chen2024internvl,
+  title={Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks},
+  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and others},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={24185--24198},
+  year={2024}
+}
 ```

@@ -298,4 +313,20 @@ Mono-InternVL在性能上优于当前最先进的MLLM Mini-InternVL-2B-1.5,并
   journal={arXiv preprint arXiv:2410.TODO},
   year={2024}
 }
+
+@article{chen2024far,
+  title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+  journal={arXiv preprint arXiv:2404.16821},
+  year={2024}
+}
+
+@inproceedings{chen2024internvl,
+  title={Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks},
+  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and others},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={24185--24198},
+  year={2024}
+}
+
 ```