	---
	license: mit
	datasets:
	- YuukiAsuna/VietnameseTableVQA
	language:
	- vi
	base_model:
	- 5CD-AI/Vintern-1B-v2
	pipeline_tag: document-question-answering
	library_name: transformers
	---
	# Vintern-1B-v2-ViTable-docvqa

	<p align="center">
	<a href="https://drive.google.com/file/d/1MU8bgsAwaWWcTl9GN1gXJcSPUSQoyWXy/view?usp=sharing"><b>Report Link</b>👁️</a>
	</p>


	<!-- Provide a quick summary of what the model is/does. -->
	Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.


	## Benchmarks

	<div align="center">

	| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
	|------------------------------|------|---------------------|------------------------|
	| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
	| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
	| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |

	</div>
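
For readers unfamiliar with the ANLS column, the sketch below shows how Average Normalized Levenshtein Similarity is commonly computed for DocVQA-style answers. It is not the evaluation script behind the numbers above. The 0.5 threshold, the lowercase and whitespace normalization, and the function names `levenshtein`, `normalize`, and `anls` are assumptions made for illustration.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best similarity to any reference answer.

    A similarity counts only if the normalized distance is below `threshold`
    (assumed to be 0.5 here); otherwise the question scores 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for prediction, golds in zip(predictions, references):
        pred = normalize(prediction)
        best = 0.0
        for gold in golds:
            ref = normalize(gold)
            longest = max(len(pred), len(ref))
            distance = levenshtein(pred, ref) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a prediction that differs from the reference only in case scores 1.0, and a prediction with no characters in common with any reference scores 0.0.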

	<!-- Code benchmark: to be written later -->

	## Usage

	Try the [🤗 HF Demo](https://huggingface.co/spaces/YuukiAsuna/Vintern-1B-v2-ViTable-docvqa), or open the notebook in Colab:
	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ricMh4BxntoiXIT2CnQvAZjrGZTtx4gj?usp=sharing)
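
For local experiments, a minimal sketch of loading the model with `transformers` might look like the following. It is not taken from the demo or the Colab notebook. It assumes that this repository ships the same InternVL-style remote code as the base model (so `trust_remote_code=True` exposes a `model.chat(...)` method), and that a single 448x448 tile with ImageNet normalization is acceptable input. The helper names `build_prompt`, `load_single_tile`, and `answer_question`, along with the generation settings, are hypothetical.

```python
MODEL_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def build_prompt(question: str) -> str:
    """Prefix the question with the <image> placeholder (InternVL-style chat)."""
    return "<image>\n" + question.strip()


def load_single_tile(image_path: str, size: int = 448):
    """Resize the image to one square tile and normalize it.

    Assumption: one tile is enough. Dense tables may need a multi-tile
    preprocessing scheme instead, which this sketch does not implement.
    """
    import numpy as np
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB").resize((size, size), Image.BICUBIC)
    array = np.asarray(image, dtype=np.float32) / 255.0
    mean = np.array(IMAGENET_MEAN, dtype=np.float32)
    std = np.array(IMAGENET_STD, dtype=np.float32)
    array = (array - mean) / std
    # HWC -> CHW, then add a batch dimension: shape (1, 3, size, size)
    return torch.from_numpy(array).permute(2, 0, 1).unsqueeze(0)


def answer_question(image_path: str, question: str) -> str:
    """Load the model and answer one question about one document image."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModel.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,  # assumption: the repo provides custom model code
    ).eval().to(device)
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True, use_fast=False
    )
    pixel_values = load_single_tile(image_path).to(torch.bfloat16).to(device)
    generation_config = dict(max_new_tokens=512, do_sample=False)  # illustrative values
    # Assumption: `chat` is supplied by the remote code, as in the base model.
    return model.chat(tokenizer, pixel_values, build_prompt(question), generation_config)
```

A call such as `answer_question("table.png", "Tổng doanh thu là bao nhiêu?")` would then return the model's answer as a string, where `table.png` is a placeholder path to a table image.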

	## Citation

	```bibtex
	@misc{doan2024vintern1befficientmultimodallarge,
	title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese},
	author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
	year={2024},
	eprint={2408.12480},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	url={https://arxiv.org/abs/2408.12480},
	}
	```