---
license: mit
datasets:
- YuukiAsuna/VietnameseTableVQA
language:
- vi
base_model:
- 5CD-AI/Vintern-1B-v2
pipeline_tag: document-question-answering
library_name: transformers
---
# Vintern-1B-v2-ViTable-docvqa

<p align="center">
  <a href="https://drive.google.com/file/d/1MU8bgsAwaWWcTl9GN1gXJcSPUSQoyWXy/view?usp=sharing"><b>Report Link</b>👁️</a>
</p>


<!-- Provide a quick summary of what the model is/does. -->
Vintern-1B-v2-ViTable-docvqa is a version of the 5CD-AI/Vintern-1B-v2 multimodal model fine-tuned for Vietnamese document visual question answering (DocVQA) on table data.


## Benchmarks

<div align="center">

| Model                        | ANLS                   | Semantic Similarity    | MLLM-as-judge (Gemini) |
|------------------------------|------------------------|------------------------|------------------------|
| Gemini 1.5 Flash             | 0.35                   | 0.56                   | 0.40                   |
| Vintern-1B-v2                | 0.04                   | 0.45                   | 0.50                   |
| Vintern-1B-v2-ViTable-docvqa | **0.50**               | **0.71**               | **0.59**               |

</div>

<!-- Code benchmark: to be written later -->
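Until the benchmark code is published, the snippet below is a minimal sketch of how ANLS (Average Normalized Levenshtein Similarity) is commonly computed for DocVQA-style evaluation, using the usual threshold τ = 0.5 and assuming a single reference answer per question. It illustrates the metric only; it is not the exact script behind the table above.

```python
# Illustrative ANLS computation (standard DocVQA definition), not the authors' script.
# Requires: pip install Levenshtein
import Levenshtein


def nls(prediction: str, target: str) -> float:
    """Normalized Levenshtein similarity between a prediction and one reference."""
    prediction, target = prediction.strip().lower(), target.strip().lower()
    if not prediction and not target:
        return 1.0
    dist = Levenshtein.distance(prediction, target)
    return 1.0 - dist / max(len(prediction), len(target))


def anls(predictions: list[str], targets: list[str], tau: float = 0.5) -> float:
    """Average NLS over the dataset; scores below the threshold tau count as 0."""
    scores = [s if (s := nls(p, t)) >= tau else 0.0 for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)


# Example with placeholder data:
print(anls(["1.250.000 VND"], ["1.250.000 VNĐ"]))
```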

## Usage

Check out the [**🤗 HF Demo**](https://huggingface.co/spaces/YuukiAsuna/Vintern-1B-v2-ViTable-docvqa), or open the example notebook in Colab:  
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ricMh4BxntoiXIT2CnQvAZjrGZTtx4gj?usp=sharing)
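
To run the model locally, the snippet below is a minimal sketch rather than the official example: it assumes the checkpoint is published as `YuukiAsuna/Vintern-1B-v2-ViTable-docvqa`, that it inherits the InternVL-style remote-code `chat()` interface from the base model 5CD-AI/Vintern-1B-v2, and that a single 448×448 tile is sufficient for your document. The image path and the Vietnamese question are placeholders; see the demo and Colab notebook above for the exact preprocessing.

```python
# Minimal usage sketch (assumed repo id and InternVL-style chat API from the base model).
import torch
import torchvision.transforms as T
from torchvision.transforms.functional import InterpolationMode
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"  # assumed repository id

model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Simple single-tile preprocessing (448x448, ImageNet statistics) as used by
# InternVL-family vision encoders; large documents may require tiling instead.
transform = T.Compose([
    T.Lambda(lambda img: img.convert("RGB")),
    T.Resize((448, 448), interpolation=InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = transform(Image.open("table_page.png")).unsqueeze(0).to(torch.bfloat16).cuda()

# Placeholder question: "Doanh thu năm 2023 là bao nhiêu?" ("What is the revenue in 2023?")
question = "<image>\nDoanh thu năm 2023 là bao nhiêu?"
generation_config = dict(max_new_tokens=512, do_sample=False, num_beams=3, repetition_penalty=2.5)

answer = model.chat(tokenizer, pixel_values, question, generation_config)
print(answer)
```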

**Citation:**

```bibtex
@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}
```