Update README.md
README.md
@@ -8,8 +8,6 @@ language:
 - es
 base_model:
 - MrLight/dse-qwen2-2b-mrl-v1
-datasets:
-- llamaindex/vdr-multilingual-train
 tags:
 - transformers
 - Qwen2-VL
@@ -19,17 +17,17 @@ tags:
 
 
 
-vdr-2b-multi-v1 is a multilingual model designed for visual document retrieval across multiple languages and domains. This model is designed to encode document page screenshots into dense single-vector representations, this will effectively allow to search and query visually rich multilingual documents without the need for any OCR, data extraction pipelines, chunking...
+vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval across multiple languages and domains. This model is designed to encode document page screenshots into dense single-vector representations, this will effectively allow to search and query visually rich multilingual documents without the need for any OCR, data extraction pipelines, chunking...
 
 
 - **Trained on 🇮🇹 Italian, 🇪🇸 Spanish, 🇬🇧 English, 🇫🇷 French and 🇩🇪 German:** together they form a new large, open-source, multilingual training dataset of 500k high-quality samples.
 
-- **Low VRAM and Faster Inference**: english model achieves better results on synthetic vidore benchmarks with just 30% of the base model image resolution. This results in 3x faster inference and much lower VRAM usage.
-
 - **Cross-lingual Retrieval**: substantially better on real-world scenarios. For example, this allows for searching german documents with italian queries.
 
 - **Matryoshka Representation Learning**: You can reduce the vectors size 3x and still keep 98% of the embeddings quality.
 
+To know more about the model, read the [announcement blogpost](https://huggingface.co/blog/marco/vdr-2b-multilingual).
+
 # Usage
 
 **Initialize model and processor**
@@ -175,6 +173,8 @@ The model is based on [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLig
 
 # Results
 
+
+
 The model has been evaluated on the Vidore benchmark and on custom-built evaluation sets that allow testing its multilingual capabilities on text-only, visual-only and mixed page screenshots. The evaluation dataset is publicly available [here on HuggingFace](https://huggingface.co/datasets/llamaindex/vdr-multilingual-test).
 
 All evaluations are performed by calculating **NDCG@5** scores using **1536 dimensions** vectors and an image resolution that can be represented with **maximum 768 tokens**.
@@ -212,4 +212,4 @@ All evaluations are performed by calculating **NDCG@5** scores using **1536 dime
 | | **Avg** | **shiftproject** | **government** | **healthcare** | **energy** | **ai** | **docvqa** | **arxivqa** | **tatdqa** | **infovqa** | **tabfquad** |
 |--------------------:|---------:|-----------------:|---------------:|---------------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-------------:|
 | dse-qwen2-2b-mrl-v1 | 83.6 | 79.8 | **95.7** | **96.9** | **92** | 98.2 | 56.3 | **85.2** | **53.9** | **87.5** | 90.3 |
-| vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
+| vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
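The README's Matryoshka bullet says the vectors can be made 3x smaller while keeping about 98% of the embedding quality. With Matryoshka-trained embeddings, this is usually done by keeping a leading prefix of each vector and L2-renormalizing it. Applied to the 1536-dimension vectors used in the evaluation, a 3x reduction gives 512 dimensions. The sketch below assumes that prefix-and-renormalize convention, which this commit does not spell out. `truncate_embedding` and `search` are hypothetical helpers, and the random vectors stand in for real model outputs.

```python
import math
import random


def truncate_embedding(vec, dim):
    """Keep the leading `dim` components (the Matryoshka prefix) and L2-renormalize."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in (0, {len(vec)}], got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def search(query_vec, doc_vecs, top_k=5):
    """Rank documents by dot product, which equals cosine similarity for unit-norm inputs."""
    scores = [
        (sum(q * d for q, d in zip(query_vec, doc)), idx)
        for idx, doc in enumerate(doc_vecs)
    ]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:top_k]]


# Hypothetical stand-ins for 1536-dim page embeddings produced by the model.
rng = random.Random(0)
docs_full = [[rng.gauss(0.0, 1.0) for _ in range(1536)] for _ in range(10)]
query_full = docs_full[3]  # a query identical to page 3, for illustration only

docs_small = [truncate_embedding(d, 512) for d in docs_full]  # 1536 -> 512 (3x smaller)
query_small = truncate_embedding(query_full, 512)
print(search(query_small, docs_small, top_k=3))  # page 3 ranks first
```

The renormalization step matters. Without it, dot products between truncated vectors are no longer cosine similarities, so scores from different truncation sizes would not be comparable.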
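The results are reported as NDCG@5. For one query, NDCG@k divides the discounted cumulative gain (DCG) of the top k retrieved pages by the DCG of an ideal ordering. The result at 1-based rank i is weighted by 1 / log2(i + 1). Because NDCG lies between 0 and 1, the table values (for example 84.0) are scaled by 100, and they are presumably averaged over queries. The sketch below uses the linear-gain form, which matches the 2^rel - 1 variant when relevance is binary. It is not the benchmark's own evaluation code, and the page ids are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=5):
    """NDCG@k for a single query.

    ranked_ids: document ids in the order the retriever returned them.
    relevance:  document id -> relevance grade (1 for binary judgments).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical query with one relevant page, retrieved at rank 2.
score = ndcg_at_k(["p3", "p7", "p1", "p9", "p2"], {"p7": 1})
print(round(score, 4))  # 0.6309
```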
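The evaluation caps each page image at 768 visual tokens. In Qwen2-VL, the architecture tagged for this model, each visual token covers a 28×28-pixel area after 14-pixel patches are merged 2×2. A 768-token cap therefore corresponds to a pixel budget of 768 × 28 × 28. Qwen2-VL's image processor exposes this budget as `max_pixels`. The sketch below is a simplified re-implementation of Qwen2-VL's resize rule. It rounds each side to a multiple of 28, then scales both sides down by a common factor if the area exceeds the budget. The helper name and the example page size are assumptions, and the real processor may differ in edge cases such as extreme aspect ratios.

```python
import math

PATCH = 28  # pixels per visual-token side in Qwen2-VL (14-px patches merged 2x2)


def fit_to_token_budget(height, width, max_tokens=768, min_pixels=56 * 56):
    """Return (height, width, tokens) after resizing to multiples of PATCH within the budget."""
    max_pixels = max_tokens * PATCH * PATCH
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down to multiples of PATCH.
        beta = math.sqrt(height * width / max_pixels)
        h = max(PATCH, math.floor(height / beta / PATCH) * PATCH)
        w = max(PATCH, math.floor(width / beta / PATCH) * PATCH)
    elif h * w < min_pixels:
        # Too small: enlarge both sides, rounding up to multiples of PATCH.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / PATCH) * PATCH
        w = math.ceil(width * beta / PATCH) * PATCH
    return h, w, (h // PATCH) * (w // PATCH)


# A hypothetical A4 page screenshot at roughly 200 dpi (height x width in pixels).
print(fit_to_token_budget(2339, 1654))  # (896, 644, 736)
```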