Metric-AI/colqwen2.5-3b-multilingual

Visual Document Retrieval

multimodal_embedding

multilingual_embedding

Text-to-Visual Document (T→VD) retrieval


Markgazol committed 13 days ago

Commit ac6ec48 (verified) · 1 Parent(s): aa07dd8

Update README.md

Files changed (1)

README.md +2 -1

@@ -23,8 +23,9 @@ library_name: peft
 ---
 # ColQwen2.5-3b-multilingual: Multilingual Visual Retriever based on Qwen2.5-VL-3B-Instruct with ColBERT strategy
+## Ranked #1 among models smaller than 7B parameters and #3 overall on the Vidore benchmark (as of February 2, 2025). The reported scores on the [Vidore Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard) correspond to checkpoint-1800.
 ### This is the base version trained on 4xA100 80GB with per_device_batch_size=128 and gradient_accumulation_steps=2 for 5 epoch.
-### The reported scores are for the "checkpoint-1800".
 ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
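
The README describes ColBERT-style multi-vector representations without showing how they are compared. As a rough illustration, the sketch below implements the late-interaction (MaxSim) score from the ColBERT paper in plain Python: each query vector is matched to its most similar document vector and the maxima are summed. The function names and the toy 2-D vectors are hypothetical and chosen only for illustration; this is not the model's or any library's actual API, and real embeddings have many more dimensions.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, take the highest
    similarity against any document vector, then sum over query vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy multi-vector embeddings (hypothetical values, 2 dimensions each).
query = [[1.0, 0.0], [0.0, 1.0]]
pages: Dict[str, Sequence[Vector]] = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "page_b": [[0.5, 0.5], [-1.0, 0.0]],
}

# Score every page against the query and pick the best match.
scores = {name: maxsim_score(query, vecs) for name, vecs in pages.items()}
best = max(scores, key=scores.get)
```

Here `page_a` scores 2.0 and `page_b` scores 1.0, so `page_a` ranks first. Because each query vector picks its own best match, a page can score well when different regions answer different parts of the query.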