Metric-AI/ColQwen2.5-7b-multilingual-v1.0

Visual Document Retrieval

multimodal_embedding

multilingual_embedding

Text-to-Visual Document (T→VD) retrieval


nothuggingfaceatall committed on Feb 11

Commit 0acde4b (verified) · 1 Parent(s): e3a66a7

Update README.md

Files changed (1)

README.md +1 -1


@@ -27,7 +27,7 @@ pipeline_tag: visual-document-retrieval
 ## Ranked #1 on the Vidore benchmark (as of February 11, 2025). The reported scores are on the [Vidore Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard).
-### This is the base version trained on 4xA100 80GB with per_device_batch_size=128 and gradient_accumulation_steps=2 for 5 epoch.
 ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.

 ## Ranked #1 on the Vidore benchmark (as of February 11, 2025). The reported scores are on the [Vidore Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard).
+### This is the base version trained on 4xA100 80GB with per_device_batch_size=64 and gradient_accumulation_steps=2 for 5 epoch.
 ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
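
To put the changed line in context, here is a minimal sketch of what the edit implies for the effective (global) batch size. It assumes standard data-parallel training, where one optimizer step consumes `num_devices * per_device_batch_size * gradient_accumulation_steps` samples. The README does not state that formula, and the helper name `effective_batch_size` is hypothetical. The figures 4 devices, 128 and 64 per device, and 2 accumulation steps are taken from the diff.

```python
def effective_batch_size(num_devices: int,
                         per_device_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Samples consumed per optimizer step under plain data parallelism (assumed)."""
    return num_devices * per_device_batch_size * gradient_accumulation_steps


# Values before and after this commit, as written in the README diff.
before = effective_batch_size(4, 128, 2)
after = effective_batch_size(4, 64, 2)

print(f"before: {before}, after: {after}")  # before: 1024, after: 512
```

Under that assumption, the corrected README describes a global batch of 512 rather than 1024.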