yydxlv/colqwen2.5-7b-v0.1

Visual Document Retrieval

multimodal_embedding

Text-to-Visual Document (T→VD) retrieval

yydxlv committed 17 days ago

Commit 54c7735 (verified) · 1 Parent(s): b4a30e9

Update README.md

Files changed (1)

README.md +3 -4

@@ -11,14 +11,12 @@ tags:
 library_name: peft
 pipeline_tag: visual-document-retrieval
 ---
-# ColQwen2.5-7b-v0.1: Multimodal Visual Retriever based on Qwen2.5-VL-7B-Instruct with ColBERT strategy
+# IEIT-Systems ColQwen2.5-7b-v0.1: Multimodal Visual Retriever based on Qwen2.5-VL-7B-Instruct with ColBERT strategy
 ## Ranked #1 among models on the Vidore benchmark (as of February 7, 2025). The reported scores on the [Vidore Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard).
 ### This is the base version trained on 8xA100 80GB with batch_size=32*8 and gradient_accumulation_steps=2 for 3 epoch.
-- **Developed by:** IEIT systems
 ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
@@ -131,4 +129,5 @@ If you use this models from this organization in your research, please cite the
   primaryClass={cs.IR},
   url={https://arxiv.org/abs/2407.01449},
 }
-```
+```
+- **Developed by:** IEIT systems
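
The README text in this diff describes the model as producing ColBERT-style multi-vector representations of text and images. As a rough illustration of how such representations are typically compared, here is a minimal sketch of late-interaction (MaxSim) scoring in plain Python. The function names (`dot`, `maxsim_score`, `rank_pages`) and the toy vectors are hypothetical and chosen only for this example; this is not the model's actual code or API, and a real pipeline would use batched tensor operations on the model's own embeddings.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best
    matching document vector (max dot product), then sum over the query."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (made up for illustration): two query token vectors, two pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query vectors exactly
page_b = [[0.5, 0.5]]               # a single, less specific patch vector

ranking = rank_pages(query, [page_b, page_a])
```

In this toy setup `page_a` scores higher than `page_b`, because each query vector finds an exact match among its vectors, so `ranking` lists index 1 before index 0.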