Update README.md
README.md
@@ -11,14 +11,12 @@ tags:
 library_name: peft
 pipeline_tag: visual-document-retrieval
 ---
-# ColQwen2.5-7b-v0.1: Multimodal Visual Retriever based on Qwen2.5-VL-7B-Instruct with ColBERT strategy
+# IEIT-Systems ColQwen2.5-7b-v0.1: Multimodal Visual Retriever based on Qwen2.5-VL-7B-Instruct with ColBERT strategy
 
 ## Ranked #1 among models on the Vidore benchmark (as of February 7, 2025). The reported scores on the [Vidore Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard).
 
 ### This is the base version trained on 8xA100 80GB with batch_size=32*8 and gradient_accumulation_steps=2 for 3 epoch.
 
-- **Developed by:** IEIT systems
-
 ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2.5-VL-3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
@@ -131,4 +129,5 @@ If you use this models from this organization in your research, please cite the
 primaryClass={cs.IR},
 url={https://arxiv.org/abs/2407.01449},
 }
-```
+```
+- **Developed by:** IEIT systems
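The README context in this diff describes ColQwen as producing ColBERT-style multi-vector representations of text and images. Such representations are usually compared with late interaction (MaxSim): each query token vector is matched to its most similar document vector, and those maxima are summed. The following is a minimal pure-Python sketch of that scoring rule, assuming the embeddings are already L2-normalized. The helpers `l2_normalize`, `maxsim_score`, and `rank` are hypothetical illustrations, not the colpali-engine API; real embeddings would come from the model itself.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration; not part of colpali-engine.
Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: sum over query vectors of the best dot
    product with any document vector (text tokens or image patches)."""
    total = 0.0
    for q in query:
        # Best match for this query token among all document vectors.
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc)
    return total


def rank(query: Sequence[Vector], docs: Sequence[Sequence[Vector]]) -> List[int]:
    """Return document indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for model outputs (assumed normalized).
    query = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
    doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token
    print(rank(query, [doc_a, doc_b]))  # [0, 1]
```

In practice the same computation is typically vectorized as a query-by-document similarity matrix followed by a row-wise max and a sum; the loop form here just makes the per-token matching explicit.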