Update README.md
README.md
@@ -43,7 +43,7 @@ This version is trained with `colpali-engine==0.3.7`.
 We train models using low-rank adapters ([LoRA](https://arxiv.org/abs/2106.09685))
 with `alpha=128` and `r=128` on the transformer layers from the language model,
 as well as the final randomly initialized projection layer, and use a `paged_adamw_8bit` optimizer.
-We train on an
+We train on an 8xA100 GPU setup with distributed data parallelism (via `accelerate`), a learning rate of 2e-4 with linear decay and 1% warmup steps, a per-device batch size of 32, gradient accumulation over 2 steps, and `bfloat16` precision.

 ## Installation
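For readers reproducing this setup, here is a minimal sketch of how the hyperparameters in the updated paragraph could map onto a `peft` + `transformers` configuration. The `target_modules` list, `output_dir`, and training-script name are illustrative assumptions; the README does not specify them, and colpali-engine's own training code may differ:

```python
# Minimal sketch of the training configuration described in the diff above.
# Values not stated in the README (target modules, output path) are placeholders.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA on the language model's transformer layers: alpha=128, r=128.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
)

# 2e-4 learning rate with linear decay and 1% warmup, per-device batch size 32,
# gradient accumulation over 2 steps, bfloat16, and a paged 8-bit AdamW optimizer.
training_args = TrainingArguments(
    output_dir="./train_out",  # placeholder path
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    bf16=True,
    optim="paged_adamw_8bit",
)
```

Running such a script via `accelerate launch --num_processes 8 train.py` (script name hypothetical) would correspond to the 8xA100 distributed data-parallel setup described in the commit.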