The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
language_model.model.layers.0 4
language_model.model.layers.1 4
language_model.model.layers.2 4
language_model.model.layers.3 4
language_model.model.layers.4 4
language_model.model.layers.5 4
language_model.model.layers.6 4
language_model.model.layers.7 4
language_model.model.layers.8 4
language_model.model.layers.9 4
language_model.model.layers.10 4
language_model.model.layers.11 4
language_model.model.layers.12 4
language_model.model.layers.13 4
language_model.model.layers.14 4
language_model.model.layers.15 4
language_model.model.layers.16 4
language_model.model.layers.17 4
language_model.model.layers.18 4
language_model.model.layers.19 4
language_model.model.layers.20 4
language_model.model.layers.21 4
language_model.model.layers.22 4
language_model.model.layers.23 4
vision_model.encoder.layers.0 0
vision_model.encoder.layers.1 0
vision_model.encoder.layers.2 0
vision_model.encoder.layers.3 0
vision_model.encoder.layers.4 0
vision_model.encoder.layers.5 0
vision_model.encoder.layers.6 0
vision_model.encoder.layers.7 0
vision_model.encoder.layers.8 0
vision_model.encoder.layers.9 0
vision_model.encoder.layers.10 0
vision_model.encoder.layers.11 0
vision_model.encoder.layers.12 0
vision_model.encoder.layers.13 0
vision_model.encoder.layers.14 0
vision_model.encoder.layers.15 0
vision_model.encoder.layers.16 0
vision_model.encoder.layers.17 0
vision_model.encoder.layers.18 0
vision_model.encoder.layers.19 0
vision_model.encoder.layers.20 0
vision_model.encoder.layers.21 0
vision_model.encoder.layers.22 0
vision_model.encoder.layers.23 0
vision_model.embeddings 0
mlp1 0
language_model.model.tok_embeddings 4
language_model.model.norm 4
language_model.output 4
language_model.model.embed_tokens 4
language_model.lm_head 4
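The printed map puts the vision encoder layers, `vision_model.embeddings` and the `mlp1` projector on GPU 0. Every `language_model.*` module goes on GPU 4, which matches the device pair `{0, 4}` reported for Rank [0]. The map also lists both InternLM2-style names (`tok_embeddings`, `output`) and LLaMA/Qwen2-style names (`embed_tokens`, `lm_head`), presumably so that one map can serve several language backbones. The following is a minimal sketch that rebuilds the same dict. It assumes 24 decoder layers and 24 ViT layers, as printed. `build_device_map` is a hypothetical helper, not part of the evaluation code, and it does not load a model.

```python
def build_device_map(vision_gpu: int, llm_gpu: int,
                     num_llm_layers: int = 24,
                     num_vit_layers: int = 24) -> dict:
    """Hypothetical helper: vision tower + projector on one GPU, language model on another."""
    device_map = {}
    # Decoder layers of the language model
    for i in range(num_llm_layers):
        device_map[f"language_model.model.layers.{i}"] = llm_gpu
    # Vision encoder layers, patch embeddings and the MLP projector (mlp1)
    for i in range(num_vit_layers):
        device_map[f"vision_model.encoder.layers.{i}"] = vision_gpu
    device_map["vision_model.embeddings"] = vision_gpu
    device_map["mlp1"] = vision_gpu
    # Embedding / final-norm / output-head modules under both naming schemes seen in the log:
    # InternLM2-style (tok_embeddings, output) and LLaMA/Qwen2-style (embed_tokens, lm_head)
    for name in ("language_model.model.tok_embeddings",
                 "language_model.model.norm",
                 "language_model.output",
                 "language_model.model.embed_tokens",
                 "language_model.lm_head"):
        device_map[name] = llm_gpu
    return device_map


if __name__ == "__main__":
    # Print in the same "<module> <gpu>" format as the log
    for name, gpu in build_device_map(vision_gpu=0, llm_gpu=4).items():
        print(name, gpu)
```

The Hugging Face `from_pretrained` `device_map` argument accepts dicts that map module names to GPU indices. Whether entries for modules that a given backbone lacks are tolerated has not been checked here.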
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [3] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ImageNeedleInAHaystack, devices: {device(type='cuda', index=3), device(type='cuda', index=7)}
Initialization Finished
Predicting ImageNeedleInAHaystack Using internvl
Proceeding 2-length images samples | Num: 10
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [0] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ImageNeedleInAHaystack, devices: {device(type='cuda', index=0), device(type='cuda', index=4)}
Rank [1] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ImageNeedleInAHaystack, devices: {device(type='cuda', index=1), device(type='cuda', index=5)}
Initialization Finished
Predicting ImageNeedleInAHaystack Using internvl
Proceeding 2-length images samples | Num: 10
Initialization Finished
Predicting ImageNeedleInAHaystack Using internvl
Proceeding 2-length images samples | Num: 10
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [2] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ImageNeedleInAHaystack, devices: {device(type='cuda', index=2), device(type='cuda', index=6)}
Initialization Finished
Predicting ImageNeedleInAHaystack Using internvl
Proceeding 2-length images samples | Num: 10
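Each rank reports `Num: 10` for every image-count setting, but its progress bars total only 2 or 3 steps. That is consistent with the 10 samples being divided across the four ranks as 3, 3, 2 and 2, assuming one sample per step. The sketch below shows two common list-splitting schemes that both produce those sizes. Both functions are hypothetical, and the log does not reveal which scheme, if either, the evaluator actually uses.

```python
def strided_shard(samples: list, rank: int, world_size: int) -> list:
    """Hypothetical strided split: rank r takes items r, r + world_size, r + 2 * world_size, ..."""
    return samples[rank::world_size]


def contiguous_shard(samples: list, rank: int, world_size: int) -> list:
    """Hypothetical contiguous split: the first len(samples) % world_size ranks get one extra item."""
    base, extra = divmod(len(samples), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return samples[start:end]


samples = list(range(10))  # the 10 samples reported per image-count setting
world_size = 4             # Rank [0] .. Rank [3] in the log

print([len(strided_shard(samples, r, world_size)) for r in range(world_size)])     # [3, 3, 2, 2]
print([len(contiguous_shard(samples, r, world_size)) for r in range(world_size)])  # [3, 3, 2, 2]
```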
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.66s/it]
100%|██████████| 2/2 [00:02<00:00, 1.02it/s]
100%|██████████| 2/2 [00:02<00:00, 1.11s/it]
Proceeding 4-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.65s/it]
100%|██████████| 2/2 [00:02<00:00, 1.02it/s]
100%|██████████| 2/2 [00:02<00:00, 1.11s/it]
Proceeding 4-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:03, 1.81s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.06s/it]
100%|██████████| 3/3 [00:02<00:00, 1.26it/s]
100%|██████████| 3/3 [00:02<00:00, 1.03it/s]
Proceeding 4-length images samples | Num: 10
Proceeding 4-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:00<00:00, 1.12it/s]
100%|██████████| 2/2 [00:01<00:00, 1.39it/s]
100%|██████████| 2/2 [00:01<00:00, 1.25it/s]
Proceeding 6-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:00<00:00, 1.08it/s]
100%|██████████| 2/2 [00:01<00:00, 1.28it/s]
100%|██████████| 2/2 [00:01<00:00, 1.17it/s]
Proceeding 6-length images samples | Num: 10
Proceeding 6-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:00<00:01, 1.02it/s]
 67%|██████▋   | 2/3 [00:01<00:00, 1.38it/s]
100%|██████████| 3/3 [00:02<00:00, 1.41it/s]
100%|██████████| 3/3 [00:02<00:00, 1.30it/s]
Proceeding 6-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.07s/it]
100%|██████████| 2/2 [00:01<00:00, 1.22it/s]
100%|██████████| 2/2 [00:01<00:00, 1.11it/s]
Proceeding 8-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.11s/it]
100%|██████████| 2/2 [00:01<00:00, 1.18it/s]
100%|██████████| 2/2 [00:01<00:00, 1.08it/s]
Proceeding 8-length images samples | Num: 10
Proceeding 8-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.12s/it]
100%|██████████| 2/2 [00:01<00:00, 1.14it/s]
100%|██████████| 2/2 [00:01<00:00, 1.05it/s]
Proceeding 10-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.22s/it]
100%|██████████| 2/2 [00:01<00:00, 1.13it/s]
100%|██████████| 2/2 [00:01<00:00, 1.01it/s]
Proceeding 10-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.02s/it]
 67%|██████▋   | 2/3 [00:01<00:00, 1.14it/s]
100%|██████████| 3/3 [00:02<00:00, 1.30it/s]
100%|██████████| 3/3 [00:02<00:00, 1.18it/s]
Proceeding 8-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.43s/it]
100%|██████████| 2/2 [00:02<00:00, 1.35s/it]
100%|██████████| 2/2 [00:02<00:00, 1.43s/it]
Proceeding 12-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.26s/it]
100%|██████████| 2/2 [00:02<00:00, 1.30s/it]
100%|██████████| 2/2 [00:02<00:00, 1.35s/it]
Proceeding 12-length images samples | Num: 10
Proceeding 10-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.24s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.31s/it]
100%|██████████| 3/3 [00:03<00:00, 1.08s/it]
100%|██████████| 3/3 [00:03<00:00, 1.16s/it]
Proceeding 10-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.42s/it]
100%|██████████| 2/2 [00:02<00:00, 1.03s/it]
100%|██████████| 2/2 [00:02<00:00, 1.13s/it]
Proceeding 14-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.35s/it]
100%|██████████| 2/2 [00:02<00:00, 1.16s/it]
100%|██████████| 2/2 [00:02<00:00, 1.23s/it]
Proceeding 14-length images samples | Num: 10
Proceeding 12-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.27s/it]
 67%|██████▋   | 2/3 [00:02<00:00, 1.03it/s]
100%|██████████| 3/3 [00:04<00:00, 1.46s/it]
100%|██████████| 3/3 [00:04<00:00, 1.39s/it]
Proceeding 12-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.15s/it]
100%|██████████| 2/2 [00:03<00:00, 1.41s/it]
100%|██████████| 2/2 [00:03<00:00, 1.57s/it]
Proceeding 16-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.07s/it]
100%|██████████| 2/2 [00:03<00:00, 1.56s/it]
100%|██████████| 2/2 [00:03<00:00, 1.69s/it]
Proceeding 16-length images samples | Num: 10
Proceeding 14-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.15s/it]
 67%|██████▋   | 2/3 [00:01<00:00, 1.07it/s]
100%|██████████| 3/3 [00:02<00:00, 1.16it/s]
100%|██████████| 3/3 [00:02<00:00, 1.06it/s]
Proceeding 14-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.38s/it]
100%|██████████| 2/2 [00:02<00:00, 1.11s/it]
100%|██████████| 2/2 [00:02<00:00, 1.21s/it]
Proceeding 18-length images samples | Num: 10
Proceeding 18-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.36s/it]
100%|██████████| 2/2 [00:02<00:00, 1.14s/it]
100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.42s/it]
100%|██████████| 2/2 [00:02<00:00, 1.16s/it]
100%|██████████| 2/2 [00:02<00:00, 1.24s/it]
Proceeding 20-length images samples | Num: 10
Proceeding 16-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.51s/it]
100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
100%|██████████| 2/2 [00:02<00:00, 1.30s/it]
Proceeding 20-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.23s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.13s/it]
100%|██████████| 3/3 [00:03<00:00, 1.04s/it]
100%|██████████| 3/3 [00:03<00:00, 1.11s/it]
Proceeding 16-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.45s/it]
100%|██████████| 2/2 [00:02<00:00, 1.28s/it]
100%|██████████| 2/2 [00:02<00:00, 1.35s/it]
Proceeding 22-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.35s/it]
100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
100%|██████████| 2/2 [00:02<00:00, 1.29s/it]
Proceeding 22-length images samples | Num: 10
Proceeding 18-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.33s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.11s/it]
100%|██████████| 3/3 [00:03<00:00, 1.07s/it]
100%|██████████| 3/3 [00:03<00:00, 1.13s/it]
Proceeding 18-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.46s/it]
100%|██████████| 2/2 [00:02<00:00, 1.30s/it]
100%|██████████| 2/2 [00:02<00:00, 1.37s/it]
Proceeding 24-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.54s/it]
100%|██████████| 2/2 [00:02<00:00, 1.35s/it]
100%|██████████| 2/2 [00:02<00:00, 1.42s/it]
Proceeding 24-length images samples | Num: 10
Proceeding 20-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.37s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.18s/it]
100%|██████████| 3/3 [00:03<00:00, 1.20s/it]
100%|██████████| 3/3 [00:03<00:00, 1.25s/it]
Proceeding 20-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.41s/it]
100%|██████████| 2/2 [00:02<00:00, 1.32s/it]
100%|██████████| 2/2 [00:02<00:00, 1.37s/it]
Proceeding 26-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.68s/it]
100%|██████████| 2/2 [00:02<00:00, 1.43s/it]
100%|██████████| 2/2 [00:03<00:00, 1.51s/it]
Proceeding 26-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.56s/it]
100%|██████████| 2/2 [00:02<00:00, 1.40s/it]
100%|██████████| 2/2 [00:02<00:00, 1.47s/it]
Proceeding 28-length images samples | Num: 10
Proceeding 22-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:02, 1.44s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.25s/it]
100%|██████████| 3/3 [00:03<00:00, 1.16s/it]
100%|██████████| 3/3 [00:03<00:00, 1.23s/it]
Proceeding 22-length images samples | Num: 10
Proceeding 28-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.78s/it]
100%|██████████| 2/2 [00:03<00:00, 1.45s/it]
100%|██████████| 2/2 [00:03<00:00, 1.56s/it]
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.65s/it]
100%|██████████| 2/2 [00:02<00:00, 1.47s/it]
100%|██████████| 2/2 [00:03<00:00, 1.53s/it]
Proceeding 30-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.66s/it]
100%|██████████| 2/2 [00:03<00:00, 1.49s/it]
100%|██████████| 2/2 [00:03<00:00, 1.56s/it]
Proceeding 30-length images samples | Num: 10
Proceeding 24-length images samples | Num: 10
Proceeding 24-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:03, 1.68s/it]
 67%|██████▋   | 2/3 [00:02<00:01, 1.40s/it]
100%|██████████| 3/3 [00:04<00:00, 1.33s/it]
100%|██████████| 3/3 [00:04<00:00, 1.40s/it]
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.88s/it]
100%|██████████| 2/2 [00:03<00:00, 1.91s/it]
100%|██████████| 2/2 [00:03<00:00, 1.96s/it]
Proceeding 32-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.86s/it]
100%|██████████| 2/2 [00:03<00:00, 1.60s/it]
100%|██████████| 2/2 [00:03<00:00, 1.69s/it]
Proceeding 32-length images samples | Num: 10
Proceeding 26-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:03, 1.81s/it]
 67%|██████▋   | 2/3 [00:03<00:01, 1.78s/it]
100%|██████████| 3/3 [00:05<00:00, 1.73s/it]
100%|██████████| 3/3 [00:05<00:00, 1.79s/it]
Proceeding 26-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.01s/it]
100%|██████████| 2/2 [00:03<00:00, 1.78s/it]
100%|██████████| 2/2 [00:03<00:00, 1.86s/it]
Proceeding 34-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.94s/it]
100%|██████████| 2/2 [00:03<00:00, 1.66s/it]
100%|██████████| 2/2 [00:03<00:00, 1.75s/it]
Proceeding 34-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:01<00:01, 1.94s/it]
100%|██████████| 2/2 [00:03<00:00, 1.70s/it]
100%|██████████| 2/2 [00:03<00:00, 1.77s/it]
Proceeding 36-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:01<00:03, 1.63s/it]
 67%|██████▋   | 2/3 [00:03<00:01, 1.55s/it]
100%|██████████| 3/3 [00:04<00:00, 1.64s/it]
100%|██████████| 3/3 [00:04<00:00, 1.66s/it]
Proceeding 28-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.02s/it]
100%|██████████| 2/2 [00:03<00:00, 1.98s/it]
100%|██████████| 2/2 [00:04<00:00, 2.05s/it]
Proceeding 36-length images samples | Num: 10
Proceeding 28-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.00s/it]
100%|██████████| 2/2 [00:04<00:00, 2.05s/it]
100%|██████████| 2/2 [00:04<00:00, 2.09s/it]
Proceeding 38-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:03<00:03, 3.45s/it]
100%|██████████| 2/2 [00:07<00:00, 3.71s/it]
100%|██████████| 2/2 [00:07<00:00, 3.71s/it]
Proceeding 38-length images samples | Num: 10
Proceeding 30-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.25s/it]
 67%|██████▋   | 2/3 [00:06<00:03, 3.33s/it]
100%|██████████| 3/3 [00:08<00:00, 2.44s/it]
100%|██████████| 3/3 [00:08<00:00, 2.72s/it]
Proceeding 30-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:03<00:03, 3.93s/it]
100%|██████████| 2/2 [00:08<00:00, 4.04s/it]
100%|██████████| 2/2 [00:08<00:00, 4.06s/it]
Proceeding 40-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.63s/it]
100%|██████████| 2/2 [00:07<00:00, 3.38s/it]
100%|██████████| 2/2 [00:07<00:00, 3.61s/it]
Proceeding 40-length images samples | Num: 10
Proceeding 32-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.36s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.77s/it]
100%|██████████| 3/3 [00:09<00:00, 3.34s/it]
100%|██████████| 3/3 [00:09<00:00, 3.28s/it]
Proceeding 32-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.96s/it]
100%|██████████| 2/2 [00:07<00:00, 4.03s/it]
100%|██████████| 2/2 [00:07<00:00, 3.90s/it]
Proceeding 42-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.63s/it]
100%|██████████| 2/2 [00:06<00:00, 2.92s/it]
100%|██████████| 2/2 [00:06<00:00, 3.23s/it]
Proceeding 42-length images samples | Num: 10
Proceeding 34-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.22s/it]
100%|██████████| 2/2 [00:04<00:00, 2.06s/it]
100%|██████████| 2/2 [00:04<00:00, 2.14s/it]
Proceeding 44-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:04, 2.05s/it]
 67%|██████▋   | 2/3 [00:04<00:02, 2.08s/it]
100%|██████████| 3/3 [00:05<00:00, 1.81s/it]
100%|██████████| 3/3 [00:05<00:00, 1.90s/it]
Proceeding 34-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.24s/it]
100%|██████████| 2/2 [00:05<00:00, 3.13s/it]
100%|██████████| 2/2 [00:06<00:00, 3.04s/it]
Proceeding 44-length images samples | Num: 10
Proceeding 36-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.12s/it]
100%|██████████| 2/2 [00:07<00:00, 3.93s/it]
100%|██████████| 2/2 [00:08<00:00, 4.00s/it]
Proceeding 46-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:07, 3.70s/it]
 67%|██████▋   | 2/3 [00:06<00:03, 3.42s/it]
100%|██████████| 3/3 [00:08<00:00, 2.55s/it]
100%|██████████| 3/3 [00:08<00:00, 2.84s/it]
Proceeding 36-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.60s/it]
100%|██████████| 2/2 [00:08<00:00, 4.17s/it]
100%|██████████| 2/2 [00:08<00:00, 4.28s/it]
Proceeding 46-length images samples | Num: 10
Proceeding 48-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.08s/it]
100%|██████████| 2/2 [00:10<00:00, 5.53s/it]
100%|██████████| 2/2 [00:10<00:00, 5.35s/it]
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.46s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.74s/it]
100%|██████████| 3/3 [00:10<00:00, 3.47s/it]
100%|██████████| 3/3 [00:10<00:00, 3.38s/it]
Proceeding 38-length images samples | Num: 10
Proceeding 38-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:06<00:06, 6.28s/it]
100%|██████████| 2/2 [00:09<00:00, 4.53s/it]
100%|██████████| 2/2 [00:09<00:00, 4.83s/it]
Proceeding 48-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:03<00:03, 3.83s/it]
100%|██████████| 2/2 [00:07<00:00, 3.62s/it]
100%|██████████| 2/2 [00:07<00:00, 3.69s/it]
Proceeding 50-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.28s/it]
 67%|██████▋   | 2/3 [00:06<00:03, 3.26s/it]
100%|██████████| 3/3 [00:08<00:00, 2.64s/it]
100%|██████████| 3/3 [00:08<00:00, 2.84s/it]
Proceeding 40-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.43s/it]
100%|██████████| 2/2 [00:06<00:00, 3.03s/it]
100%|██████████| 2/2 [00:06<00:00, 3.28s/it]
Proceeding 50-length images samples | Num: 10
Proceeding 40-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:02<00:02, 2.51s/it]
100%|██████████| 2/2 [00:06<00:00, 3.31s/it]
100%|██████████| 2/2 [00:06<00:00, 3.24s/it]
Proceeding 52-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:04, 2.08s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.78s/it]
100%|██████████| 3/3 [00:10<00:00, 3.65s/it]
100%|██████████| 3/3 [00:10<00:00, 3.38s/it]
Proceeding 42-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.19s/it]
100%|██████████| 2/2 [00:09<00:00, 4.84s/it]
100%|██████████| 2/2 [00:09<00:00, 4.79s/it]
Proceeding 52-length images samples | Num: 10
Proceeding 42-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 5.53s/it]
100%|██████████| 2/2 [00:09<00:00, 4.41s/it]
100%|██████████| 2/2 [00:09<00:00, 4.63s/it]
Proceeding 54-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.09s/it]
100%|██████████| 2/2 [00:06<00:00, 3.30s/it]
100%|██████████| 2/2 [00:06<00:00, 3.46s/it]
Proceeding 54-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:07, 3.67s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.62s/it]
100%|██████████| 3/3 [00:07<00:00, 2.34s/it]
100%|██████████| 3/3 [00:07<00:00, 2.55s/it]
Proceeding 44-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:03<00:03, 3.45s/it]
100%|██████████| 2/2 [00:06<00:00, 3.22s/it]
100%|██████████| 2/2 [00:06<00:00, 3.30s/it]
Proceeding 56-length images samples | Num: 10
Proceeding 44-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.64s/it]
100%|██████████| 2/2 [00:08<00:00, 4.03s/it]
100%|██████████| 2/2 [00:08<00:00, 4.17s/it]
Proceeding 56-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:04<00:08, 4.19s/it]
 67%|██████▋   | 2/3 [00:07<00:03, 3.71s/it]
100%|██████████| 3/3 [00:15<00:00, 5.53s/it]
100%|██████████| 3/3 [00:15<00:00, 5.12s/it]
Proceeding 46-length images samples | Num: 10
Proceeding 46-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:03<00:03, 3.74s/it]
100%|██████████| 2/2 [00:08<00:00, 4.11s/it]
100%|██████████| 2/2 [00:08<00:00, 4.11s/it]
Proceeding 58-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.40s/it]
100%|██████████| 2/2 [00:13<00:00, 6.88s/it]
100%|██████████| 2/2 [00:13<00:00, 6.56s/it]
Proceeding 58-length images samples | Num: 10
Proceeding 48-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.89s/it]
100%|██████████| 2/2 [00:08<00:00, 4.13s/it]
100%|██████████| 2/2 [00:08<00:00, 4.29s/it]
Proceeding 60-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:05<00:10, 5.31s/it]
 67%|██████▋   | 2/3 [00:07<00:03, 3.31s/it]
100%|██████████| 3/3 [00:10<00:00, 3.09s/it]
100%|██████████| 3/3 [00:10<00:00, 3.37s/it]
Proceeding 48-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 5.59s/it]
100%|██████████| 2/2 [00:09<00:00, 4.39s/it]
100%|██████████| 2/2 [00:09<00:00, 4.63s/it]
Proceeding 60-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 5.98s/it]
100%|██████████| 2/2 [00:08<00:00, 4.11s/it]
100%|██████████| 2/2 [00:08<00:00, 4.44s/it]
Proceeding 62-length images samples | Num: 10
Proceeding 50-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 5.85s/it]
100%|██████████| 2/2 [00:08<00:00, 4.17s/it]
100%|██████████| 2/2 [00:08<00:00, 4.47s/it]
Proceeding 62-length images samples | Num: 10
Proceeding 50-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:05<00:10, 5.37s/it]
 67%|██████▋   | 2/3 [00:07<00:03, 3.56s/it]
100%|██████████| 3/3 [00:09<00:00, 2.83s/it]
100%|██████████| 3/3 [00:09<00:00, 3.24s/it]
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:06<00:06, 6.32s/it]
100%|██████████| 2/2 [00:11<00:00, 5.72s/it]
100%|██████████| 2/2 [00:11<00:00, 5.86s/it]
Proceeding 64-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 6.00s/it]
100%|██████████| 2/2 [00:11<00:00, 5.60s/it]
100%|██████████| 2/2 [00:11<00:00, 5.70s/it]
Proceeding 64-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:05<00:10, 5.22s/it]
 67%|██████▋   | 2/3 [00:10<00:05, 5.12s/it]
100%|██████████| 3/3 [00:14<00:00, 4.60s/it]
100%|██████████| 3/3 [00:14<00:00, 4.78s/it]
Proceeding 52-length images samples | Num: 10
Proceeding 52-length images samples | Num: 10
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:04<00:04, 4.93s/it]
100%|██████████| 2/2 [00:07<00:00, 3.80s/it]
100%|██████████| 2/2 [00:08<00:00, 4.02s/it]
  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:05<00:05, 5.07s/it]
100%|██████████| 2/2 [00:08<00:00, 4.06s/it]
100%|██████████| 2/2 [00:08<00:00, 4.26s/it]
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:05, 2.81s/it]
 67%|██████▋   | 2/3 [00:04<00:02, 2.38s/it]
100%|██████████| 3/3 [00:06<00:00, 2.23s/it]
100%|██████████| 3/3 [00:07<00:00, 2.35s/it]
Proceeding 54-length images samples | Num: 10
Proceeding 54-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.18s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.84s/it]
100%|██████████| 3/3 [00:08<00:00, 2.59s/it]
100%|██████████| 3/3 [00:08<00:00, 2.72s/it]
Proceeding 56-length images samples | Num: 10
Proceeding 56-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:02<00:04, 2.45s/it]
 67%|██████▋   | 2/3 [00:04<00:02, 2.37s/it]
100%|██████████| 3/3 [00:08<00:00, 2.82s/it]
100%|██████████| 3/3 [00:08<00:00, 2.73s/it]
Proceeding 58-length images samples | Num: 10
Proceeding 58-length images samples | Num: 10
Proceeding 60-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:07, 3.54s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.80s/it]
100%|██████████| 3/3 [00:08<00:00, 2.55s/it]
100%|██████████| 3/3 [00:08<00:00, 2.72s/it]
Proceeding 60-length images samples | Num: 10
Proceeding 62-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.01s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.58s/it]
100%|██████████| 3/3 [00:07<00:00, 2.49s/it]
100%|██████████| 3/3 [00:07<00:00, 2.59s/it]
Proceeding 62-length images samples | Num: 10
Proceeding 64-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.30s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.88s/it]
100%|██████████| 3/3 [00:08<00:00, 2.70s/it]
100%|██████████| 3/3 [00:08<00:00, 2.82s/it]
Proceeding 64-length images samples | Num: 10
  0%|          | 0/3 [00:00<?, ?it/s]
 33%|███▎      | 1/3 [00:03<00:06, 3.16s/it]
 67%|██████▋   | 2/3 [00:05<00:02, 2.95s/it]
100%|██████████| 3/3 [00:08<00:00, 2.77s/it]
100%|██████████| 3/3 [00:08<00:00, 2.88s/it]
evaluating ImageNeedleInAHaystack ...
Results saved to work_dirs/share_internvl/InternVL2-2B/eval_milebench/ImageNeedleInAHaystack/ImageNeedleInAHaystack_240803234858.json
python eval/milebench/evaluate.py --data-dir /mnt/inspurfs/share_data/wangweiyun/share_data/long-context-benchmark/MileBench/datasets--FreedomIntelligence--MileBench/snapshots/53c7a58051ef88bacf76541d91f03f5ba2d71e7d --dataset ImageNeedleInAHaystack --result-dir work_dirs/share_internvl/InternVL2-2B/eval_milebench/ImageNeedleInAHaystack
internvl: ImageNeedleInAHaystack: {'Accuracy': 0.75625, 'image_quantity_level-Accuracy': {'Few': 0.85, 'Medium': 0.8, 'Many': 0.711764705882353}, 'image_quantity_level-Result': {'Few': [17, 20], 'Medium': [104, 130], 'Many': [121, 170]}}
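The summary line reports per-level `[correct, total]` counts. Recomputing from those counts alone (a sketch that uses only the numbers printed above) shows that the overall `Accuracy` is the pooled ratio 242/320, not the unweighted mean of the three level accuracies (about 0.787). The 320 total also equals 32 image-count settings (2 to 64 in steps of 2) times the 10 samples reported for each setting.

```python
# [correct, total] pairs copied from the 'image_quantity_level-Result' field of the log
results = {"Few": [17, 20], "Medium": [104, 130], "Many": [121, 170]}

# Per-level accuracy: correct / total within each level
per_level = {level: c / t for level, (c, t) in results.items()}

# Pooled (micro-averaged) accuracy over all samples
correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
overall = correct / total

# Unweighted (macro) mean of the three levels, for comparison
macro = sum(per_level.values()) / len(per_level)

print(per_level)
print(correct, total, overall)  # 242 320 0.75625
print(round(macro, 3))          # 0.787
```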