|
language_model.model.layers.0 4 |
|
language_model.model.layers.1 4 |
|
language_model.model.layers.2 4 |
|
language_model.model.layers.3 4 |
|
language_model.model.layers.4 4 |
|
language_model.model.layers.5 4 |
|
language_model.model.layers.6 4 |
|
language_model.model.layers.7 4 |
|
language_model.model.layers.8 4 |
|
language_model.model.layers.9 4 |
|
language_model.model.layers.10 4 |
|
language_model.model.layers.11 4 |
|
language_model.model.layers.12 4 |
|
language_model.model.layers.13 4 |
|
language_model.model.layers.14 4 |
|
language_model.model.layers.15 4 |
|
language_model.model.layers.16 4 |
|
language_model.model.layers.17 4 |
|
language_model.model.layers.18 4 |
|
language_model.model.layers.19 4 |
|
language_model.model.layers.20 4 |
|
language_model.model.layers.21 4 |
|
language_model.model.layers.22 4 |
|
language_model.model.layers.23 4 |
|
vision_model.encoder.layers.0 0 |
|
vision_model.encoder.layers.1 0 |
|
vision_model.encoder.layers.2 0 |
|
vision_model.encoder.layers.3 0 |
|
vision_model.encoder.layers.4 0 |
|
vision_model.encoder.layers.5 0 |
|
vision_model.encoder.layers.6 0 |
|
vision_model.encoder.layers.7 0 |
|
vision_model.encoder.layers.8 0 |
|
vision_model.encoder.layers.9 0 |
|
vision_model.encoder.layers.10 0 |
|
vision_model.encoder.layers.11 0 |
|
vision_model.encoder.layers.12 0 |
|
vision_model.encoder.layers.13 0 |
|
vision_model.encoder.layers.14 0 |
|
vision_model.encoder.layers.15 0 |
|
vision_model.encoder.layers.16 0 |
|
vision_model.encoder.layers.17 0 |
|
vision_model.encoder.layers.18 0 |
|
vision_model.encoder.layers.19 0 |
|
vision_model.encoder.layers.20 0 |
|
vision_model.encoder.layers.21 0 |
|
vision_model.encoder.layers.22 0 |
|
vision_model.encoder.layers.23 0 |
|
vision_model.embeddings 0 |
|
mlp1 0 |
|
language_model.model.tok_embeddings 4 |
|
language_model.model.norm 4 |
|
language_model.output 4 |
|
language_model.model.embed_tokens 4 |
|
language_model.lm_head 4 |
|
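The listing above pairs each module name with a GPU index: vision modules and `mlp1` on index 0, language-model modules on index 4. A minimal sketch of how such a mapping could be built is shown below. The helper `build_device_map` and its parameters are hypothetical and are not taken from the evaluation code. It assumes each rank uses GPU `rank` for the vision side and GPU `rank + world_size` for the language side, which matches the device pairs in the "Begin to eval" lines, and it assumes the listing comes from rank 0.

```python
def build_device_map(rank, world_size, num_vision_layers=24, num_llm_layers=24):
    """Hypothetical helper: rebuild the module -> GPU index listing from the log."""
    vision_dev = rank              # assumed: first GPU of this rank (0 for rank 0)
    llm_dev = rank + world_size    # assumed: second GPU of this rank (4 for rank 0)

    device_map = {}
    # Language-model decoder layers go to the second GPU.
    for i in range(num_llm_layers):
        device_map[f"language_model.model.layers.{i}"] = llm_dev
    # Vision encoder layers, embeddings, and the projector stay on the first GPU.
    for i in range(num_vision_layers):
        device_map[f"vision_model.encoder.layers.{i}"] = vision_dev
    device_map["vision_model.embeddings"] = vision_dev
    device_map["mlp1"] = vision_dev
    # Remaining language-model modules, using the names printed in the log.
    for name in (
        "language_model.model.tok_embeddings",
        "language_model.model.norm",
        "language_model.output",
        "language_model.model.embed_tokens",
        "language_model.lm_head",
    ):
        device_map[name] = llm_dev
    return device_map


device_map = build_device_map(rank=0, world_size=4)
for name, dev in device_map.items():
    print(name, dev)
```

With `rank=0` and `world_size=4` this prints the same 55 name and index pairs as the listing, in the same order.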
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [2] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionSequence, devices: {device(type='cuda', index=2), device(type='cuda', index=6)} |
|
Rank [0] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionSequence, devices: {device(type='cuda', index=0), device(type='cuda', index=4)} |
|
Rank [3] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionSequence, devices: {device(type='cuda', index=3), device(type='cuda', index=7)} |
|
Rank [1] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionSequence, devices: {device(type='cuda', index=1), device(type='cuda', index=5)} |
|
Initialization Finished |
|
Predicting ActionSequence Using internvl |
|
Proceeding 44-length images samples | Num: 5 |
|
Initialization Finished |
|
Predicting ActionSequence Using internvl |
|
Proceeding 44-length images samples | Num: 5 |
|
Initialization Finished |
|
Predicting ActionSequence Using internvl |
|
Proceeding 44-length images samples | Num: 5 |
|
Initialization Finished |
|
Predicting ActionSequence Using internvl |
|
Proceeding 44-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.73s/it]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.87s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.74s/it]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.87s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.74s/it]
100%|ββββββββββ| 1/1 [00:07<00:00, 7.87s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
Proceeding 31-length images samples | Num: 24 |
|
0%| | 0/6 [00:00<?, ?it/s]
17%|ββ | 1/6 [00:03<00:18, 3.68s/it]
33%|ββββ | 2/6 [00:04<00:07, 2.00s/it]
50%|βββββ | 3/6 [00:06<00:06, 2.15s/it]
67%|βββββββ | 4/6 [00:08<00:03, 1.79s/it]
83%|βββββββββ | 5/6 [00:09<00:01, 1.58s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.66s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.86s/it] |
|
Proceeding 32-length images samples | Num: 13 |
|
0%| | 0/6 [00:00<?, ?it/s]
17%|ββ | 1/6 [00:03<00:19, 3.97s/it]
33%|ββββ | 2/6 [00:06<00:12, 3.08s/it]
50%|βββββ | 3/6 [00:07<00:06, 2.06s/it]
67%|βββββββ | 4/6 [00:08<00:03, 1.67s/it]
83%|βββββββββ | 5/6 [00:10<00:01, 1.95s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.57s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.95s/it] |
|
Proceeding 32-length images samples | Num: 13 |
|
0%| | 0/6 [00:00<?, ?it/s]
17%|ββ | 1/6 [00:04<00:20, 4.02s/it]
33%|ββββ | 2/6 [00:06<00:12, 3.12s/it]
50%|βββββ | 3/6 [00:07<00:06, 2.22s/it]
67%|βββββββ | 4/6 [00:08<00:03, 1.67s/it]
83%|βββββββββ | 5/6 [00:10<00:01, 1.93s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.58s/it]
100%|ββββββββββ| 6/6 [00:11<00:00, 1.98s/it] |
|
Proceeding 32-length images samples | Num: 13 |
|
Proceeding 32-length images samples | Num: 13 |
|
0%| | 0/3 [00:00<?, ?it/s]
33%|ββββ | 1/3 [00:01<00:02, 1.19s/it]
67%|βββββββ | 2/3 [00:04<00:02, 2.73s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 3.24s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 2.97s/it] |
|
Proceeding 16-length images samples | Num: 4 |
|
0%| | 0/3 [00:00<?, ?it/s]
33%|ββββ | 1/3 [00:02<00:05, 2.72s/it]
67%|βββββββ | 2/3 [00:04<00:02, 2.21s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 2.92s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 2.80s/it] |
|
Proceeding 16-length images samples | Num: 4 |
|
0%| | 0/3 [00:00<?, ?it/s]
33%|ββββ | 1/3 [00:04<00:08, 4.09s/it]
67%|βββββββ | 2/3 [00:06<00:03, 3.01s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 2.54s/it]
100%|ββββββββββ| 3/3 [00:08<00:00, 2.81s/it] |
|
Proceeding 16-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.18s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.27s/it] |
|
Proceeding 45-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.21s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.30s/it] |
|
Proceeding 45-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.15s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.25s/it] |
|
Proceeding 45-length images samples | Num: 4 |
|
Proceeding 16-length images samples | Num: 4 |
|
Proceeding 45-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.59s/it]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.67s/it] |
|
Proceeding 39-length images samples | Num: 3 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.45s/it]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.55s/it] |
|
Proceeding 39-length images samples | Num: 3 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.64s/it]
100%|ββββββββββ| 1/1 [00:05<00:00, 5.73s/it] |
|
Proceeding 39-length images samples | Num: 3 |
|
Proceeding 39-length images samples | Num: 3 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 69-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 14 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:04<00:00, 4.21s/it]
100%|ββββββββββ| 1/1 [00:04<00:00, 4.30s/it] |
|
Proceeding 69-length images samples | Num: 1 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:04<00:00, 4.11s/it]
100%|ββββββββββ| 1/1 [00:04<00:00, 4.22s/it] |
|
Proceeding 69-length images samples | Num: 1 |
|
Proceeding 69-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 14 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 14 |
|
0%| | 0/3 [00:00<?, ?it/s]
33%|ββββ | 1/3 [00:03<00:06, 3.44s/it]
67%|βββββββ | 2/3 [00:04<00:01, 1.81s/it]
100%|ββββββββββ| 3/3 [00:07<00:00, 2.66s/it]
100%|ββββββββββ| 3/3 [00:07<00:00, 2.62s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
Proceeding 27-length images samples | Num: 14 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.40s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.49s/it] |
|
Proceeding 21-length images samples | Num: 5 |
|
0%| | 0/3 [00:00<?, ?it/s]
33%|ββββ | 1/3 [00:04<00:08, 4.03s/it]
67%|βββββββ | 2/3 [00:04<00:02, 2.13s/it]
100%|ββββββββββ| 3/3 [00:05<00:00, 1.49s/it]
100%|ββββββββββ| 3/3 [00:05<00:00, 1.89s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.19s/it]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.28s/it] |
|
Proceeding 8-length images samples | Num: 1 |
|
0%| | 0/4 [00:00<?, ?it/s]
25%|βββ | 1/4 [00:04<00:12, 4.02s/it]
50%|βββββ | 2/4 [00:04<00:04, 2.22s/it]
75%|ββββββββ | 3/4 [00:05<00:01, 1.54s/it]
100%|ββββββββββ| 4/4 [00:08<00:00, 2.14s/it]
100%|ββββββββββ| 4/4 [00:08<00:00, 2.22s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 13-length images samples | Num: 2 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.52s/it]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.63s/it] |
|
Proceeding 21-length images samples | Num: 5 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 30-length images samples | Num: 37 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.89s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.98s/it] |
|
Proceeding 21-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.83s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.92s/it] |
|
Proceeding 8-length images samples | Num: 1 |
|
Proceeding 38-length images samples | Num: 4 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 13-length images samples | Num: 2 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.12it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.03it/s] |
|
Proceeding 8-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 30-length images samples | Num: 37 |
|
Proceeding 13-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 21-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.16s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.24s/it] |
|
Proceeding 30-length images samples | Num: 37 |
|
Proceeding 8-length images samples | Num: 1 |
|
Proceeding 13-length images samples | Num: 2 |
|
Proceeding 30-length images samples | Num: 37 |
|
0%| | 0/9 [00:00<?, ?it/s]
11%|β | 1/9 [00:02<00:22, 2.81s/it]
22%|βββ | 2/9 [00:03<00:11, 1.63s/it]
33%|ββββ | 3/9 [00:04<00:08, 1.46s/it]
44%|βββββ | 4/9 [00:05<00:05, 1.19s/it]
56%|ββββββ | 5/9 [00:07<00:06, 1.57s/it]
67%|βββββββ | 6/9 [00:10<00:06, 2.02s/it]
78%|ββββββββ | 7/9 [00:13<00:04, 2.29s/it]
89%|βββββββββ | 8/9 [00:14<00:01, 1.80s/it]
100%|ββββββββββ| 9/9 [00:16<00:00, 1.88s/it]
100%|ββββββββββ| 9/9 [00:16<00:00, 1.84s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 4 |
|
Proceeding 34-length images samples | Num: 8 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.85s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.93s/it] |
|
0%| | 0/9 [00:00<?, ?it/s]
11%|β | 1/9 [00:01<00:13, 1.67s/it]
22%|βββ | 2/9 [00:04<00:14, 2.08s/it]
33%|ββββ | 3/9 [00:04<00:08, 1.48s/it]
44%|βββββ | 4/9 [00:07<00:09, 1.91s/it]
56%|ββββββ | 5/9 [00:10<00:09, 2.29s/it]
67%|βββββββ | 6/9 [00:12<00:06, 2.09s/it]
78%|ββββββββ | 7/9 [00:13<00:03, 1.76s/it]
89%|βββββββββ | 8/9 [00:13<00:01, 1.46s/it]
100%|ββββββββββ| 9/9 [00:15<00:00, 1.61s/it]
100%|ββββββββββ| 9/9 [00:15<00:00, 1.78s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
Proceeding 24-length images samples | Num: 4 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.28s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.38s/it] |
|
Proceeding 34-length images samples | Num: 8 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.46s/it]
100%|ββββββββββ| 2/2 [00:04<00:00, 2.21s/it]
100%|ββββββββββ| 2/2 [00:04<00:00, 2.14s/it] |
|
Proceeding 11-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 15-length images samples | Num: 2 |
|
0%| | 0/9 [00:00<?, ?it/s]
11%|β | 1/9 [00:02<00:23, 2.89s/it]
22%|βββ | 2/9 [00:05<00:19, 2.84s/it]
33%|ββββ | 3/9 [00:07<00:13, 2.28s/it]
44%|βββββ | 4/9 [00:09<00:10, 2.09s/it]
56%|ββββββ | 5/9 [00:11<00:08, 2.17s/it]
67%|βββββββ | 6/9 [00:12<00:05, 1.72s/it]
78%|ββββββββ | 7/9 [00:14<00:03, 1.81s/it]
89%|βββββββββ | 8/9 [00:15<00:01, 1.61s/it]
100%|ββββββββββ| 9/9 [00:18<00:00, 2.03s/it]
100%|ββββββββββ| 9/9 [00:18<00:00, 2.06s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 28-length images samples | Num: 7 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.41s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.10s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.20s/it] |
|
Proceeding 11-length images samples | Num: 2 |
|
Proceeding 15-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 28-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.07s/it]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.16s/it] |
|
Proceeding 29-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.56s/it]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.64s/it] |
|
Proceeding 24-length images samples | Num: 4 |
|
Proceeding 36-length images samples | Num: 2 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.74s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.15s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.28s/it] |
|
Proceeding 29-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.34s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.42s/it] |
|
Proceeding 34-length images samples | Num: 8 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.86s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.94s/it] |
|
Proceeding 25-length images samples | Num: 3 |
|
Proceeding 19-length images samples | Num: 4 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.27it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.09it/s] |
|
Proceeding 51-length images samples | Num: 1 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.29s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.04s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.14s/it] |
|
Proceeding 25-length images samples | Num: 3 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 35-length images samples | Num: 7 |
|
Proceeding 34-length images samples | Num: 8 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.57s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.16s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.27s/it] |
|
Proceeding 11-length images samples | Num: 2 |
|
Proceeding 19-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.04it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.05s/it] |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.59s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.68s/it] |
|
Proceeding 15-length images samples | Num: 2 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.74s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.83s/it] |
|
Proceeding 51-length images samples | Num: 1 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.49s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.59s/it] |
|
Proceeding 40-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 35-length images samples | Num: 7 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 33-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.07it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.02s/it] |
|
Proceeding 28-length images samples | Num: 7 |
|
Proceeding 11-length images samples | Num: 2 |
|
Proceeding 15-length images samples | Num: 2 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.10s/it]
100%|ββββββββββ| 1/1 [00:03<00:00, 3.18s/it] |
|
Proceeding 60-length images samples | Num: 1 |
|
Proceeding 28-length images samples | Num: 7 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 49-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 23-length images samples | Num: 7 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:03<00:03, 3.21s/it]
100%|ββββββββββ| 2/2 [00:04<00:00, 1.86s/it]
100%|ββββββββββ| 2/2 [00:04<00:00, 2.12s/it] |
|
Proceeding 40-length images samples | Num: 2 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:02<00:02, 2.92s/it]
100%|ββββββββββ| 2/2 [00:03<00:00, 1.74s/it]
100%|ββββββββββ| 2/2 [00:03<00:00, 1.96s/it] |
|
Proceeding 29-length images samples | Num: 7 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 33-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.75s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.87s/it] |
|
Proceeding 70-length images samples | Num: 1 |
|
Proceeding 37-length images samples | Num: 3 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 22-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 20-length images samples | Num: 2 |
|
Proceeding 29-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.65s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.73s/it] |
|
Proceeding 60-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 17-length images samples | Num: 5 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 49-length images samples | Num: 2 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:02<00:02, 2.74s/it]
100%|ββββββββββ| 2/2 [00:03<00:00, 1.58s/it]
100%|ββββββββββ| 2/2 [00:03<00:00, 1.80s/it] |
|
Proceeding 25-length images samples | Num: 3 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 23-length images samples | Num: 7 |
|
Proceeding 18-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.14s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.24s/it] |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.78s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.87s/it] |
|
Proceeding 19-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.93s/it]
100%|ββββββββββ| 1/1 [00:02<00:00, 2.01s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
Proceeding 25-length images samples | Num: 3 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.17s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.25s/it] |
|
Proceeding 51-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 26-length images samples | Num: 1 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:02<00:02, 2.32s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.33s/it]
100%|ββββββββββ| 2/2 [00:03<00:00, 1.53s/it] |
|
Proceeding 70-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 35-length images samples | Num: 7 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 37-length images samples | Num: 3 |
|
Proceeding 19-length images samples | Num: 4 |
|
Proceeding 51-length images samples | Num: 1 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.33s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.41s/it] |
|
Proceeding 22-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 20-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 17-length images samples | Num: 5 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.38s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.11s/it]
100%|ββββββββββ| 2/2 [00:02<00:00, 1.19s/it] |
|
Proceeding 40-length images samples | Num: 2 |
|
Proceeding 35-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.02s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.12s/it] |
|
Proceeding 18-length images samples | Num: 4 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.68s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.77s/it] |
|
Proceeding 33-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.02it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.07s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 26-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.49s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.59s/it] |
|
Proceeding 60-length images samples | Num: 1 |
|
Proceeding 40-length images samples | Num: 2 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 49-length images samples | Num: 2 |
|
Proceeding 33-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.77s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.84s/it] |
|
Proceeding 23-length images samples | Num: 7 |
|
Proceeding 60-length images samples | Num: 1 |
|
0%| | 0/2 [00:00<?, ?it/s]
50%|βββββ | 1/2 [00:01<00:01, 1.03s/it]
100%|ββββββββββ| 2/2 [00:01<00:00, 1.27it/s]
100%|ββββββββββ| 2/2 [00:01<00:00, 1.15it/s] |
|
Proceeding 70-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 37-length images samples | Num: 3 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.29s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.39s/it] |
|
Proceeding 22-length images samples | Num: 2 |
|
Proceeding 49-length images samples | Num: 2 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.24s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.34s/it] |
|
Proceeding 20-length images samples | Num: 2 |
|
Proceeding 23-length images samples | Num: 7 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.19it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.08it/s] |
|
Proceeding 17-length images samples | Num: 5 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.10s/it]
100%|ββββββββββ| 1/1 [00:01<00:00, 1.20s/it] |
|
Proceeding 18-length images samples | Num: 4 |
|
Proceeding 70-length images samples | Num: 1 |
|
0%| | 0/1 [00:00<?, ?it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.15it/s]
100%|ββββββββββ| 1/1 [00:00<00:00, 1.05it/s] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 26-length images samples | Num: 1 |
|
0it [00:00, ?it/s]
0it [00:00, ?it/s] |
|
Proceeding 37-length images samples | Num: 3 |
|
Proceeding 22-length images samples | Num: 2 |
|
Proceeding 20-length images samples | Num: 2 |
|
Proceeding 17-length images samples | Num: 5 |
|
Proceeding 18-length images samples | Num: 4 |
|
Proceeding 50-length images samples | Num: 1 |
|
Proceeding 26-length images samples | Num: 1 |
|
evaluating ActionSequence ... |
|
Results saved to work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionSequence/ActionSequence_240803234618.json |
|
python eval/milebench/evaluate.py --data-dir /mnt/inspurfs/share_data/wangweiyun/share_data/long-context-benchmark/MileBench/datasets--FreedomIntelligence--MileBench/snapshots/53c7a58051ef88bacf76541d91f03f5ba2d71e7d --dataset ActionSequence --result-dir work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionSequence |
|
internvl: ActionSequence: {'Accuracy': 0.715, 'image_quantity_level-Accuracy': {'Few': 0, 'Medium': 0.7153284671532847, 'Many': 0.7142857142857143}, 'image_quantity_level-Result': {'Few': [0, 0], 'Medium': [98, 137], 'Many': [45, 63]}} |
|
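The reported overall accuracy can be cross-checked against the per-level counts in the result line. The sketch below assumes each `image_quantity_level-Result` pair is `[correct, total]`, which is consistent with the per-level accuracies printed in the log; the variable and function names are illustrative only and do not come from `evaluate.py`.

```python
# Counts copied from the 'image_quantity_level-Result' field of the log line.
# Assumption: each pair is [number correct, number of samples].
results = {"Few": [0, 0], "Medium": [98, 137], "Many": [45, 63]}


def accuracy(correct, total):
    """Return correct/total, or 0 for an empty level (as the log reports for 'Few')."""
    return correct / total if total else 0


per_level = {level: accuracy(c, t) for level, (c, t) in results.items()}
total_correct = sum(c for c, _ in results.values())
total_samples = sum(t for _, t in results.values())
overall = accuracy(total_correct, total_samples)

print(per_level)
print(total_correct, total_samples, overall)
```

Under that assumption the totals are 143 correct out of 200 samples, which agrees with the reported `'Accuracy': 0.715`.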
|