|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
language_model.model.layers.0 4 |
|
language_model.model.layers.1 4 |
|
language_model.model.layers.2 4 |
|
language_model.model.layers.3 4 |
|
language_model.model.layers.4 4 |
|
language_model.model.layers.5 4 |
|
language_model.model.layers.6 4 |
|
language_model.model.layers.7 4 |
|
language_model.model.layers.8 4 |
|
language_model.model.layers.9 4 |
|
language_model.model.layers.10 4 |
|
language_model.model.layers.11 4 |
|
language_model.model.layers.12 4 |
|
language_model.model.layers.13 4 |
|
language_model.model.layers.14 4 |
|
language_model.model.layers.15 4 |
|
language_model.model.layers.16 4 |
|
language_model.model.layers.17 4 |
|
language_model.model.layers.18 4 |
|
language_model.model.layers.19 4 |
|
language_model.model.layers.20 4 |
|
language_model.model.layers.21 4 |
|
language_model.model.layers.22 4 |
|
language_model.model.layers.23 4 |
|
vision_model.encoder.layers.0 0 |
|
vision_model.encoder.layers.1 0 |
|
vision_model.encoder.layers.2 0 |
|
vision_model.encoder.layers.3 0 |
|
vision_model.encoder.layers.4 0 |
|
vision_model.encoder.layers.5 0 |
|
vision_model.encoder.layers.6 0 |
|
vision_model.encoder.layers.7 0 |
|
vision_model.encoder.layers.8 0 |
|
vision_model.encoder.layers.9 0 |
|
vision_model.encoder.layers.10 0 |
|
vision_model.encoder.layers.11 0 |
|
vision_model.encoder.layers.12 0 |
|
vision_model.encoder.layers.13 0 |
|
vision_model.encoder.layers.14 0 |
|
vision_model.encoder.layers.15 0 |
|
vision_model.encoder.layers.16 0 |
|
vision_model.encoder.layers.17 0 |
|
vision_model.encoder.layers.18 0 |
|
vision_model.encoder.layers.19 0 |
|
vision_model.encoder.layers.20 0 |
|
vision_model.encoder.layers.21 0 |
|
vision_model.encoder.layers.22 0 |
|
vision_model.encoder.layers.23 0 |
|
vision_model.embeddings 0 |
|
mlp1 0 |
|
language_model.model.tok_embeddings 4 |
|
language_model.model.norm 4 |
|
language_model.output 4 |
|
language_model.model.embed_tokens 4 |
|
language_model.lm_head 4 |
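
The module-to-device listing above has the shape of a `device_map` dictionary of the kind Hugging Face `from_pretrained` accepts. It is consistent with the two devices reported for rank 0 later in the log (cuda:0 and cuda:4). A minimal sketch that reproduces this mapping, assuming the vision tower and `mlp1` projector go to one GPU and every language-model module to the other; the helper name `build_device_map` and its parameters are hypothetical and not taken from the evaluation script:

```python
def build_device_map(vision_device, language_device, num_layers=24):
    """Hypothetical helper: place the vision modules on one GPU and the
    language-model modules on another, mirroring the listing in the log."""
    device_map = {}
    # 24 language-model decoder layers, all on the language device.
    for i in range(num_layers):
        device_map[f"language_model.model.layers.{i}"] = language_device
    # 24 vision encoder layers, all on the vision device.
    for i in range(num_layers):
        device_map[f"vision_model.encoder.layers.{i}"] = vision_device
    device_map["vision_model.embeddings"] = vision_device
    device_map["mlp1"] = vision_device
    # Embedding, final norm, and output head names exactly as printed in the log.
    for name in ("model.tok_embeddings", "model.norm", "output",
                 "model.embed_tokens", "lm_head"):
        device_map[f"language_model.{name}"] = language_device
    return device_map


device_map = build_device_map(vision_device=0, language_device=4)
for module_name, device in device_map.items():
    print(module_name, device)
```
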
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [1] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionLocalization, devices: {device(type='cuda', index=1), device(type='cuda', index=5)} |
|
Initialization Finished |
|
Predicting ActionLocalization Using internvl |
|
Proceeding 30-length images samples | Num: 40 |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [0] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionLocalization, devices: {device(type='cuda', index=0), device(type='cuda', index=4)} |
|
Initialization Finished |
|
Predicting ActionLocalization Using internvl |
|
Proceeding 30-length images samples | Num: 40 |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [2] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionLocalization, devices: {device(type='cuda', index=2), device(type='cuda', index=6)} |
|
Initialization Finished |
|
Predicting ActionLocalization Using internvl |
|
Proceeding 30-length images samples | Num: 40 |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [3] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionLocalization, devices: {device(type='cuda', index=3), device(type='cuda', index=7)} |
|
Initialization Finished |
|
Predicting ActionLocalization Using internvl |
|
Proceeding 30-length images samples | Num: 40 |
|
100%|██████████| 10/10 [00:24<00:00, 2.46s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
Proceeding 31-length images samples | Num: 24 |
|
100%|██████████| 10/10 [00:24<00:00, 2.46s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
100%|██████████| 10/10 [00:24<00:00, 2.48s/it] |
|
Proceeding 31-length images samples | Num: 24 |
|
100%|██████████| 6/6 [00:13<00:00, 2.30s/it] |
|
Proceeding 42-length images samples | Num: 4 |
|
Proceeding 42-length images samples | Num: 4 |
|
100%|██████████| 6/6 [00:14<00:00, 2.38s/it] |
|
Proceeding 42-length images samples | Num: 4 |
|
100%|██████████| 6/6 [00:14<00:00, 2.41s/it] |
|
Proceeding 42-length images samples | Num: 4 |
|
Proceeding 33-length images samples | Num: 12 |
|
100%|██████████| 1/1 [00:02<00:00, 2.22s/it] |
|
Proceeding 33-length images samples | Num: 12 |
|
100%|██████████| 1/1 [00:02<00:00, 2.02s/it] |
|
Proceeding 33-length images samples | Num: 12 |
|
100%|██████████| 1/1 [00:02<00:00, 2.13s/it] |
|
Proceeding 33-length images samples | Num: 12 |
|
Proceeding 41-length images samples | Num: 5 |
|
100%|██████████| 3/3 [00:04<00:00, 1.58s/it] |
|
Proceeding 41-length images samples | Num: 5 |
|
100%|██████████| 3/3 [00:04<00:00, 1.59s/it] |
|
Proceeding 41-length images samples | Num: 5 |
|
100%|██████████| 3/3 [00:05<00:00, 1.79s/it] |
|
Proceeding 41-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:01<00:00, 1.58s/it] |
|
Proceeding 37-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:01<00:00, 1.63s/it] |
|
Proceeding 37-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:04<00:00, 4.04s/it] |
|
Proceeding 37-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:04<00:00, 4.24s/it] |
|
Proceeding 35-length images samples | Num: 10 |
|
Proceeding 37-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:03<00:00, 3.83s/it] |
|
Proceeding 35-length images samples | Num: 10 |
|
100%|██████████| 1/1 [00:01<00:00, 1.69s/it] |
|
Proceeding 35-length images samples | Num: 10 |
|
100%|██████████| 2/2 [00:05<00:00, 2.60s/it] |
|
Proceeding 58-length images samples | Num: 2 |
|
Proceeding 35-length images samples | Num: 10 |
|
0it [00:00, ?it/s] |
|
Proceeding 39-length images samples | Num: 5 |
|
100%|██████████| 2/2 [00:04<00:00, 2.36s/it] |
|
Proceeding 58-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 39-length images samples | Num: 5 |
|
100%|██████████| 3/3 [00:06<00:00, 2.13s/it] |
|
Proceeding 58-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:04<00:00, 4.94s/it] |
|
Proceeding 40-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 43-length images samples | Num: 3 |
|
100%|██████████| 1/1 [00:04<00:00, 4.98s/it] |
|
Proceeding 40-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 36-length images samples | Num: 9 |
|
100%|██████████| 1/1 [00:05<00:00, 5.28s/it] |
|
Proceeding 39-length images samples | Num: 5 |
|
Proceeding 58-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:02<00:00, 2.90s/it] |
|
Proceeding 40-length images samples | Num: 3 |
|
100%|██████████| 1/1 [00:03<00:00, 3.83s/it] |
|
Proceeding 43-length images samples | Num: 3 |
|
100%|██████████| 2/2 [00:04<00:00, 2.08s/it] |
|
Proceeding 48-length images samples | Num: 1 |
|
Proceeding 39-length images samples | Num: 5 |
|
0it [00:00, ?it/s] |
|
Proceeding 44-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.62s/it] |
|
Proceeding 43-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 46-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 38-length images samples | Num: 3 |
|
100%|██████████| 1/1 [00:01<00:00, 1.78s/it] |
|
Proceeding 36-length images samples | Num: 9 |
|
0it [00:00, ?it/s] |
|
Proceeding 34-length images samples | Num: 9 |
|
100%|██████████| 1/1 [00:01<00:00, 1.68s/it] |
|
Proceeding 36-length images samples | Num: 9 |
|
Proceeding 40-length images samples | Num: 3 |
|
100%|██████████| 2/2 [00:04<00:00, 2.05s/it] |
|
Proceeding 48-length images samples | Num: 1 |
|
100%|██████████| 2/2 [00:03<00:00, 1.96s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 44-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 32-length images samples | Num: 17 |
|
100%|██████████| 2/2 [00:03<00:00, 1.79s/it] |
|
Proceeding 48-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 46-length images samples | Num: 1 |
|
Proceeding 43-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 44-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 38-length images samples | Num: 3 |
|
Proceeding 36-length images samples | Num: 9 |
|
100%|██████████| 1/1 [00:04<00:00, 4.85s/it] |
|
Proceeding 34-length images samples | Num: 9 |
|
100%|██████████| 1/1 [00:05<00:00, 5.14s/it] |
|
Proceeding 46-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 38-length images samples | Num: 3 |
|
100%|██████████| 4/4 [00:09<00:00, 2.29s/it] |
|
Proceeding 26-length images samples | Num: 3 |
|
100%|██████████| 1/1 [00:03<00:00, 3.37s/it] |
|
Proceeding 34-length images samples | Num: 9 |
|
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 4 |
|
100%|██████████| 2/2 [00:07<00:00, 3.72s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:03<00:00, 3.49s/it] |
|
Proceeding 20-length images samples | Num: 2 |
|
Proceeding 48-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 32-length images samples | Num: 17 |
|
0it [00:00, ?it/s] |
|
Proceeding 45-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 29-length images samples | Num: 10 |
|
100%|██████████| 2/2 [00:06<00:00, 3.46s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 32-length images samples | Num: 17 |
|
Proceeding 44-length images samples | Num: 2 |
|
100%|██████████| 2/2 [00:03<00:00, 1.99s/it] |
|
Proceeding 28-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:03<00:00, 3.61s/it] |
|
Proceeding 23-length images samples | Num: 8 |
|
Proceeding 46-length images samples | Num: 1 |
|
100%|██████████| 4/4 [00:08<00:00, 2.12s/it] |
|
Proceeding 26-length images samples | Num: 3 |
|
100%|██████████| 1/1 [00:02<00:00, 2.78s/it] |
|
Proceeding 27-length images samples | Num: 4 |
|
100%|██████████| 2/2 [00:03<00:00, 1.83s/it] |
|
Proceeding 22-length images samples | Num: 1 |
|
100%|██████████| 4/4 [00:08<00:00, 2.17s/it] |
|
Proceeding 26-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 21-length images samples | Num: 5 |
|
Proceeding 38-length images samples | Num: 3 |
|
Proceeding 20-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.12s/it] |
|
0it [00:00, ?it/s] |
|
Proceeding 45-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 29-length images samples | Num: 10 |
|
100%|██████████| 1/1 [00:02<00:00, 2.37s/it] |
|
Proceeding 19-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:02<00:00, 2.51s/it] |
|
Proceeding 27-length images samples | Num: 4 |
|
0it [00:00, ?it/s] |
|
Proceeding 17-length images samples | Num: 1 |
|
Proceeding 34-length images samples | Num: 9 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
100%|██████████| 1/1 [00:01<00:00, 1.18s/it] |
|
Proceeding 20-length images samples | Num: 2 |
|
100%|██████████| 2/2 [00:03<00:00, 1.54s/it] |
|
Proceeding 28-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.11s/it] |
|
Proceeding 45-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.45s/it] |
|
Proceeding 23-length images samples | Num: 8 |
|
100%|██████████| 1/1 [00:01<00:00, 1.70s/it] |
|
Proceeding 29-length images samples | Num: 10 |
|
Proceeding 50-length images samples | Num: 1 |
|
100%|██████████| 2/2 [00:02<00:00, 1.10s/it] |
|
Proceeding 22-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 21-length images samples | Num: 5 |
|
Proceeding 32-length images samples | Num: 17 |
|
100%|██████████| 1/1 [00:01<00:00, 1.12s/it] |
|
Proceeding 19-length images samples | Num: 1 |
|
100%|██████████| 3/3 [00:03<00:00, 1.05s/it] |
|
Proceeding 28-length images samples | Num: 6 |
|
0it [00:00, ?it/s] |
|
Proceeding 17-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
100%|██████████| 2/2 [00:01<00:00, 1.05it/s] |
|
Proceeding 23-length images samples | Num: 8 |
|
100%|██████████| 2/2 [00:02<00:00, 1.09s/it] |
|
Proceeding 22-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 21-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:00<00:00, 1.04it/s] |
|
Proceeding 19-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 17-length images samples | Num: 1 |
|
Proceeding 26-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 4 |
|
Proceeding 20-length images samples | Num: 2 |
|
Proceeding 45-length images samples | Num: 2 |
|
Proceeding 29-length images samples | Num: 10 |
|
Proceeding 28-length images samples | Num: 6 |
|
Proceeding 23-length images samples | Num: 8 |
|
Proceeding 22-length images samples | Num: 1 |
|
Proceeding 21-length images samples | Num: 5 |
|
Proceeding 19-length images samples | Num: 1 |
|
Proceeding 17-length images samples | Num: 1 |
|
Proceeding 24-length images samples | Num: 1 |
|
evaluating ActionLocalization ... |
|
Results saved to work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionLocalization/ActionLocalization_240803234615.json |
|
python eval/milebench/evaluate.py --data-dir /mnt/inspurfs/share_data/wangweiyun/share_data/long-context-benchmark/MileBench/datasets--FreedomIntelligence--MileBench/snapshots/53c7a58051ef88bacf76541d91f03f5ba2d71e7d --dataset ActionLocalization --result-dir work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionLocalization |
|
internvl: ActionLocalization: {'Accuracy': 0.255, 'image_quantity_level-Accuracy': {'Few': 0, 'Medium': 0.330188679245283, 'Many': 0.1702127659574468}, 'image_quantity_level-Result': {'Few': [0, 0], 'Medium': [35, 106], 'Many': [16, 94]}} |
|
|
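
The reported accuracies can be recomputed from the `[correct, total]` pairs in `image_quantity_level-Result`. The sketch below only redoes the arithmetic on the numbers printed in the result line. It is not the code of `evaluate.py`, and treating an empty bucket as 0 is an assumption chosen to match the `'Few': 0` entry.

```python
# [correct, total] pairs copied from 'image_quantity_level-Result' in the log.
results = {"Few": [0, 0], "Medium": [35, 106], "Many": [16, 94]}


def accuracy(correct, total):
    # Assumption: an empty bucket is reported as 0, matching 'Few': 0 in the log.
    return correct / total if total else 0


# Per-level accuracy: correct answers divided by samples in that level.
per_level = {level: accuracy(c, t) for level, (c, t) in results.items()}

# Overall accuracy: pooled correct answers over pooled samples.
total_correct = sum(c for c, _ in results.values())
total_samples = sum(t for _, t in results.values())
overall = accuracy(total_correct, total_samples)

print(per_level)
print(total_correct, total_samples, overall)
```

Pooling the counts gives 51 correct out of 200 samples, which agrees with the logged overall `Accuracy` of 0.255.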