|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
language_model.model.layers.0 4 |
|
language_model.model.layers.1 4 |
|
language_model.model.layers.2 4 |
|
language_model.model.layers.3 4 |
|
language_model.model.layers.4 4 |
|
language_model.model.layers.5 4 |
|
language_model.model.layers.6 4 |
|
language_model.model.layers.7 4 |
|
language_model.model.layers.8 4 |
|
language_model.model.layers.9 4 |
|
language_model.model.layers.10 4 |
|
language_model.model.layers.11 4 |
|
language_model.model.layers.12 4 |
|
language_model.model.layers.13 4 |
|
language_model.model.layers.14 4 |
|
language_model.model.layers.15 4 |
|
language_model.model.layers.16 4 |
|
language_model.model.layers.17 4 |
|
language_model.model.layers.18 4 |
|
language_model.model.layers.19 4 |
|
language_model.model.layers.20 4 |
|
language_model.model.layers.21 4 |
|
language_model.model.layers.22 4 |
|
language_model.model.layers.23 4 |
|
vision_model.encoder.layers.0 0 |
|
vision_model.encoder.layers.1 0 |
|
vision_model.encoder.layers.2 0 |
|
vision_model.encoder.layers.3 0 |
|
vision_model.encoder.layers.4 0 |
|
vision_model.encoder.layers.5 0 |
|
vision_model.encoder.layers.6 0 |
|
vision_model.encoder.layers.7 0 |
|
vision_model.encoder.layers.8 0 |
|
vision_model.encoder.layers.9 0 |
|
vision_model.encoder.layers.10 0 |
|
vision_model.encoder.layers.11 0 |
|
vision_model.encoder.layers.12 0 |
|
vision_model.encoder.layers.13 0 |
|
vision_model.encoder.layers.14 0 |
|
vision_model.encoder.layers.15 0 |
|
vision_model.encoder.layers.16 0 |
|
vision_model.encoder.layers.17 0 |
|
vision_model.encoder.layers.18 0 |
|
vision_model.encoder.layers.19 0 |
|
vision_model.encoder.layers.20 0 |
|
vision_model.encoder.layers.21 0 |
|
vision_model.encoder.layers.22 0 |
|
vision_model.encoder.layers.23 0 |
|
vision_model.embeddings 0 |
|
mlp1 0 |
|
language_model.model.tok_embeddings 4 |
|
language_model.model.norm 4 |
|
language_model.output 4 |
|
language_model.model.embed_tokens 4 |
|
language_model.lm_head 4 |
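The module-to-device listing above places the vision encoder and `mlp1` on device 0 and every language-model module on device 4. A minimal sketch of how such a mapping could be assembled is given below; the helper name `build_device_map` and its parameters are hypothetical, and the log does not show how the evaluation script actually builds its map.

```python
def build_device_map(num_layers: int = 24,
                     vision_device: int = 0,
                     language_device: int = 4) -> dict[str, int]:
    """Reproduce the module -> device listing printed in the log.

    Hypothetical helper: the names and defaults are assumptions taken
    from the printed listing, not from the evaluation script itself.
    """
    device_map: dict[str, int] = {}
    # All 24 language-model decoder layers sit on the second device.
    for i in range(num_layers):
        device_map[f"language_model.model.layers.{i}"] = language_device
    # All 24 vision-encoder layers sit on the first device.
    for i in range(num_layers):
        device_map[f"vision_model.encoder.layers.{i}"] = vision_device
    # Vision embeddings and the projector stay with the vision encoder.
    device_map["vision_model.embeddings"] = vision_device
    device_map["mlp1"] = vision_device
    # Embeddings, final norm and output heads stay with the language model.
    for name in (
        "language_model.model.tok_embeddings",
        "language_model.model.norm",
        "language_model.output",
        "language_model.model.embed_tokens",
        "language_model.lm_head",
    ):
        device_map[name] = language_device
    return device_map


device_map = build_device_map()
for module_name, device in device_map.items():
    print(module_name, device)
```

Keeping each tower on a single device means the only cross-device transfer happens where the projected vision features are handed to the language model.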
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [3] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ObjectInteraction, devices: {device(type='cuda', index=3), device(type='cuda', index=7)} |
|
Initialization Finished |
|
Predicting ObjectInteraction Using internvl |
|
Proceeding 19-length images samples | Num: 7 |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [0] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ObjectInteraction, devices: {device(type='cuda', index=0), device(type='cuda', index=4)} |
|
Initialization Finished |
|
Predicting ObjectInteraction Using internvl |
|
Proceeding 19-length images samples | Num: 7 |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. |
|
Rank [2] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ObjectInteraction, devices: {device(type='cuda', index=2), device(type='cuda', index=6)} |
|
Rank [1] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ObjectInteraction, devices: {device(type='cuda', index=1), device(type='cuda', index=5)} |
|
Initialization Finished |
|
Predicting ObjectInteraction Using internvl |
|
Proceeding 19-length images samples | Num: 7 |
|
Initialization Finished |
|
Predicting ObjectInteraction Using internvl |
|
Proceeding 19-length images samples | Num: 7 |
|
Proceeding 32-length images samples | Num: 16 |
|
100%|██████████| 1/1 [00:03<00:00, 3.93s/it] |
|
100%|██████████| 2/2 [00:05<00:00, 2.80s/it] |
|
Proceeding 32-length images samples | Num: 16 |
|
Proceeding 32-length images samples | Num: 16 |
|
100%|██████████| 2/2 [00:05<00:00, 2.86s/it] |
|
Proceeding 32-length images samples | Num: 16 |
|
100%|██████████| 4/4 [00:12<00:00, 3.25s/it] |
|
Proceeding 30-length images samples | Num: 49 |
|
100%|██████████| 4/4 [00:14<00:00, 3.69s/it] |
|
Proceeding 30-length images samples | Num: 49 |
|
100%|██████████| 4/4 [00:13<00:00, 3.29s/it] |
|
Proceeding 30-length images samples | Num: 49 |
|
Proceeding 30-length images samples | Num: 49 |
|
100%|██████████| 12/12 [00:30<00:00, 2.56s/it] |
|
Proceeding 31-length images samples | Num: 11 |
|
100%|██████████| 12/12 [00:30<00:00, 2.56s/it] |
|
Proceeding 31-length images samples | Num: 11 |
|
100%|██████████| 12/12 [00:31<00:00, 2.60s/it] |
|
Proceeding 31-length images samples | Num: 11 |
|
Proceeding 31-length images samples | Num: 11 |
|
100%|██████████| 2/2 [00:03<00:00, 1.60s/it] |
|
Proceeding 44-length images samples | Num: 5 |
|
100%|██████████| 3/3 [00:06<00:00, 2.16s/it] |
|
Proceeding 44-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:03<00:00, 3.57s/it] |
|
Proceeding 34-length images samples | Num: 8 |
|
100%|██████████| 3/3 [00:06<00:00, 2.13s/it] |
|
Proceeding 44-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:01<00:00, 1.85s/it] |
|
Proceeding 34-length images samples | Num: 8 |
|
Proceeding 44-length images samples | Num: 5 |
|
100%|██████████| 1/1 [00:04<00:00, 4.27s/it] |
|
Proceeding 34-length images samples | Num: 8 |
|
100%|██████████| 2/2 [00:06<00:00, 3.30s/it] |
|
Proceeding 35-length images samples | Num: 6 |
|
100%|██████████| 2/2 [00:05<00:00, 2.72s/it] |
|
Proceeding 35-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:05<00:00, 5.02s/it] |
|
Proceeding 17-length images samples | Num: 4 |
|
100%|██████████| 2/2 [00:07<00:00, 3.54s/it] |
|
Proceeding 35-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:04<00:00, 4.86s/it] |
|
Proceeding 17-length images samples | Num: 4 |
|
Proceeding 34-length images samples | Num: 8 |
|
100%|██████████| 1/1 [00:01<00:00, 1.62s/it] |
|
Proceeding 26-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 29-length images samples | Num: 12 |
|
100%|██████████| 1/1 [00:02<00:00, 2.18s/it] |
|
Proceeding 26-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 29-length images samples | Num: 12 |
|
Proceeding 35-length images samples | Num: 6 |
|
100%|██████████| 2/2 [00:06<00:00, 3.09s/it] |
|
Proceeding 17-length images samples | Num: 4 |
|
100%|██████████| 3/3 [00:05<00:00, 1.88s/it] |
|
Proceeding 21-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.41s/it] |
|
Proceeding 26-length images samples | Num: 2 |
|
100%|██████████| 3/3 [00:05<00:00, 1.93s/it] |
|
Proceeding 21-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.35s/it] |
|
Proceeding 16-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.54s/it] |
|
Proceeding 29-length images samples | Num: 12 |
|
0it [00:00, ?it/s] |
|
Proceeding 33-length images samples | Num: 10 |
|
100%|██████████| 1/1 [00:00<00:00, 1.03it/s] |
|
Proceeding 16-length images samples | Num: 2 |
|
Proceeding 17-length images samples | Num: 4 |
|
0it [00:00, ?it/s] |
|
Proceeding 33-length images samples | Num: 10 |
|
Proceeding 26-length images samples | Num: 2 |
|
Proceeding 29-length images samples | Num: 12 |
|
100%|██████████| 2/2 [00:05<00:00, 2.96s/it] |
|
Proceeding 20-length images samples | Num: 1 |
|
100%|██████████| 2/2 [00:05<00:00, 2.72s/it] |
|
Proceeding 20-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 8 |
|
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 8 |
|
100%|██████████| 3/3 [00:06<00:00, 2.16s/it] |
|
Proceeding 21-length images samples | Num: 6 |
|
Proceeding 16-length images samples | Num: 2 |
|
100%|██████████| 2/2 [00:04<00:00, 2.38s/it] |
|
100%|██████████| 2/2 [00:06<00:00, 3.25s/it] |
|
Proceeding 37-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:02<00:00, 2.09s/it] |
|
Proceeding 33-length images samples | Num: 10 |
|
100%|██████████| 2/2 [00:07<00:00, 3.55s/it] |
|
Proceeding 37-length images samples | Num: 6 |
|
Proceeding 21-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.25s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
100%|██████████| 1/1 [00:01<00:00, 1.95s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
Proceeding 16-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.71s/it] |
|
Proceeding 28-length images samples | Num: 7 |
|
Proceeding 33-length images samples | Num: 10 |
|
100%|██████████| 1/1 [00:03<00:00, 3.14s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
100%|██████████| 3/3 [00:05<00:00, 1.92s/it] |
|
Proceeding 20-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:03<00:00, 3.62s/it] |
|
Proceeding 28-length images samples | Num: 7 |
|
0it [00:00, ?it/s] |
|
Proceeding 59-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 27-length images samples | Num: 8 |
|
0it [00:00, ?it/s] |
|
Proceeding 47-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 25-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.13s/it] |
|
Proceeding 39-length images samples | Num: 4 |
|
100%|██████████| 2/2 [00:02<00:00, 1.31s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 59-length images samples | Num: 1 |
|
100%|██████████| 2/2 [00:02<00:00, 1.36s/it] |
|
Proceeding 37-length images samples | Num: 6 |
|
Proceeding 20-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 47-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:01<00:00, 1.36s/it] |
|
Proceeding 15-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 25-length images samples | Num: 6 |
|
Proceeding 27-length images samples | Num: 8 |
|
0it [00:00, ?it/s] |
|
Proceeding 18-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 40-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 50-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.13s/it] |
|
Proceeding 39-length images samples | Num: 4 |
|
0it [00:00, ?it/s] |
|
Proceeding 73-length images samples | Num: 1 |
|
100%|██████████| 2/2 [00:02<00:00, 1.30s/it] |
|
Proceeding 38-length images samples | Num: 4 |
|
0it [00:00, ?it/s] |
|
Proceeding 60-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 45-length images samples | Num: 1 |
|
Proceeding 37-length images samples | Num: 6 |
|
0it [00:00, ?it/s] |
|
Proceeding 62-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.43s/it] |
|
Proceeding 15-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 58-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 18-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 70-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.62s/it] |
|
Proceeding 28-length images samples | Num: 7 |
|
0it [00:00, ?it/s] |
|
Proceeding 52-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:00<00:00, 1.21it/s] |
|
Proceeding 40-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 23-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 49-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 73-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 7-length images samples | Num: 1 |
|
Proceeding 38-length images samples | Num: 4 |
|
0it [00:00, ?it/s] |
|
Proceeding 60-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
100%|██████████| 2/2 [00:02<00:00, 1.09s/it] |
|
Proceeding 36-length images samples | Num: 2 |
|
Proceeding 28-length images samples | Num: 7 |
|
100%|██████████| 1/1 [00:01<00:00, 1.30s/it] |
|
Proceeding 59-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:02<00:00, 2.17s/it] |
|
Proceeding 45-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 47-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 62-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 24-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 58-length images samples | Num: 1 |
|
Proceeding 36-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 70-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 52-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.03s/it] |
|
Proceeding 25-length images samples | Num: 6 |
|
0it [00:00, ?it/s] |
|
Proceeding 23-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 49-length images samples | Num: 1 |
|
Proceeding 59-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 7-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
100%|██████████| 2/2 [00:01<00:00, 1.06it/s] |
|
Proceeding 39-length images samples | Num: 4 |
|
Proceeding 47-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.40s/it] |
|
Proceeding 15-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 18-length images samples | Num: 3 |
|
Proceeding 24-length images samples | Num: 2 |
|
100%|██████████| 1/1 [00:00<00:00, 1.16it/s] |
|
Proceeding 40-length images samples | Num: 2 |
|
Proceeding 25-length images samples | Num: 6 |
|
100%|██████████| 1/1 [00:01<00:00, 1.47s/it] |
|
Proceeding 50-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 73-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 60-length images samples | Num: 3 |
|
Proceeding 39-length images samples | Num: 4 |
|
Proceeding 15-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:02<00:00, 2.12s/it] |
|
Proceeding 45-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 62-length images samples | Num: 1 |
|
Proceeding 18-length images samples | Num: 3 |
|
0it [00:00, ?it/s] |
|
Proceeding 58-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 70-length images samples | Num: 1 |
|
Proceeding 40-length images samples | Num: 2 |
|
0it [00:00, ?it/s] |
|
Proceeding 52-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 23-length images samples | Num: 2 |
|
Proceeding 50-length images samples | Num: 1 |
|
100%|██████████| 1/1 [00:01<00:00, 1.31s/it] |
|
Proceeding 49-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 7-length images samples | Num: 1 |
|
0it [00:00, ?it/s] |
|
Proceeding 73-length images samples | Num: 1 |
|
Proceeding 60-length images samples | Num: 3 |
|
Proceeding 45-length images samples | Num: 1 |
|
Proceeding 62-length images samples | Num: 1 |
|
Proceeding 58-length images samples | Num: 1 |
|
Proceeding 70-length images samples | Num: 1 |
|
Proceeding 52-length images samples | Num: 1 |
|
Proceeding 23-length images samples | Num: 2 |
|
Proceeding 49-length images samples | Num: 1 |
|
Proceeding 7-length images samples | Num: 1 |
|
evaluating ObjectInteraction ... |
|
Results saved to work_dirs/share_internvl/InternVL2-2B/eval_milebench/ObjectInteraction/ObjectInteraction_240803234851.json |
|
python eval/milebench/evaluate.py --data-dir /mnt/inspurfs/share_data/wangweiyun/share_data/long-context-benchmark/MileBench/datasets--FreedomIntelligence--MileBench/snapshots/53c7a58051ef88bacf76541d91f03f5ba2d71e7d --dataset ObjectInteraction --result-dir work_dirs/share_internvl/InternVL2-2B/eval_milebench/ObjectInteraction |
|
internvl: ObjectInteraction: {'Accuracy': 0.74, 'image_quantity_level-Accuracy': {'Few': 0, 'Medium': 0.7661290322580645, 'Many': 0.6973684210526315}, 'image_quantity_level-Result': {'Few': [0, 0], 'Medium': [95, 124], 'Many': [53, 76]}} |
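The accuracies in the result line above can be checked against the counts it reports. A minimal sketch follows, assuming each `image_quantity_level-Result` pair is `[correct, total]`; the log does not say so explicitly, but the reported numbers are consistent with that reading. The helper name `level_accuracy` is hypothetical.

```python
# Counts copied from the ObjectInteraction result line. Reading each pair
# as [correct, total] is an assumption; the log does not label the fields.
results = {"Few": [0, 0], "Medium": [95, 124], "Many": [53, 76]}


def level_accuracy(correct: int, total: int) -> float:
    """Return correct / total, or 0 for a level with no samples."""
    return correct / total if total else 0


# Per-level accuracy, e.g. Medium = 95 / 124.
per_level = {name: level_accuracy(c, t) for name, (c, t) in results.items()}

# Overall accuracy pools every level: (95 + 53) / (124 + 76) = 148 / 200.
total_correct = sum(c for c, _ in results.values())
total_samples = sum(t for _, t in results.values())
overall = level_accuracy(total_correct, total_samples)

print(per_level)
print(overall)
```

The `Few` level has no samples in this task, which is why it reports `0` rather than a ratio.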
|
|