Model Zoo
Pretraining
For $\text{InternVideo2}_{s2}$, we initialize from the $\text{InternVideo2}_{s1}$ models and further pretrain them on multi-modal datasets.
For $\text{InternVideo2}_{clip}$, we initialize from the $\text{InternVideo2}_{s2}$ models.
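Warm-starting one stage from the previous one usually means copying over every pretrained parameter that still fits and leaving newly added modules at their fresh initialization. The sketch below only illustrates that idea; it is not the loading code from the pretraining scripts. It assumes a checkpoint is a mapping from parameter names to flat lists of floats (a stand-in for tensors), and the parameter names in the usage lines are hypothetical. In PyTorch, `load_state_dict(..., strict=False)` tolerates missing and unexpected keys but still raises on size mismatches, which is why the sketch filters by size before copying.

```python
def warm_start(new_state, prev_checkpoint):
    """Copy every parameter whose name and size match the previous stage.

    Assumption: state dicts are plain dicts of name -> flat list of floats.
    Parameters that are new in this stage (or whose size changed) keep their
    fresh initialization. Returns the merged state, the names loaded from the
    previous stage, and the names left at their initial values.
    """
    merged = dict(new_state)  # shallow copy; entries are replaced, not mutated
    loaded, kept_init = [], []
    for name, init_value in new_state.items():
        prev_value = prev_checkpoint.get(name)
        if prev_value is not None and len(prev_value) == len(init_value):
            merged[name] = list(prev_value)  # reuse the pretrained weights
            loaded.append(name)
        else:
            kept_init.append(name)  # new or resized module: train from init
    return merged, loaded, kept_init


# Hypothetical parameter names, chosen only for illustration.
stage1 = {"vision.patch_embed": [0.1, 0.2], "vision.head": [0.3]}
stage2_init = {
    "vision.patch_embed": [0.0, 0.0],
    "vision.head": [0.0, 0.0],  # resized head: cannot reuse stage-1 weights
    "text.proj": [0.0],         # module that only exists in the later stage
}
merged, loaded, kept = warm_start(stage2_init, stage1)
```

Here only `vision.patch_embed` is carried over; the resized head and the new text projection start from their initial values and are learned during the further pretraining.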
| Model | Setting | Checkpoint | Pretraining Script |
|---|---|---|---|
| $\text{InternVideo2}_{s2}$-1B | IV-25.5M | :hugs: HF link | script |
| $\text{InternVideo2}_{clip}$-1B | IV-25.5M | TBD | script |
| $\text{InternVideo2}_{s2}$-6B | IV-400M | TBD | script |
| $\text{InternVideo2}_{clip}$-6B | IV-400M | TBD | script |
Zero-shot Evaluation
Zero-Shot Video-Text Retrieval
| Model | Dataset | T2V | V2T | Evaluation Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{s2}$-1B | MSRVTT | 51.9 | 50.9 | script |
| | LSMDC | 32.0 | 27.3 | script |
| | DiDeMo | 57.0 | 54.3 | script |
| | MSVD | 58.1 | 83.3 | script |
| | ANet | 60.4 | 54.8 | script |
| | VATEX | 70.4 | 85.4 | script |
| $\text{InternVideo2}_{s2}$-6B | MSRVTT | 55.9 | 53.7 | TBD |
| | LSMDC | 33.8 | 30.1 | TBD |
| | DiDeMo | 57.9 | 57.1 | TBD |
| | MSVD | 59.3 | 83.1 | TBD |
| | ANet | 63.2 | 56.5 | TBD |
| | VATEX | 71.5 | 85.3 | TBD |
| Model | Dataset | T2V | V2T | Evaluation Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{clip}$-1B | MSRVTT | 50.0 | 48.4 | script |
| | LSMDC | 26.4 | 23.1 | script |
| | DiDeMo | 47.8 | 46.4 | script |
| | ANet | 49.4 | 46.2 | script |
| | VATEX_en | 63.5 | 81.2 | script |
| | VATEX_ch | 54.9 | 76.4 | script |
| $\text{InternVideo2}_{clip}$-6B | MSRVTT | 50.9 | 50.6 | script |
| | LSMDC | 29.4 | 26.3 | script |
| | DiDeMo | 50.5 | 46.8 | script |
| | ANet | 50.2 | 47.5 | script |
| | VATEX_en | 64.1 | 82.6 | script |
| | VATEX_ch | 54.6 | 76.9 | script |
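T2V scores text queries against a gallery of videos and V2T scores video queries against a gallery of texts, so both directions can be computed from one similarity matrix and its transpose. A minimal sketch, assuming the reported numbers are Recall@1 percentages and that each text has exactly one matching video (index i matches index i). Benchmarks with several captions per video, such as MSVD, need extra bookkeeping that is not shown, and the scores in the usage lines are made up.

```python
def recall_at_k(sim, k=1):
    """Recall@k (in %) for a square similarity matrix.

    sim[i][j] is the score between query i and candidate j; the ground-truth
    match for query i is assumed to be candidate i. A query is a hit if its
    match is among the k highest-scoring candidates.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidates by descending score; the stable sort keeps the
        # lower index first when scores tie.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Made-up scores: text_video[t][v] = similarity(text t, video v).
text_video = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.5, 0.7],
]
t2v = recall_at_k(text_video, k=1)             # text queries, video gallery
v2t = recall_at_k(transpose(text_video), k=1)  # video queries, text gallery
```

With these scores T2V Recall@1 is about 66.7 while V2T Recall@1 is about 33.3, which is why the two directions are reported separately.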
Zero-Shot Action Recognition
| Model | Dataset | top-1 | AVG | Evaluation Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{clip}$-1B | K400 | 73.1 | 82.4 | script |
| | K600 | 72.8 | 81.8 | script |
| | K700 | 64.9 | 75.2 | script |
| | UCF101 | 88.8 | - | script |
| | HMDB51 | 53.9 | - | script |
| | MiT | 31.6 | - | script |
| | SSv2-MC | 61.5 | - | script |
| $\text{InternVideo2}_{clip}$-6B | K400 | 72.7 | 82.2 | script |
| | K600 | 71.7 | 81.2 | script |
| | K700 | 64.2 | 75.2 | script |
| | UCF101 | 89.5 | - | script |
| | HMDB51 | 56.7 | - | script |
| | MiT | 32.9 | - | script |
| | SSv2-MC | 63.5 | - | script |
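Zero-shot recognition with a video-text model typically encodes one text prompt per class name, ranks the classes by cosine similarity to each video embedding, and reads off top-k accuracy. The sketch below assumes that AVG is the mean of top-1 and top-5 accuracy, a convention used in several CLIP-style video papers; the linked evaluation scripts define the exact protocol, including prompt templates. SSv2-MC, presumably a multiple-choice setting, may follow a different protocol. The toy 2-D embeddings are made up, and with only three classes the usage lines pass `k=2`.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_accuracy(video_embs, labels, class_text_embs, k=5):
    """Return (top-1 %, top-k %, mean of the two) for zero-shot classification.

    Each video embedding is scored against one text embedding per class
    (e.g. an encoded prompt built from the class name), and the classes are
    ranked by cosine similarity.
    """
    top1 = topk = 0
    for emb, label in zip(video_embs, labels):
        scores = [cosine(emb, c) for c in class_text_embs]
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        top1 += int(ranked[0] == label)
        topk += int(label in ranked[:k])
    n = len(labels)
    top1_acc, topk_acc = 100.0 * top1 / n, 100.0 * topk / n
    return top1_acc, topk_acc, (top1_acc + topk_acc) / 2


# Made-up embeddings: three classes, four videos.
class_text_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_embs = [[1.0, 0.1], [0.2, 1.0], [1.0, 0.9], [0.9, 1.0]]
labels = [0, 1, 0, 2]
top1, top2, avg = zero_shot_accuracy(video_embs, labels, class_text_embs, k=2)
```

The third video is closest to class 2 but labelled class 0, so it misses at top-1 and hits at top-2, giving 75.0 / 100.0 / 87.5. With embeddings that are already L2-normalized, the cosine reduces to a plain dot product.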