---
license: cc-by-4.0
---

# ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video (ECCV 2024)

This repository hosts the official model checkpoints for ["ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video"](https://arxiv.org/abs/2310.01324) (ECCV 2024).

## Models

We provide the checkpoints before reparameterization. To reparameterize the weights, refer to `tools/weight_reparam.py` in our [code](https://github.com/MCG-NJU/ZeroI2V/blob/main/tools/weight_reparam.py).
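
For intuition, the general idea behind this kind of reparameterization can be sketched as folding a purely linear residual adapter into the linear layer it follows, so the merged layer gives the same output with no extra parameters. This is a minimal NumPy illustration, not the repository's script: the adapter layout (a bias-free bottleneck with no nonlinearity) and every name below (`merge_linear_adapter`, `W_down`, `W_up`) are assumptions made for the example. Use `tools/weight_reparam.py` for the actual checkpoints.

```python
import numpy as np


def merge_linear_adapter(W0, b0, W_down, W_up):
    """Fold a residual linear adapter into the preceding linear layer.

    Assumed (hypothetical) layout:
        h = W0 @ x + b0
        y = h + W_up @ (W_down @ h)
    Because every step is linear, this equals
        y = (I + W_up @ W_down) @ (W0 @ x + b0)
    which is a single linear layer with merged weight and bias.
    """
    d_out = W0.shape[0]
    M = np.eye(d_out) + W_up @ W_down  # combined residual + adapter map
    return M @ W0, M @ b0


# Small random example to check that the merged layer is equivalent.
rng = np.random.default_rng(0)
d_in, d_out, r = 6, 4, 2
W0 = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight
b0 = rng.standard_normal(d_out)           # frozen pre-trained bias
W_down = rng.standard_normal((r, d_out))  # adapter down-projection
W_up = rng.standard_normal((d_out, r))    # adapter up-projection
x = rng.standard_normal(d_in)

# Output with the adapter kept as a separate module.
h = W0 @ x + b0
y_adapter = h + W_up @ (W_down @ h)

# Output after merging the adapter into the layer.
W_merged, b_merged = merge_linear_adapter(W0, b0, W_down, W_up)
y_merged = W_merged @ x + b_merged

max_diff = float(np.max(np.abs(y_adapter - y_merged)))
```

In this sketch `W_merged` has the same shape as `W0`, so the merged layer can replace the original one directly, and `max_diff` stays at floating-point rounding level.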

### Kinetics 400

| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-B/16 | CLIP | 422 | 86 | 0 | 83.0 | 8x1x3 |
| ViT-L/14 | CLIP | 1946 | 304 | 0 | 86.3 | 8x1x3 |
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 87.2 | 32x1x3 |

### Something-Something V2

| Backbone | Pretrain | GFLOPs | Param (M) | New Param (M) | acc@1 | Views |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-L/14 | CLIP | 7783 | 304 | 0 | 72.2 | 32x3x1 |

If you find our work useful in your research, please cite:

```
@article{li2023zeroi2v,
  title={ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video},
  author={Li, Xinhao and Zhu, Yuhan and Wang, Limin},
  journal={arXiv preprint arXiv:2310.01324},
  year={2023}
}
```