OpenGVLab
/

InternVideo2-Stage2_6B

Video Classification

Model card Files Files and versions

InternVideo2-Stage2_6B / README.md

lixinhao's picture

Upload folder using huggingface_hub

60c8e7f verified 9 months ago

|

history blame contribute delete

1.51 kB

	---
	license: mit
	pipeline_tag: video-classification
	---

	## Introduction

	This repository contains the 6B model of the paper [InternVideo2](https://arxiv.org/pdf/2403.15377) in stage 2.

	Code: https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality

	## 🚀 Installation

	Please refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md

	## Usage

	```python
	import cv2
	from transformers import AutoModel
	from modeling_internvideo2 import (retrieve_text, vid2tensor, _frame_from_video,)


	model = AutoModel.from_pretrained("OpenGVLab/InternVideo2-Stage2_6B", trust_remote_code=True).eval()

	video = cv2.VideoCapture('example1.mp4')
	frames = [x for x in _frame_from_video(video)]
	text_candidates = ["A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.",
	"A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.",
	"A person dressed in a blue jacket shovels the snow-covered pavement outside their house.",
	"A cat excitedly runs through the yard, chasing a rabbit.",
	"A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery."]

	texts, probs = retrieve_text(frames, text_candidates, model=model, topk=5)
	for t, p in zip(texts, probs):
	print(f'text: {t} ~ prob: {p:.4f}')

	vidtensor = vid2tensor('example1.mp4', fnum=4)
	feat = model.get_vid_feat(vidtensor)
	```