---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: video-classification
tags:
- robotics
---

University of Technology Chemnitz, Germany
Department Robotics and Human Machine Interaction
Author: Robert Schulz

# Action Recognition

## Table of Contents

- [1 Overview](#1-overview)
- [2 Pretrained Models](#2-pretrained-models)
  - [2.1 TUC-AR Dataset](#21-tuc-ar-dataset)
  - [2.2 UCF101 Dataset](#22-ucf101-dataset)

## 1 Overview

Here, we provide a PyTorch model that was trained on different datasets (see [2 Pretrained Models](#2-pretrained-models)). The model consists of a multi-stage 3D CNN feature extraction module, followed by a classification head. It achieves state-of-the-art results on the UCF101 dataset.

![](image/model_architecture.png)
_**Figure 1** Model architecture_

## 2 Pretrained Models

### 2.1 TUC-AR Dataset

[Dataset Homepage](https://huggingface.co/datasets/SchulzR97/TUC-AR)

**Short Description**

- RGB and depth input recorded by an Intel RealSense D435 depth camera
- 7 subjects
- 3 perspectives per sequence
- 11,031 sequences (train 8,893 / val 2,138)
- 6(+1) action categories (6 actions plus `None`)

**Input**

| Dimension | Fixed | Value | Parameter       | Description                                              |
|-----------|-------|-------|-----------------|----------------------------------------------------------|
| 0         | no    | ?     | Batch Size      | Number of sequences propagated through the network at once |
| 1         | yes   | 30    | Sequence Length | Number of frames in one sequence                         |
| 2         | yes   | 4     | Input Channels  | Number of channels of one frame (RGB+D=4)                |
| 3         | yes   | 400   | Width           | Width of one frame                                       |
| 4         | yes   | 400   | Height          | Height of one frame                                      |

**Output**

| Dimension | Fixed | Value | Parameter                | Description |
|-----------|-------|-------|--------------------------|-------------|
| 0         | no    | ?     | Batch Size               | Number of sequences propagated through the network at once |
| 1         | yes   | 10    | Number of action classes | Number of action classes<br>0 - None<br>1 - Waving<br>2 - Pointing<br>3 - Clapping<br>4 - Follow<br>5 - Walking<br>6 - Stop |

**Usage**

```python
import torch
from huggingface_hub import HfApi

api = HfApi()
# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
model_path = api.hf_hub_download('SchulzR97/TUC-AR-C3D', filename='tuc-ar.pth')
# Assumes the file holds the full pickled model; PyTorch >= 2.6 then needs weights_only=False
model = torch.load(model_path, weights_only=False)
```

### 2.2 UCF101 Dataset

[Dataset Homepage](https://www.crcv.ucf.edu/data/UCF101.php)

**Input**

| Dimension | Fixed | Value | Parameter       | Description                                              |
|-----------|-------|-------|-----------------|----------------------------------------------------------|
| 0         | no    | ?     | Batch Size      | Number of sequences propagated through the network at once |
| 1         | yes   | 60    | Sequence Length | Number of frames in one sequence                         |
| 2         | yes   | 3     | Input Channels  | Number of channels of one frame (RGB=3)                  |
| 3         | yes   | 400   | Width           | Width of one frame                                       |
| 4         | yes   | 400   | Height          | Height of one frame                                      |

**Output**

| Dimension | Fixed | Value | Parameter                | Description |
|-----------|-------|-------|--------------------------|-------------|
| 0         | no    | ?     | Batch Size               | Number of sequences propagated through the network at once |
| 1         | yes   | 101   | Number of action classes | Number of action classes |

**Usage**

```python
import torch
from huggingface_hub import HfApi

api = HfApi()
# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
model_path = api.hf_hub_download('SchulzR97/TUC-AR-C3D', filename='ucf101.pth')
# Assumes the file holds the full pickled model; PyTorch >= 2.6 then needs weights_only=False
model = torch.load(model_path, weights_only=False)
```
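
The input and output tables in sections 2.1 and 2.2 fix every dimension except the batch size, so a shape check before inference and a small decoding step afterwards can catch mistakes early. The following is a minimal sketch in plain Python, not part of the repository: the helper names `check_input_shape` and `decode_prediction` and the dictionary keys are hypothetical, and the shapes and class names are copied from the tables above. Because the TUC-AR output has 10 entries while only indices 0 to 6 are named, the sketch assumes that any other index should get a generic placeholder label.

```python
# Expected per-sample input shapes (sequence length, channels, width, height),
# copied from the input tables in sections 2.1 and 2.2.
EXPECTED_SHAPES = {
    "tuc-ar": (30, 4, 400, 400),  # RGB+D frames
    "ucf101": (60, 3, 400, 400),  # RGB frames
}

# TUC-AR class names in index order, as listed in section 2.1.
TUC_AR_CLASSES = ["None", "Waving", "Pointing", "Clapping", "Follow", "Walking", "Stop"]


def check_input_shape(shape, dataset):
    """Return the batch size, or raise ValueError if the shape does not match."""
    expected = EXPECTED_SHAPES[dataset]
    shape = tuple(shape)
    if len(shape) != 5 or shape[1:] != expected:
        wanted = ", ".join(str(d) for d in expected)
        raise ValueError(f"expected (batch, {wanted}), got {shape}")
    return shape[0]


def decode_prediction(scores, class_names=TUC_AR_CLASSES):
    """Return (index, label) of the highest score in one output row."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    # Assumption: indices without a documented name get a placeholder label.
    if index < len(class_names):
        label = class_names[index]
    else:
        label = f"class {index}"
    return index, label
```

With a PyTorch tensor `x` and model output `y`, one would pass `tuple(x.shape)` to `check_input_shape` and `y[0].tolist()` to `decode_prediction`; for UCF101, supply the 101 class names from the dataset homepage as `class_names`.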