University of Technology Chemnitz, Germany
Department Robotics and Human Machine Interaction
Author: Robert Schulz

Action Recognition

1 Overview
2 Pretrained Models
- 2.1 TUC-AR Dataset
- 2.2 UCF101 Dataset

1 Overview

Here, we provide a PyTorch model that was trained on several datasets (see 2 Pretrained Models). The model consists of a multi-stage 3D CNN feature extraction module followed by a classification head. It achieves state-of-the-art results on the UCF101 dataset.

Figure 1 Model architecture

2 Pretrained Models

2.1 TUC-AR Dataset

Dataset Homepage

Short Description

- RGB and depth input recorded with an Intel RealSense D435 depth camera
- 7 subjects
- 3 perspectives per sequence
- 11,031 sequences (train 8,893 / val 2,138)
- 6 (+1) action categories (6 actions plus the None class)

Input

Dimension	Fixed	Value	Parameter	Description
0	no	?	Batch Size	Number of samples that will be propagated through the network (number of sequences)
1	yes	30	Sequence Length	Number of frames in one sequence
2	yes	4	Input Channels	Number of channels of one frame (RGB+D=4)
3	yes	400	Width	Width of one frame
4	yes	400	Height	Height of one frame
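The input layout in the table above can be checked before a batch is passed to the model. The following is a minimal sketch in plain Python; the names `EXPECTED_TAIL` and `check_input_shape` are hypothetical helpers introduced for illustration and are not part of the published model.

```python
# Fixed dimensions from the input table: (frames, channels, width, height).
# Hypothetical constant, not part of the published model.
EXPECTED_TAIL = (30, 4, 400, 400)


def check_input_shape(shape):
    """Return the shape as a tuple, or raise ValueError unless it is (batch, 30, 4, 400, 400)."""
    shape = tuple(shape)
    if len(shape) != 5:
        raise ValueError(f"expected 5 dimensions, got {len(shape)}")
    if shape[0] < 1:
        raise ValueError("batch size must be at least 1")
    if shape[1:] != EXPECTED_TAIL:
        raise ValueError(f"expected trailing dimensions {EXPECTED_TAIL}, got {shape[1:]}")
    return shape


# A batch of two RGB+D sequences passes the check.
check_input_shape((2, 30, 4, 400, 400))
```

For a PyTorch tensor `x`, the check can be called as `check_input_shape(x.shape)`. Only the batch size (dimension 0) is allowed to vary.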

Output

Dimension	Fixed	Value	Parameter	Description
0	no	?	Batch Size	Number of samples that will be propagated through the network (number of sequences)
1	yes	10	Number of action classes	Number of action classes: 0 - None, 1 - Waving, 2 - Pointing, 3 - Clapping, 4 - Follow, 5 - Walking, 6 - Stop
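The class indices in the output table can be mapped back to action names as sketched below. This is an illustrative helper in plain Python, not part of the published model: `CLASS_NAMES` and `decode_predictions` are hypothetical names, and the sketch assumes that the highest score in each output row marks the predicted class. Indices without a name in the table are reported as "unknown".

```python
# Index-to-name mapping copied from the output table (hypothetical constant).
CLASS_NAMES = {
    0: "None",
    1: "Waving",
    2: "Pointing",
    3: "Clapping",
    4: "Follow",
    5: "Walking",
    6: "Stop",
}


def decode_predictions(scores):
    """Map each row of per-class scores to (index, name) of its highest score."""
    results = []
    for row in scores:
        # Index of the largest score in this row (assumed to be the predicted class).
        index = max(range(len(row)), key=lambda i: row[i])
        results.append((index, CLASS_NAMES.get(index, "unknown")))
    return results


# One sequence with 10 output scores; index 1 has the highest score.
example = [[0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print(decode_predictions(example))
```

For a PyTorch output tensor `y`, the rows can be passed as `decode_predictions(y.tolist())`.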

Usage

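import torch
from huggingface_hub import HfApi

# Download the checkpoint from the Hugging Face Hub (reused from the local cache on later calls)
api = HfApi()
model_path = api.hf_hub_download('SchulzR97/TUC-AR-C3D', filename='tuc-ar.pth')

# Load the downloaded checkpoint with PyTorch
model = torch.load(model_path)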

2.2 UCF101 Dataset