---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: video-classification
tags:
- robotics
---
University of Technology Chemnitz, Germany

Department Robotics and Human Machine Interaction

Author: Robert Schulz

# Action Recognition

**Table of Contents**

- [1 Overview](#1-overview)
- [2 Pretrained Models](#2-pretrained-models)
  - [2.1 TUC-AR Dataset](#21-tuc-ar-dataset)
  - [2.2 UCF101 Dataset](#22-ucf101-dataset)

## 1 Overview
Here, we provide PyTorch models trained on two datasets, with one checkpoint per dataset (see [2 Pretrained Models](#2-pretrained-models)). The model consists of a multi-stage 3D CNN feature extraction module followed by a classification head. On the UCF101 dataset, it achieves state-of-the-art results.

_**Figure 1** Model architecture_
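
This card does not document the exact layer configuration of the released checkpoints. As rough orientation only, the following sketch shows the general pattern of a multi-stage 3D CNN feature extractor followed by a classification head. The class name `Tiny3DCNN`, the three stages, the channel widths, and the pooling choices are hypothetical and do not reproduce the released model. The sketch assumes inputs laid out as `(batch, frames, channels, height, width)`, as in the input tables, and permutes them to the `(N, C, D, H, W)` layout that `torch.nn.Conv3d` expects.

```python
import torch
import torch.nn as nn


class Tiny3DCNN(nn.Module):
    """Hypothetical multi-stage 3D CNN + classification head (NOT the released architecture)."""

    def __init__(self, in_channels: int, num_classes: int, widths=(16, 32, 64)):
        super().__init__()
        stages = []
        prev = in_channels
        for width in widths:
            # One stage: 3D convolution over (time, height, width), batch norm, ReLU,
            # then 2x downsampling in time and space.
            stages += [
                nn.Conv3d(prev, width, kernel_size=3, padding=1),
                nn.BatchNorm3d(width),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),
            ]
            prev = width
        self.features = nn.Sequential(*stages)
        # Classification head: global average pooling, then a linear layer producing logits.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(prev, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width), as in the input tables.
        x = x.permute(0, 2, 1, 3, 4)  # -> (batch, channels, frames, height, width) for Conv3d
        return self.head(self.features(x))  # -> (batch, num_classes) logits
```

With `in_channels=4, num_classes=10` the sketch accepts and returns tensors shaped like the TUC-AR tables, and with `in_channels=3, num_classes=101` like the UCF101 tables; its weights are unrelated to the published `.pth` files.
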
## 2 Pretrained Models
### 2.1 TUC-AR Dataset
[Dataset Homepage](https://huggingface.co/datasets/SchulzR97/TUC-AR)

**Short Description**

- RGB and depth input recorded with an Intel RealSense D435 depth camera
- 7 subjects
- 3 perspectives per sequence
- 11,031 sequences (train: 8,893 / val: 2,138)
- 6 action categories plus 1 "None" category

**Input**

| Dimension | Fixed | Value | Parameter | Description |
|-----------|---------|-------|-----------------|-------------------------------------------|
| 0 | no | ? | Batch Size | Number of samples that will be propagated through the network (number of sequences) |
| 1 | yes | 30 | Sequence Length | Number of frames in one sequence |
| 2 | yes | 4 | Input Channels | Number of channels of one frame (RGB+D=4) |
| 3 | yes | 400 | Width | Width of one frame |
| 4 | yes | 400 | Height | Height of one frame |
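
This card does not document the preprocessing used during training. As a hedged sketch, assuming each frame arrives as an 8-bit RGB image of shape `(400, 400, 3)` plus a depth map of shape `(400, 400)` in millimeters (converted from the camera's 16-bit format to an integer or float tensor), a single 4-channel clip matching the table could be assembled as below. The helper name `make_rgbd_clip` and the scaling (RGB divided by 255, depth clipped at a hypothetical `max_depth_mm` and scaled to [0, 1]) are assumptions, not the author's pipeline.

```python
import torch


def make_rgbd_clip(rgb_frames: torch.Tensor, depth_frames: torch.Tensor,
                   max_depth_mm: float = 4000.0) -> torch.Tensor:
    """Stack RGB and depth into one (T, 4, H, W) float clip (hypothetical preprocessing).

    rgb_frames:   uint8 tensor of shape (T, H, W, 3)
    depth_frames: integer or float depth in millimeters, shape (T, H, W)
    max_depth_mm: assumed clipping range used to normalize depth
    """
    # Channels-last RGB -> channels-first, scaled to [0, 1].
    rgb = rgb_frames.permute(0, 3, 1, 2).float() / 255.0
    # Depth is clipped, scaled to [0, 1], and given its own channel axis.
    depth = depth_frames.float().clamp(0, max_depth_mm).unsqueeze(1) / max_depth_mm
    return torch.cat([rgb, depth], dim=1)  # RGB+D = 4 channels
```

For a 30-frame sequence, `make_rgbd_clip(rgb, depth).unsqueeze(0)` yields the `(1, 30, 4, 400, 400)` batch described above. Because width and height are both 400, the order of the two spatial axes does not change the shape.
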

**Output**

| Dimension | Fixed | Value | Parameter | Description |
|-----------|---------|-------|-----------------|-------------------------------------------|
| 0 | no | ? | Batch Size | Number of samples that will be propagated through the network (number of sequences) |
| 1 | yes | 10 | Number of action classes | Number of action classes (see list below) |

**Action Classes**

- 0 - None
- 1 - Waving
- 2 - Pointing
- 3 - Clapping
- 4 - Follow
- 5 - Walking
- 6 - Stop

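
The output holds one score per class, and the card does not state how predictions are decoded. A common choice, assumed here, is to take the arg-max over dimension 1 and look the index up in the class list above. `TUC_AR_CLASSES` and `predict_labels` are hypothetical names, and indices without a listed name are reported generically rather than guessed.

```python
import torch

# Class names as listed in the output description (index -> name).
TUC_AR_CLASSES = {
    0: "None",
    1: "Waving",
    2: "Pointing",
    3: "Clapping",
    4: "Follow",
    5: "Walking",
    6: "Stop",
}


def predict_labels(logits: torch.Tensor) -> list[str]:
    """Map a (batch, num_classes) logit tensor to one class name per sequence."""
    indices = logits.argmax(dim=1).tolist()
    # Indices without a documented name get a generic label instead of a guessed one.
    return [TUC_AR_CLASSES.get(i, f"class {i}") for i in indices]
```
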
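
**Usage**
```python
import torch
from huggingface_hub import HfApi
api = HfApi()
model_path = api.hf_hub_download('SchulzR97/TUC-AR-C3D', filename='tuc-ar.pth')  # cached after the first download
# The .pth file holds a pickled model; PyTorch >= 2.6 requires weights_only=False for that (trusted files only)
model = torch.load(model_path, map_location='cpu', weights_only=False)
model.eval()  # switch to inference mode
```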
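
Once loaded, the model can be run on a batch shaped as in the input table. The sketch below assumes the checkpoint unpickles to an `nn.Module` whose forward pass takes a single `(batch, 30, 4, 400, 400)` tensor and returns logits; `run_inference` and `EXPECTED_SHAPE` are hypothetical helpers. Calling `eval()` and disabling gradient tracking with `torch.no_grad()` are standard PyTorch inference steps.

```python
import torch
import torch.nn as nn

# Fixed dimensions 1-4 (sequence length, channels, width, height) from the TUC-AR input table.
EXPECTED_SHAPE = (30, 4, 400, 400)


def run_inference(model: nn.Module, batch: torch.Tensor,
                  expected_shape: tuple = EXPECTED_SHAPE) -> torch.Tensor:
    """Check the input layout, then return raw logits of shape (batch, num_classes)."""
    if tuple(batch.shape[1:]) != expected_shape:
        raise ValueError(f"expected trailing dims {expected_shape}, got {tuple(batch.shape)}")
    model.eval()  # inference behavior for layers such as dropout and batch norm
    device = next(model.parameters()).device  # run on whatever device the model lives on
    with torch.no_grad():  # no gradients are needed for prediction
        return model(batch.to(device))
```

For a single clip of shape `(30, 4, 400, 400)`, call `run_inference(model, clip.unsqueeze(0))`.
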
### 2.2 UCF101 Dataset
[Dataset Homepage](https://www.crcv.ucf.edu/data/UCF101.php)

**Input**

| Dimension | Fixed | Value | Parameter | Description |
|-----------|---------|-------|-----------------|-------------------------------------------|
| 0 | no | ? | Batch Size | Number of samples that will be propagated through the network (number of sequences) |
| 1 | yes | 60 | Sequence Length | Number of frames in one sequence |
| 2 | yes | 3 | Input Channels | Number of channels of one frame (RGB=3) |
| 3 | yes | 400 | Width | Width of one frame |
| 4 | yes | 400 | Height | Height of one frame |
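
UCF101 videos vary in length, while the model expects exactly 60 frames per sequence. The card does not say how clips were sampled during training; one common option, assumed here, is to pick 60 evenly spaced frame indices and repeat frames when a video is shorter than that. `uniform_frame_indices` is a hypothetical helper.

```python
def uniform_frame_indices(num_frames: int, clip_len: int = 60) -> list[int]:
    """Return clip_len evenly spaced indices into a video with num_frames frames.

    Videos shorter than clip_len yield repeated indices, so the result always has clip_len entries.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    # Integer arithmetic keeps every index within [0, num_frames - 1].
    return [i * num_frames // clip_len for i in range(clip_len)]
```

The indices can then gather frames from a decoded video tensor, e.g. `frames[uniform_frame_indices(len(frames))]` for `frames` of shape `(T, 3, 400, 400)`.
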

**Output**

| Dimension | Fixed | Value | Parameter | Description |
|-----------|---------|-------|-----------------|-------------------------------------------|
| 0 | no | ? | Batch Size | Number of samples that will be propagated through the network (number of sequences) |
| 1 | yes | 101 | Number of action classes | Number of action classes |
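
The official UCF101 train/test split files include a `classInd.txt` with one `<index> <name>` pair per line and 1-based indices. Assuming the model's output index `k` corresponds to index `k + 1` in that file (the card does not state this mapping), the most probable classes can be read off as below; `load_class_names` and `top_k_labels` are hypothetical helpers.

```python
import torch


def load_class_names(lines) -> list[str]:
    """Parse classInd.txt-style lines ("1 ApplyEyeMakeup") into a 0-based list of names."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        index, name = line.split(maxsplit=1)
        pairs.append((int(index), name))
    pairs.sort()  # order by the 1-based index from the file
    return [name for _, name in pairs]


def top_k_labels(logits: torch.Tensor, class_names: list[str], k: int = 5):
    """Return the k most probable (name, probability) pairs for each sequence in the batch."""
    probs = torch.softmax(logits, dim=1)
    values, indices = probs.topk(k, dim=1)
    return [
        [(class_names[i], p) for i, p in zip(idx_row, val_row)]
        for idx_row, val_row in zip(indices.tolist(), values.tolist())
    ]
```

With the file on disk: `with open('classInd.txt') as f: names = load_class_names(f)`.
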
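
**Usage**
```python
import torch
from huggingface_hub import HfApi
api = HfApi()
model_path = api.hf_hub_download('SchulzR97/TUC-AR-C3D', filename='ucf101.pth')  # cached after the first download
# The .pth file holds a pickled model; PyTorch >= 2.6 requires weights_only=False for that (trusted files only)
model = torch.load(model_path, map_location='cpu', weights_only=False)
model.eval()  # switch to inference mode
```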