CLIP-Based Break Dance Move Classifier

A deep learning model for classifying break dance moves using CLIP (Contrastive Language-Image Pre-Training) embeddings. The model is fine-tuned on break dance videos to classify four power moves: windmills, halos, swipes, and baby mills.

Features

  • Video-based classification using CLIP embeddings
  • Multi-frame temporal analysis
  • Configurable frame sampling and data augmentation (see the sampling sketch after this list)
  • Real-time inference using Cog
  • Misclassification analysis tools
  • Hyperparameter tuning support
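
As a rough illustration of the frame-sampling step, a uniform sampler might look like the sketch below. This is only a sketch: sample_frames and its num_frames parameter are illustrative names, not the project's actual API.

import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 16) -> np.ndarray:
    """Uniformly sample num_frames RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole clip
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes to BGR; CLIP preprocessing expects RGB
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)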

Setup

# Install dependencies
pip install -r requirements.txt

# Install Cog (if not already installed)
curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
chmod +x /usr/local/bin/cog
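
You can sanity-check the install with:

# Print the installed Cog version
cog --version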

Cog

Download the weights:

gdown https://drive.google.com/uc?id=1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf -O ./checkpoints/

Build the image:

cog build --separate-weights

Push a new image:

cog push
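
cog push publishes the image named in the image field of cog.yaml. You can also pass the destination explicitly; the path below is a placeholder, not this project's actual image name:

cog push r8.im/<your-username>/<your-model>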

Training

Download the training data:

gdown https://drive.google.com/uc?id=11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6 -O ./data/

# Run training with default configuration
python scripts/train.py

# Run hyperparameter tuning
python scripts/hyperparameter_tuning.py

Inference

# Using Cog for inference
cog predict -i video=@path/to/your/video.mp4

# Using standard Python script
python scripts/inference.py --video path/to/your/video.mp4
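
Programmatically, inference boils down to sampling frames, running them through CLIP preprocessing, and taking a softmax over the four move classes. The sketch below shows the shape of that pipeline; classify, the label order, and the model's call signature are assumptions, and the real entry point is scripts/inference.py.

import clip  # OpenAI CLIP package
import torch
from PIL import Image

# Label order is an assumption; check the project's config for the real mapping.
LABELS = ["windmill", "halo", "swipe", "baby mill"]

# CLIP's own preprocessing turns raw frames into normalized input tensors
_, preprocess = clip.load("ViT-L/14", device="cpu")

@torch.no_grad()
def classify(model, frames) -> str:
    """frames: iterable of RGB uint8 arrays, e.g. from a frame sampler."""
    batch = torch.stack([preprocess(Image.fromarray(f)) for f in frames])
    logits = model(batch.unsqueeze(0))  # (1, num_classes)
    return LABELS[logits.softmax(dim=-1).argmax(dim=-1).item()]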

Analysis

# Generate misclassification report
python scripts/visualization/miscalculations_report.py

# Visualize model performance
python scripts/visualization/visualize.py

Project Structure

clip/
├── src/                    # Source code
│   ├── data/              # Dataset and data processing
│   ├── models/            # Model architecture
│   └── utils/             # Utility functions
├── scripts/               # Training and inference scripts
│   └── visualization/     # Visualization tools
├── config/                # Configuration files
├── runs/                  # Training runs and checkpoints
├── cog.yaml              # Cog configuration
└── requirements.txt      # Python dependencies

Training Data

To run training on your own, you can find the training data here (the gdown command above downloads it) and put it in a directory called ./data at the root of the project.

Checkpoints

To run predictions with Cog or locally from an existing checkpoint, you can find a checkpoint and configuration files here (the gdown command above downloads them) and put them in a directory called ./checkpoints at the root of the project.

Model Architecture

  • Base: CLIP ViT-Large/14
  • Custom temporal pooling layer (sketched after this list)
  • Fine-tuned vision encoder (last 3 layers)
  • Output: 4-class classifier
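
In code, that stack might be condensed as the sketch below, assuming mean pooling over frame embeddings for the custom temporal pooling and the last three transformer blocks left trainable; the actual implementation lives in src/models/.

import clip
import torch
import torch.nn as nn

class BreakDanceClassifier(nn.Module):
    """Sketch of the described architecture; details are assumptions."""

    def __init__(self, num_classes: int = 4, trainable_blocks: int = 3):
        super().__init__()
        clip_model, _ = clip.load("ViT-L/14", device="cpu")
        self.visual = clip_model.visual  # CLIP's ViT-L/14 vision encoder
        # Freeze the encoder, then unfreeze only the last few transformer blocks
        for p in self.visual.parameters():
            p.requires_grad = False
        for block in self.visual.transformer.resblocks[-trainable_blocks:]:
            for p in block.parameters():
                p.requires_grad = True
        self.head = nn.Linear(self.visual.output_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, frames, 3, 224, 224)
        b, f = clips.shape[:2]
        feats = self.visual(clips.flatten(0, 1))  # (b * f, embed_dim)
        # Temporal pooling: average the per-frame embeddings (an assumption)
        feats = feats.view(b, f, -1).mean(dim=1)
        return self.head(feats)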

License

MIT License

Copyright (c) 2024 Bryant Wolf

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this model in your research, please cite:

@misc{clip-breakdance-classifier,
  author = {Bryant Wolf},
  title = {CLIP-Based Break Dance Move Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://github.com/bawolf/breaking_vision_clip_cog}}
}