---
title: Video Anomaly Detector
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Video Anomaly Detector
This application analyzes video files and live streams frame by frame, using multimodal AI models to detect anomalies based on a user-provided prompt.
## Model Description
The application supports multiple AI models for analysis:
- GPT-4o: OpenAI's most powerful multimodal model, offering the highest accuracy for anomaly detection
- GPT-4o-mini: A smaller, faster, and more cost-effective version of GPT-4o
- Phi-4: Microsoft's multimodal model that can run locally using Hugging Face transformers
- Phi-3: (Coming soon) Microsoft's earlier multimodal model
Each model can analyze both text and images, examining video frames to identify potential anomalies based on the user's prompt.
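At its core, sending a frame to one of the OpenAI models means JPEG-encoding it and attaching it to a chat request. Below is a minimal sketch using the OpenAI Python SDK (v1+) and OpenCV; `analyze_frame` is an illustrative helper, not the actual interface in detector.py:

```python
# Sketch: send one video frame to an OpenAI vision model for anomaly analysis.
# Assumes the `openai` SDK (v1+) and OPENAI_API_KEY in the environment; the
# request detector.py actually builds may differ.
import base64

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_frame(frame, prompt: str, model: str = "gpt-4o") -> str:
    """Encode a BGR OpenCV frame as JPEG and ask the model about it."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("could not encode frame")
    b64 = base64.b64encode(jpeg.tobytes()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```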
## Demo App
## Features
- Support for both video files and live streams (webcam, IP camera, RTSP)
- Select from multiple AI models (GPT-4o, GPT-4o-mini, Phi-4)
- Skip frames for faster processing
- Provide custom prompts for anomaly detection
- Two analysis modes: frame-by-frame or cumulative summary
- Batch processing for multiple videos
- Streamlit web interface with modern UI design
## How It Works

1. The application extracts frames from the uploaded video or live stream
2. It skips a user-defined number of frames to reduce processing time (see the sketch after this list)
3. Based on the selected analysis depth:
   - Granular mode: each selected frame is analyzed individually
   - Cumulative mode: all frames are analyzed together to provide an overall summary
4. The selected AI model analyzes the frame(s) and provides descriptions of any detected anomalies
5. Results are displayed in an interactive interface, with timestamps for live streams
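Steps 1 and 2 amount to a read loop that discards frames between samples. A minimal OpenCV sketch; the exact skip arithmetic in detector.py may differ, so treat `skip + 1` as an assumption:

```python
# Sketch: iterate a video or stream, keeping every (skip + 1)-th frame.
import cv2

def extract_frames(source, skip: int = 5):
    """Yield (frame_index, frame) pairs, skipping `skip` frames between reads."""
    cap = cv2.VideoCapture(source)  # file path, webcam index, or RTSP URL
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or dropped stream
                break
            if index % (skip + 1) == 0:
                yield index, frame
            index += 1
    finally:
        cap.release()
```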
## Requirements
- Python 3.8+
- OpenAI API key with access to GPT-4o and GPT-4o-mini models (only needed for OpenAI models)
- For Phi-4: GPU recommended but not required (will use CPU if GPU not available)
- For live streams: Webcam or access to an IP camera/RTSP stream
## Installation

```bash
git clone https://github.com/username/video-anomaly-detector.git
cd video-anomaly-detector
pip install -r requirements.txt
```
### Environment Variables

Create a `.env` file in the root directory with your OpenAI API key (only needed for OpenAI models):

```
OPENAI_API_KEY=your_openai_api_key_here
```
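If the project loads this file with python-dotenv (an assumption here, though it is the common pattern for Streamlit apps), the key becomes available to the process like so:

```python
# Sketch: load OPENAI_API_KEY from .env (assumes the python-dotenv package).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not set; required for OpenAI models")
```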
## Usage

### Web Application

Run the Streamlit application:

```bash
streamlit run app.py
```

Your browser will open automatically with the application running at http://localhost:8501.
### Using with Video Files

1. Select "Video File" as the input source
2. Upload a video file
3. Configure the analysis settings
4. Click "Analyze Video"
### Using with Live Streams

1. Select "Live Stream" as the input source
2. Choose between "Webcam" or "IP Camera / RTSP Stream"
3. For IP cameras, enter the stream URL (e.g., `rtsp://username:password@ip_address:port/path`)
4. Set the maximum number of frames to process
5. Configure the analysis settings
6. Click "Analyze Video"
## Command Line

### Single Video Processing

```bash
python example.py --video path/to/video.mp4 --skip 5 --analysis_depth granular --model gpt-4o --prompt "Detect any unusual activities or objects in this frame"
```

Arguments:

- `--video`: Path to the video file (required)
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in the .env file; not needed for Phi-4)
### Live Stream Processing

```bash
python example.py --stream 0 --skip 5 --analysis_depth granular --model gpt-4o --max_frames 30 --prompt "Detect any unusual activities or objects in this frame"
```

Arguments:

- `--stream`: Stream source (0 for webcam, URL for IP camera/RTSP stream)
- `--max_frames`: Maximum number of frames to process (default: 30)
- Other arguments are the same as for video processing
### Batch Processing

```bash
python batch_process.py --videos_dir path/to/videos --output_dir output --skip 5 --analysis_depth cumulative --model gpt-4o-mini
```

Arguments:

- `--videos_dir`: Directory containing video files (required)
- `--output_dir`: Directory to save results (default: 'output')
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in the .env file; not needed for Phi-4)
- `--extensions`: Comma-separated list of video file extensions (default: '.mp4,.avi,.mov,.mkv')
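Gathering the batch reduces to a directory scan filtered by extension, roughly like this (the helper name is hypothetical, not batch_process.py's real function):

```python
# Sketch: collect video files matching the --extensions list.
from pathlib import Path

def find_videos(videos_dir: str, extensions: str = ".mp4,.avi,.mov,.mkv"):
    """Return sorted video paths in videos_dir whose suffix is in extensions."""
    wanted = {ext.strip().lower() for ext in extensions.split(",")}
    return sorted(p for p in Path(videos_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)
```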
## Model Options
### GPT-4o
- OpenAI's most powerful multimodal model
- Highest accuracy for anomaly detection
- Requires OpenAI API key
- Recommended for critical applications where accuracy is paramount
### GPT-4o-mini
- Smaller, faster version of GPT-4o
- More cost-effective for processing large videos
- Requires OpenAI API key
- Good balance between performance and cost
### Phi-4
- Microsoft's multimodal model
- Runs locally using Hugging Face transformers
- No API key required
- First run will download the model (approximately 5GB)
- GPU recommended but not required
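A rough sketch of the local-loading pattern with CPU fallback; the checkpoint id and dtype choice here are assumptions, so defer to phi4_detector.py for the real details:

```python
# Sketch: load a Phi-4 multimodal checkpoint locally, preferring GPU.
# The model id and dtype are assumptions, not phi4_detector.py's actual values.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-multimodal-instruct"  # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)
```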
### Phi-3 (Coming Soon)
- Microsoft's earlier multimodal model
- Will provide an alternative option for analysis
## Analysis Depth Options
### Granular - Frame by Frame
- Analyzes each frame individually
- Provides detailed analysis for every processed frame
- Useful for detecting specific moments or events
### Cumulative - All Frames
- Analyzes all frames together to provide an overall summary
- Identifies up to 3 key frames that best represent detected anomalies
- Useful for getting a high-level understanding of anomalies in the video
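Schematically, the two depths differ only in how frames are batched into model requests. The sketch below reuses the OpenAI request shape from the earlier example; every name is illustrative, not the app's actual interface:

```python
# Sketch: granular vs. cumulative batching of frames into model requests.
# All names here are illustrative, not detector.py's real interface.
import base64

import cv2
from openai import OpenAI

client = OpenAI()

def to_data_url(frame) -> str:
    """JPEG-encode a BGR frame as a base64 data URL."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("could not encode frame")
    return "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()

def ask(model: str, prompt: str, frames) -> str:
    """One chat request carrying the prompt plus every frame in `frames`."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": to_data_url(f)}}
                for f in frames]
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": content}])
    return response.choices[0].message.content

def analyze(frames, prompt: str, depth: str = "granular", model: str = "gpt-4o"):
    if depth == "granular":
        # One request per frame: a separate description for every frame.
        return [ask(model, prompt, [frame]) for frame in frames]
    # Cumulative: all frames in a single request, yielding one overall summary.
    return [ask(model, prompt + " Summarize anomalies across all frames.", frames)]
```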
## Deploying to Hugging Face Spaces
This project is configured for easy deployment to Hugging Face Spaces:
1. Fork this repository to your GitHub account
2. Create a new Space on Hugging Face: https://huggingface.co/spaces/create
3. Select "Streamlit" as the SDK
4. Link your GitHub repository
5. Add your OpenAI API key as a secret in the Space settings (if using OpenAI models)
6. The Space will automatically deploy with the configuration from this repository
Alternatively, you can use the GitHub Actions workflow to sync your repository to Hugging Face Spaces automatically:

1. Create a Hugging Face account and generate an access token
2. Add the following secrets to your GitHub repository:
   - `HF_TOKEN`: Your Hugging Face access token
   - `HF_USERNAME`: Your Hugging Face username
   - `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI models)
3. Push to the main branch to trigger the workflow
## Project Structure

```
video-anomaly-detector/
├── app.py               # Streamlit web application
├── detector.py          # Core video processing and anomaly detection with OpenAI models
├── phi4_detector.py     # Phi-4 model implementation using Hugging Face
├── example.py           # Example script for processing a single video
├── batch_process.py     # Script for batch processing multiple videos
├── requirements.txt     # Python dependencies
├── requirements-hf.txt  # Dependencies for Hugging Face Spaces
├── .env.example         # Template for environment variables
└── .github/             # GitHub Actions workflows
    └── workflows/
        └── sync-to-hub.yml  # Workflow to sync to Hugging Face
```
## Limitations
- Processing time depends on the video length, frame skip rate, and your internet connection
- The OpenAI models require an API key and may incur usage costs
- Phi-4 model requires downloading approximately 5GB of model files on first use
- Higher frame skip values will process fewer frames, making analysis faster but potentially less accurate
- Cumulative analysis may miss some details that would be caught in granular analysis
- Live stream processing may be affected by network latency and camera quality
## License
This project is licensed under the MIT License - see the LICENSE file for details.