---
title: Video Anomaly Detector
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# Video Anomaly Detector
This application analyzes video files and live streams frame by frame, using multimodal AI models to detect anomalies based on a user-provided prompt.

## Model Description
The application supports multiple AI models for analysis:

- **GPT-4o**: OpenAI's most powerful multimodal model, offering the highest accuracy for anomaly detection
- **GPT-4o-mini**: A smaller, faster, and more cost-effective version of GPT-4o
- **Phi-4**: Microsoft's multimodal model that can run locally using Hugging Face transformers
- **Phi-3**: *(Coming soon)* Microsoft's earlier multimodal model

Each model can analyze both text and images, examining video frames to identify potential anomalies based on the user's prompt; the request format for the OpenAI models is sketched below.
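
For GPT-4o and GPT-4o-mini, each sampled frame is sent to OpenAI's Chat Completions API as a base64-encoded image alongside the prompt. A minimal, illustrative sketch; the function name and defaults are assumptions, not the app's exact code:

```python
# Illustrative sketch: ask an OpenAI vision model about one base64-encoded JPEG frame.
# Assumes OPENAI_API_KEY is set in the environment; frame_b64 is a base64 string.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

def analyze_frame(frame_b64: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```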
## Demo App

[Try the demo on Hugging Face Spaces](https://huggingface.co/spaces/username/video-anomaly-detector)

## Features
- Support for both video files and live streams (webcam, IP camera, RTSP)
- Choice of multiple AI models (GPT-4o, GPT-4o-mini, Phi-4)
- Frame skipping for faster processing
- Custom prompts for anomaly detection
- Two analysis modes: frame-by-frame or cumulative summary
- Batch processing for multiple videos
- Streamlit web interface with a modern UI design

## How It Works
1. The application extracts frames from the uploaded video or live stream
2. It skips a user-defined number of frames to reduce processing time (see the sampling sketch below)
3. Based on the selected analysis depth:
   - **Granular mode**: Each selected frame is analyzed individually
   - **Cumulative mode**: All frames are analyzed together to provide an overall summary
4. The selected AI model analyzes the frame(s) and describes any detected anomalies
5. Results are displayed in an interactive interface, with timestamps for live streams
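
The sampling loop in steps 1 and 2 can be pictured roughly as follows. This is a minimal sketch assuming OpenCV (`cv2`) handles capture; the function name and skip semantics are illustrative, not the app's exact implementation.

```python
# Illustrative sketch of frame extraction with frame skipping (OpenCV).
import base64
import cv2

def sample_frames(source, skip=5, max_frames=None):
    """Yield selected frames as base64-encoded JPEGs from a file path, webcam index, or stream URL."""
    cap = cv2.VideoCapture(source)
    index = taken = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if index % (skip + 1) == 0:  # keep one frame, then skip `skip` frames
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                yield base64.b64encode(buf.tobytes()).decode("utf-8")
                taken += 1
                if max_frames is not None and taken >= max_frames:
                    break
        index += 1
    cap.release()
```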
## Requirements

- Python 3.8+
- OpenAI API key with access to GPT-4o and GPT-4o-mini models (only needed for OpenAI models)
- For Phi-4: GPU recommended but not required (will use CPU if GPU not available)
- For live streams: Webcam or access to an IP camera/RTSP stream

## Installation
```bash
git clone https://github.com/username/video-anomaly-detector.git
cd video-anomaly-detector
pip install -r requirements.txt
```

## Environment Variables
Create a `.env` file in the root directory with your OpenAI API key (only needed for OpenAI models):

```
OPENAI_API_KEY=your_openai_api_key_here
```
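
At runtime the key is read from the environment. A minimal sketch, assuming `python-dotenv` is installed; exporting `OPENAI_API_KEY` in your shell works just as well:

```python
# Illustrative: load OPENAI_API_KEY from a local .env file (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()                            # reads .env from the current directory, if present
api_key = os.getenv("OPENAI_API_KEY")    # None when the key is not configured
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; required for GPT-4o / GPT-4o-mini.")
```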
## Usage

### Web Application

Run the Streamlit application:

```bash
streamlit run app.py
```

Your browser will automatically open with the application running at http://localhost:8501.
#### Using with Video Files

1. Select "Video File" as the input source
2. Upload a video file
3. Configure the analysis settings
4. Click "Analyze Video"
#### Using with Live Streams

1. Select "Live Stream" as the input source
2. Choose between "Webcam" and "IP Camera / RTSP Stream"
3. For IP cameras, enter the stream URL (e.g., rtsp://username:password@ip_address:port/path); see the snippet below for source formats
4. Set the maximum number of frames to process
5. Configure the analysis settings
6. Click "Analyze Video"
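
Both source types map onto the same OpenCV capture call: an integer index selects a local webcam, while a URL string selects a network stream. For illustration only; the URL here is a placeholder:

```python
# Illustrative: opening a webcam vs. an IP camera / RTSP stream with OpenCV.
import cv2

webcam = cv2.VideoCapture(0)  # first local webcam
ip_cam = cv2.VideoCapture("rtsp://user:pass@192.168.1.10:554/stream1")  # placeholder URL

print("webcam opened:", webcam.isOpened())
print("ip camera opened:", ip_cam.isOpened())
webcam.release()
ip_cam.release()
```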
### Command Line

#### Single Video Processing

```bash
python example.py --video path/to/video.mp4 --skip 5 --analysis_depth granular --model gpt-4o --prompt "Detect any unusual activities or objects in this frame"
```

Arguments (an illustrative `argparse` sketch of this interface follows the list):
- `--video`: Path to the video file (required)
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
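
The flags above could be wired roughly as follows. This is a sketch consistent with the documented options and defaults, not the actual contents of example.py:

```python
# Illustrative argparse wiring for the documented flags; example.py may differ.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Detect anomalies in a video with an AI model.")
    parser.add_argument("--video", required=True, help="Path to the video file")
    parser.add_argument("--skip", type=int, default=5, help="Number of frames to skip")
    parser.add_argument("--analysis_depth", choices=["granular", "cumulative"],
                        default="granular", help="Analysis depth")
    parser.add_argument("--model", choices=["gpt-4o", "gpt-4o-mini", "phi-4"],
                        default="gpt-4o", help="AI model to use")
    parser.add_argument("--prompt", help="Prompt for anomaly detection")
    parser.add_argument("--api_key", default=None, help="OpenAI API key (falls back to .env)")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))  # placeholder for handing the settings to the detector
```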
#### Live Stream Processing

```bash
python example.py --stream 0 --skip 5 --analysis_depth granular --model gpt-4o --max_frames 30 --prompt "Detect any unusual activities or objects in this frame"
```

Arguments:

- `--stream`: Stream source (0 for webcam, URL for IP camera/RTSP stream)
- `--max_frames`: Maximum number of frames to process (default: 30)
- Other arguments are the same as for video processing

#### Batch Processing
```bash
python batch_process.py --videos_dir path/to/videos --output_dir output --skip 5 --analysis_depth cumulative --model gpt-4o-mini
```

Arguments:
- `--videos_dir`: Directory containing video files (required)
- `--output_dir`: Directory to save results (default: 'output')
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
- `--extensions`: Comma-separated list of video file extensions (default: '.mp4,.avi,.mov,.mkv'); the directory scan is sketched below
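
The directory scan implied by `--videos_dir` and `--extensions` amounts to something like the following minimal sketch; it is not the actual batch_process.py:

```python
# Illustrative directory scan for batch processing; batch_process.py may differ.
from pathlib import Path

def find_videos(videos_dir, extensions=".mp4,.avi,.mov,.mkv"):
    """Return video files in videos_dir whose suffix matches the comma-separated extensions."""
    allowed = {ext.strip().lower() for ext in extensions.split(",")}
    return sorted(p for p in Path(videos_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in allowed)

for video in find_videos("path/to/videos"):  # placeholder path
    print("would process:", video)           # results would be written to --output_dir
```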
## Model Options

### GPT-4o

- OpenAI's most powerful multimodal model
- Highest accuracy for anomaly detection
- Requires OpenAI API key
- Recommended for critical applications where accuracy is paramount
### GPT-4o-mini

- Smaller, faster version of GPT-4o
- More cost-effective for processing large videos
- Requires OpenAI API key
- Good balance between performance and cost

### Phi-4
- Microsoft's multimodal model
- Runs locally using Hugging Face transformers (see the loading sketch below)
- No API key required
- First run will download the model (approximately 5GB)
- GPU recommended but not required
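
Loading a model locally with transformers typically looks like the sketch below. The model identifier, `trust_remote_code` flag, and dtype choice are assumptions here, not necessarily what phi4_detector.py uses:

```python
# Illustrative local model loading with Hugging Face transformers; phi4_detector.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-multimodal-instruct"  # assumed model id
device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU preferred, CPU fallback

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)
```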
### Phi-3 (Coming Soon)

- Microsoft's earlier multimodal model
- Will provide an alternative option for analysis

## Analysis Depth Options

### Granular - Frame by Frame
- Analyzes each frame individually
- Provides detailed analysis for every processed frame
- Useful for detecting specific moments or events

### Cumulative - All Frames

- Analyzes all frames together to provide an overall summary (see the sketch below)
- Identifies up to 3 key frames that best represent detected anomalies
- Useful for getting a high-level understanding of anomalies in the video
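
With the OpenAI models, the practical difference between the two modes is how frames are packed into a request: granular mode sends one image per call, while cumulative mode can send several images in a single message. A hedged sketch of a cumulative-style request; the function name is illustrative:

```python
# Illustrative cumulative-mode request: several frames analyzed together (OpenAI models).
from openai import OpenAI

client = OpenAI()

def summarize_frames(frames_b64, prompt, model="gpt-4o"):
    """frames_b64: list of base64-encoded JPEG frames to analyze in one request."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
                for b64 in frames_b64]
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content
```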
## Deploying to Hugging Face Spaces

This project is configured for easy deployment to Hugging Face Spaces:

1. Fork this repository to your GitHub account
2. Create a new Space on Hugging Face: https://huggingface.co/spaces/create
3. Select "Streamlit" as the SDK
4. Link your GitHub repository
5. Add your OpenAI API key as a secret in the Space settings (if using OpenAI models)
6. The Space will automatically deploy with the configuration from this repository

Alternatively, you can use the GitHub Actions workflow to automatically sync your repository to Hugging Face Spaces:

1. Create a Hugging Face account and generate an access token
2. Add the following secrets to your GitHub repository:
   - `HF_TOKEN`: Your Hugging Face access token
   - `HF_USERNAME`: Your Hugging Face username
   - `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI models)
3. Push to the main branch to trigger the workflow
## Project Structure

```
video-anomaly-detector/
├── app.py                 # Streamlit web application
├── detector.py            # Core video processing and anomaly detection with OpenAI models
├── phi4_detector.py       # Phi-4 model implementation using Hugging Face
├── example.py             # Example script for processing a single video
├── batch_process.py       # Script for batch processing multiple videos
├── requirements.txt       # Python dependencies
├── requirements-hf.txt    # Dependencies for Hugging Face Spaces
├── .env.example           # Template for environment variables
└── .github/               # GitHub Actions workflows
    └── workflows/
        └── sync-to-hub.yml  # Workflow to sync to Hugging Face
```
## Limitations

- Processing time depends on the video length, frame skip rate, and your internet connection
- The OpenAI models require an API key and may incur usage costs
- The Phi-4 model requires downloading approximately 5GB of model files on first use
- Higher frame skip values will process fewer frames, making analysis faster but potentially less accurate
- Cumulative analysis may miss some details that would be caught in granular analysis
- Live stream processing may be affected by network latency and camera quality
## License

This project is licensed under the MIT License - see the LICENSE file for details.