---
title: Video Anomaly Detector
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Video Anomaly Detector
This application analyzes video files and live streams frame by frame with multimodal AI models, flagging anomalies that match a user-provided prompt.
## Model Description
The application supports multiple AI models for analysis:
- **GPT-4o**: OpenAI's most powerful multimodal model, offering the highest accuracy for anomaly detection
- **GPT-4o-mini**: A smaller, faster, and more cost-effective version of GPT-4o
- **Phi-4**: Microsoft's multimodal model that can run locally using Hugging Face transformers
- **Phi-3**: *(Coming soon)* Microsoft's earlier multimodal model
Each model can analyze both text and images, examining video frames to identify potential anomalies based on the user's prompt.
## Demo App
[Try the demo on Hugging Face Spaces](https://huggingface.co/spaces/username/video-anomaly-detector)
## Features
- Support for both video files and live streams (webcam, IP camera, RTSP)
- Select from multiple AI models (GPT-4o, GPT-4o-mini, Phi-4)
- Skip frames for faster processing
- Provide custom prompts for anomaly detection
- Two analysis modes: frame-by-frame or cumulative summary
- Batch processing for multiple videos
- Streamlit web interface with modern UI design
## How It Works
1. The application extracts frames from the uploaded video or live stream
2. It skips a user-defined number of frames to reduce processing time (see the sampling sketch after this list)
3. Based on the selected analysis depth:
- **Granular mode**: Each selected frame is analyzed individually
- **Cumulative mode**: All frames are analyzed together to provide an overall summary
4. The selected AI model analyzes the frame(s) and provides descriptions of any detected anomalies
5. Results are displayed in an interactive interface with timestamps for live streams
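The frame-sampling step (steps 1–2) reduces to a short OpenCV loop. The sketch below is illustrative only, assuming `skip` means "analyze every (skip + 1)-th frame"; the function name and that interpretation are not taken from the project's `detector.py`:

```python
import cv2

def sample_frames(video_path: str, skip: int = 5) -> list:
    """Keep every (skip + 1)-th frame of a video (hypothetical helper)."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of file or read error
            break
        if index % (skip + 1) == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# e.g. roughly every 6th frame of a clip:
# frames = sample_frames("surveillance.mp4", skip=5)
```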
## Requirements
- Python 3.8+
- OpenAI API key with access to GPT-4o and GPT-4o-mini models (only needed for OpenAI models)
- For Phi-4: a GPU is recommended but not required (falls back to CPU when no GPU is available)
- For live streams: Webcam or access to an IP camera/RTSP stream
## Installation
```bash
git clone https://github.com/username/video-anomaly-detector.git
cd video-anomaly-detector
pip install -r requirements.txt
```
## Environment Variables
Create a `.env` file in the root directory with your OpenAI API key (only needed for OpenAI models):
```
OPENAI_API_KEY=your_openai_api_key_here
```
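For reference, the conventional way such a key is picked up at runtime uses `python-dotenv`; whether the project wires it exactly like this is an assumption:

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # copies OPENAI_API_KEY from .env into the process environment

# OpenAI() would also read OPENAI_API_KEY implicitly; passing it explicitly
# just makes the dependency visible.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```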
## Usage
### Web Application
Run the Streamlit application:
```bash
streamlit run app.py
```
Your browser will automatically open with the application running at http://localhost:8501
#### Using with Video Files
1. Select "Video File" as the input source
2. Upload a video file
3. Configure the analysis settings
4. Click "Analyze Video"
#### Using with Live Streams
1. Select "Live Stream" as the input source
2. Choose between "Webcam" or "IP Camera / RTSP Stream"
3. For IP cameras, enter the stream URL (e.g., rtsp://username:password@ip_address:port/path)
4. Set the maximum number of frames to process
5. Configure the analysis settings
6. Click "Analyze Video"
### Command Line
#### Single Video Processing
```bash
python example.py --video path/to/video.mp4 --skip 5 --analysis_depth granular --model gpt-4o --prompt "Detect any unusual activities or objects in this frame"
```
Arguments:
- `--video`: Path to the video file (required)
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
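A parser matching the documented flags could look like the sketch below; defaults mirror the list above, but the real `example.py` may differ in detail:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of a parser for the flags documented above; not the actual example.py.
    parser = argparse.ArgumentParser(description="Single-video anomaly detection")
    parser.add_argument("--video", required=True, help="Path to the video file")
    parser.add_argument("--skip", type=int, default=5, help="Frames to skip")
    parser.add_argument("--analysis_depth", choices=["granular", "cumulative"],
                        default="granular")
    parser.add_argument("--model", choices=["gpt-4o", "gpt-4o-mini", "phi-4"],
                        default="gpt-4o")
    parser.add_argument("--prompt", default="Detect any unusual activities or objects")
    parser.add_argument("--api_key", default=None,
                        help="Optional; falls back to OPENAI_API_KEY from .env")
    return parser
```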
#### Live Stream Processing
```bash
python example.py --stream 0 --skip 5 --analysis_depth granular --model gpt-4o --max_frames 30 --prompt "Detect any unusual activities or objects in this frame"
```
Arguments:
- `--stream`: Stream source (0 for webcam, URL for IP camera/RTSP stream)
- `--max_frames`: Maximum number of frames to process (default: 30)
- Other arguments are the same as for video processing
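OpenCV treats both source types uniformly: an integer opens a local webcam, while a string is interpreted as a stream URL. A hedged sketch of that dispatch plus the `--max_frames` cap (names are illustrative):

```python
import cv2

def open_stream(source: str) -> cv2.VideoCapture:
    # "0" -> webcam index 0; anything else is treated as an IP camera/RTSP URL.
    return cv2.VideoCapture(int(source) if source.isdigit() else source)

def grab_frames(source: str, max_frames: int = 30, skip: int = 5) -> list:
    capture = open_stream(source)
    frames, index = [], 0
    while len(frames) < max_frames:
        ok, frame = capture.read()
        if not ok:  # dropped connection or unplugged camera
            break
        if index % (skip + 1) == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```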
#### Batch Processing
```bash
python batch_process.py --videos_dir path/to/videos --output_dir output --skip 5 --analysis_depth cumulative --model gpt-4o-mini
```
Arguments:
- `--videos_dir`: Directory containing video files (required)
- `--output_dir`: Directory to save results (default: 'output')
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
- `--extensions`: Comma-separated list of video file extensions (default: '.mp4,.avi,.mov,.mkv')
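The batching itself amounts to scanning a directory for files with matching extensions and running the single-video pipeline on each hit. A minimal sketch under those assumptions (`process_video` is a hypothetical entry point):

```python
from pathlib import Path

def find_videos(videos_dir: str, extensions: str = ".mp4,.avi,.mov,.mkv") -> list:
    """Collect files whose suffix matches the --extensions list."""
    wanted = {ext.strip().lower() for ext in extensions.split(",")}
    return sorted(p for p in Path(videos_dir).iterdir() if p.suffix.lower() in wanted)

# for video in find_videos("path/to/videos"):
#     result = process_video(video)  # hypothetical per-video pipeline
#     (Path("output") / f"{video.stem}.txt").write_text(result)
```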
## Model Options
### GPT-4o
- OpenAI's most powerful multimodal model
- Highest accuracy for anomaly detection
- Requires OpenAI API key
- Recommended for critical applications where accuracy is paramount
### GPT-4o-mini
- Smaller, faster version of GPT-4o
- More cost-effective for processing large videos
- Requires OpenAI API key
- Good balance between performance and cost
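Both OpenAI models accept a frame as a base64-encoded image alongside the text prompt. A minimal sketch of one such call with the official `openai` SDK; the actual wiring in `detector.py` may differ:

```python
import base64

import cv2
from openai import OpenAI

def describe_frame(client: OpenAI, frame, prompt: str,
                   model: str = "gpt-4o-mini") -> str:
    # JPEG-encode the OpenCV frame and embed it as a data URL.
    ok, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode("ascii")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```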
### Phi-4
- Microsoft's multimodal model
- Runs locally using Hugging Face transformers
- No API key required
- First run will download the model (approximately 5GB)
- GPU recommended but not required
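Loading the model locally follows the standard transformers pattern; the repo id below and the dtype/device choices are assumptions, so treat this purely as a shape sketch of what `phi4_detector.py` might do:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-multimodal-instruct"  # assumed repo id

# trust_remote_code is typically required because the model ships custom code.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # GPU if present, otherwise CPU
)
```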
### Phi-3 (Coming Soon)
- Microsoft's earlier multimodal model
- Will provide an alternative option for analysis
## Analysis Depth Options
### Granular - Frame by Frame
- Analyzes each frame individually
- Provides detailed analysis for every processed frame
- Useful for detecting specific moments or events
### Cumulative - All Frames
- Analyzes all frames together to provide an overall summary
- Identifies up to 3 key frames that best represent detected anomalies
- Useful for getting a high-level understanding of anomalies in the video
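Conceptually the two depths differ only in how frames are grouped into model calls, roughly as follows (both helper names are hypothetical):

```python
def analyze(frames: list, prompt: str, depth: str = "granular") -> list:
    """Illustrative dispatch between the two analysis depths."""
    if depth == "granular":
        # One model call per frame: detailed, but more calls and more cost.
        return [analyze_frame(frame, prompt) for frame in frames]
    # Cumulative: one call over all frames, yielding a single summary.
    return [summarize_frames(frames, prompt)]
```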
## Deploying to Hugging Face Spaces
This project is configured for easy deployment to Hugging Face Spaces:
1. Fork this repository to your GitHub account
2. Create a new Space on Hugging Face: https://huggingface.co/spaces/create
3. Select "Streamlit" as the SDK
4. Link your GitHub repository
5. Add your OpenAI API key as a secret in the Space settings (if using OpenAI models)
6. The Space will automatically deploy with the configuration from this repository
Alternatively, you can use the GitHub Actions workflow to automatically sync your repository to Hugging Face Spaces:
1. Create a Hugging Face account and generate an access token
2. Add the following secrets to your GitHub repository:
- `HF_TOKEN`: Your Hugging Face access token
- `HF_USERNAME`: Your Hugging Face username
- `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI models)
3. Push to the main branch to trigger the workflow
## Project Structure
```
video-anomaly-detector/
├── app.py               # Streamlit web application
├── detector.py          # Core video processing and anomaly detection with OpenAI models
├── phi4_detector.py     # Phi-4 model implementation using Hugging Face
├── example.py           # Example script for processing a single video
├── batch_process.py     # Script for batch processing multiple videos
├── requirements.txt     # Python dependencies
├── requirements-hf.txt  # Dependencies for Hugging Face Spaces
├── .env.example         # Template for environment variables
└── .github/             # GitHub Actions workflows
    └── workflows/
        └── sync-to-hub.yml  # Workflow to sync to Hugging Face
```
## Limitations
- Processing time depends on the video length, frame skip rate, and your internet connection
- The OpenAI models require an API key and may incur usage costs
- Phi-4 model requires downloading approximately 5GB of model files on first use
- Higher frame skip values will process fewer frames, making analysis faster but potentially less accurate
- Cumulative analysis may miss some details that would be caught in granular analysis
- Live stream processing may be affected by network latency and camera quality
## License
This project is licensed under the MIT License - see the LICENSE file for details.