---
title: Video Anomaly Detector
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Video Anomaly Detector
This application analyzes video files and live streams frame by frame, using multimodal AI models to detect anomalies based on a user-provided prompt.
## Model Description
The application supports multiple AI models for analysis:
- GPT-4o: OpenAI's most powerful multimodal model, offering the highest accuracy for anomaly detection
- GPT-4o-mini: A smaller, faster, and more cost-effective version of GPT-4o
- Phi-4: Microsoft's multimodal model that can run locally using Hugging Face transformers
- Phi-3: (Coming soon) Microsoft's earlier multimodal model
Each model can analyze both text and images, examining video frames to identify potential anomalies based on the user's prompt.
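At its core, sending a frame to one of the OpenAI models means JPEG-encoding it and attaching it to a chat request. Below is a minimal sketch using the OpenAI Python SDK (v1+) and OpenCV; `analyze_frame` is an illustrative helper, not the actual interface in detector.py:

```python
# Sketch: send one video frame to an OpenAI vision model for anomaly analysis.
# Assumes the `openai` SDK (v1+) and OPENAI_API_KEY in the environment; the
# request detector.py actually builds may differ.
import base64

import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_frame(frame, prompt: str, model: str = "gpt-4o") -> str:
    """Encode a BGR OpenCV frame as JPEG and ask the model about it."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("could not encode frame")
    b64 = base64.b64encode(jpeg.tobytes()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```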
## Demo App
## Features
- Support for both video files and live streams (webcam, IP camera, RTSP)
- Select from multiple AI models (GPT-4o, GPT-4o-mini, Phi-4)
- Skip frames for faster processing
- Provide custom prompts for anomaly detection
- Two analysis modes: frame-by-frame or cumulative summary
- Batch processing for multiple videos
- Streamlit web interface with modern UI design
## How It Works

1. The application extracts frames from the uploaded video or live stream
2. It skips a user-defined number of frames to reduce processing time (see the sketch after this list)
3. Based on the selected analysis depth:
   - Granular mode: each selected frame is analyzed individually
   - Cumulative mode: all frames are analyzed together to provide an overall summary
4. The selected AI model analyzes the frame(s) and provides descriptions of any detected anomalies
5. Results are displayed in an interactive interface, with timestamps for live streams
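Steps 1 and 2 amount to a read loop that discards frames between samples. A minimal OpenCV sketch; the exact skip arithmetic in detector.py may differ, so treat `skip + 1` as an assumption:

```python
# Sketch: iterate a video or stream, keeping every (skip + 1)-th frame.
import cv2

def extract_frames(source, skip: int = 5):
    """Yield (frame_index, frame) pairs, skipping `skip` frames between reads."""
    cap = cv2.VideoCapture(source)  # file path, webcam index, or RTSP URL
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or dropped stream
                break
            if index % (skip + 1) == 0:
                yield index, frame
            index += 1
    finally:
        cap.release()
```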
## Requirements
- Python 3.8+
- OpenAI API key with access to GPT-4o and GPT-4o-mini models (only needed for OpenAI models)
- For Phi-4: GPU recommended but not required (will use CPU if GPU not available)
- For live streams: Webcam or access to an IP camera/RTSP stream
## Installation

```bash
git clone https://github.com/username/video-anomaly-detector.git
cd video-anomaly-detector
pip install -r requirements.txt
```
### Environment Variables

Create a `.env` file in the root directory with your OpenAI API key (only needed for OpenAI models):

```
OPENAI_API_KEY=your_openai_api_key_here
```
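If the project loads this file with python-dotenv (an assumption here, though it is the common pattern for Streamlit apps), the key becomes available to the process like so:

```python
# Sketch: load OPENAI_API_KEY from .env (assumes the python-dotenv package).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not set; required for OpenAI models")
```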
## Usage

### Web Application

Run the Streamlit application:

```bash
streamlit run app.py
```

Your browser will open automatically with the application running at http://localhost:8501.
### Using with Video Files

1. Select "Video File" as the input source
2. Upload a video file
3. Configure the analysis settings
4. Click "Analyze Video"
### Using with Live Streams

1. Select "Live Stream" as the input source
2. Choose between "Webcam" or "IP Camera / RTSP Stream"
3. For IP cameras, enter the stream URL (e.g., `rtsp://username:password@ip_address:port/path`)
4. Set the maximum number of frames to process
5. Configure the analysis settings
6. Click "Analyze Video"
## Command Line

### Single Video Processing

```bash
python example.py --video path/to/video.mp4 --skip 5 --analysis_depth granular --model gpt-4o --prompt "Detect any unusual activities or objects in this frame"
```

Arguments:

- `--video`: Path to the video file (required)
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in the .env file; not needed for Phi-4)
### Live Stream Processing

```bash
python example.py --stream 0 --skip 5 --analysis_depth granular --model gpt-4o --max_frames 30 --prompt "Detect any unusual activities or objects in this frame"
```

Arguments:

- `--stream`: Stream source (0 for webcam, URL for IP camera/RTSP stream)
- `--max_frames`: Maximum number of frames to process (default: 30)
- Other arguments are the same as for video processing
### Batch Processing

```bash
python batch_process.py --videos_dir path/to/videos --output_dir output --skip 5 --analysis_depth cumulative --model gpt-4o-mini
```

Arguments:

- `--videos_dir`: Directory containing video files (required)
- `--output_dir`: Directory to save results (default: 'output')
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in the .env file; not needed for Phi-4)
- `--extensions`: Comma-separated list of video file extensions (default: '.mp4,.avi,.mov,.mkv')
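Gathering the batch reduces to a directory scan filtered by extension, roughly like this (the helper name is hypothetical, not batch_process.py's real function):

```python
# Sketch: collect video files matching the --extensions list.
from pathlib import Path

def find_videos(videos_dir: str, extensions: str = ".mp4,.avi,.mov,.mkv"):
    """Return sorted video paths in videos_dir whose suffix is in extensions."""
    wanted = {ext.strip().lower() for ext in extensions.split(",")}
    return sorted(p for p in Path(videos_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)
```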
## Model Options
### GPT-4o
- OpenAI's most powerful multimodal model
- Highest accuracy for anomaly detection
- Requires OpenAI API key
- Recommended for critical applications where accuracy is paramount
### GPT-4o-mini
- Smaller, faster version of GPT-4o
- More cost-effective for processing large videos
- Requires OpenAI API key
- Good balance between performance and cost
### Phi-4
- Microsoft's multimodal model
- Runs locally using Hugging Face transformers
- No API key required
- First run will download the model (approximately 5GB)
- GPU recommended but not required
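A rough sketch of the local-loading pattern with CPU fallback; the checkpoint id and dtype choice here are assumptions, so defer to phi4_detector.py for the real details:

```python
# Sketch: load a Phi-4 multimodal checkpoint locally, preferring GPU.
# The model id and dtype are assumptions, not phi4_detector.py's actual values.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-multimodal-instruct"  # assumed checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)
```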
### Phi-3 (Coming Soon)
- Microsoft's earlier multimodal model
- Will provide an alternative option for analysis
## Analysis Depth Options
### Granular - Frame by Frame
- Analyzes each frame individually
- Provides detailed analysis for every processed frame
- Useful for detecting specific moments or events
### Cumulative - All Frames
- Analyzes all frames together to provide an overall summary
- Identifies up to 3 key frames that best represent detected anomalies
- Useful for getting a high-level understanding of anomalies in the video
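Schematically, the two depths differ only in how frames are batched into model requests. The sketch below reuses the OpenAI request shape from the earlier example; every name is illustrative, not the app's actual interface:

```python
# Sketch: granular vs. cumulative batching of frames into model requests.
# All names here are illustrative, not detector.py's real interface.
import base64

import cv2
from openai import OpenAI

client = OpenAI()

def to_data_url(frame) -> str:
    """JPEG-encode a BGR frame as a base64 data URL."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("could not encode frame")
    return "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()

def ask(model: str, prompt: str, frames) -> str:
    """One chat request carrying the prompt plus every frame in `frames`."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": to_data_url(f)}}
                for f in frames]
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": content}])
    return response.choices[0].message.content

def analyze(frames, prompt: str, depth: str = "granular", model: str = "gpt-4o"):
    if depth == "granular":
        # One request per frame: a separate description for every frame.
        return [ask(model, prompt, [frame]) for frame in frames]
    # Cumulative: all frames in a single request, yielding one overall summary.
    return [ask(model, prompt + " Summarize anomalies across all frames.", frames)]
```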
## Deploying to Hugging Face Spaces
This project is configured for easy deployment to Hugging Face Spaces:
1. Fork this repository to your GitHub account
2. Create a new Space on Hugging Face: https://huggingface.co/spaces/create
3. Select "Streamlit" as the SDK
4. Link your GitHub repository
5. Add your OpenAI API key as a secret in the Space settings (if using OpenAI models)
6. The Space will automatically deploy with the configuration from this repository
Alternatively, you can use the GitHub Actions workflow to sync your repository to Hugging Face Spaces automatically:

1. Create a Hugging Face account and generate an access token
2. Add the following secrets to your GitHub repository:
   - `HF_TOKEN`: Your Hugging Face access token
   - `HF_USERNAME`: Your Hugging Face username
   - `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI models)
3. Push to the main branch to trigger the workflow
## Project Structure

```
video-anomaly-detector/
├── app.py               # Streamlit web application
├── detector.py          # Core video processing and anomaly detection with OpenAI models
├── phi4_detector.py     # Phi-4 model implementation using Hugging Face
├── example.py           # Example script for processing a single video
├── batch_process.py     # Script for batch processing multiple videos
├── requirements.txt     # Python dependencies
├── requirements-hf.txt  # Dependencies for Hugging Face Spaces
├── .env.example         # Template for environment variables
└── .github/             # GitHub Actions workflows
    └── workflows/
        └── sync-to-hub.yml  # Workflow to sync to Hugging Face
```
## Limitations
- Processing time depends on the video length, frame skip rate, and your internet connection
- The OpenAI models require an API key and may incur usage costs
- Phi-4 model requires downloading approximately 5GB of model files on first use
- Higher frame skip values will process fewer frames, making analysis faster but potentially less accurate
- Cumulative analysis may miss some details that would be caught in granular analysis
- Live stream processing may be affected by network latency and camera quality
## License
This project is licensed under the MIT License - see the LICENSE file for details.