---
title: Video Anomaly Detector
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Video Anomaly Detector
This application analyzes video files and live streams frame by frame with multimodal AI models, flagging anomalies that match a user-provided prompt.
## Model Description
The application supports multiple AI models for analysis:
- **GPT-4o**: OpenAI's most powerful multimodal model, offering the highest accuracy for anomaly detection
- **GPT-4o-mini**: A smaller, faster, and more cost-effective version of GPT-4o
- **Phi-4**: Microsoft's multimodal model that can run locally using Hugging Face transformers
- **Phi-3**: *(Coming soon)* Microsoft's earlier multimodal model
Each model can analyze both text and images, examining video frames to identify potential anomalies based on the user's prompt.
## Demo App
[Try the demo on Hugging Face Spaces](https://huggingface.co/spaces/username/video-anomaly-detector)
## Features
- Support for both video files and live streams (webcam, IP camera, RTSP)
- Select from multiple AI models (GPT-4o, GPT-4o-mini, Phi-4)
- Skip frames for faster processing
- Provide custom prompts for anomaly detection
- Two analysis modes: frame-by-frame or cumulative summary
- Batch processing for multiple videos
- Streamlit web interface with modern UI design
## How It Works
1. The application extracts frames from the uploaded video or live stream
2. It skips a user-defined number of frames to reduce processing time (see the sampling sketch after this list)
3. Based on the selected analysis depth:
- **Granular mode**: Each selected frame is analyzed individually
- **Cumulative mode**: All frames are analyzed together to provide an overall summary
4. The selected AI model analyzes the frame(s) and provides descriptions of any detected anomalies
5. Results are displayed in an interactive interface with timestamps for live streams
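The frame-sampling step (steps 1–2) reduces to a short OpenCV loop. The sketch below is illustrative only, assuming `skip` means "analyze every (skip + 1)-th frame"; the function name and that interpretation are not taken from the project's `detector.py`:

```python
import cv2

def sample_frames(video_path: str, skip: int = 5) -> list:
    """Keep every (skip + 1)-th frame of a video (hypothetical helper)."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of file or read error
            break
        if index % (skip + 1) == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# e.g. roughly every 6th frame of a clip:
# frames = sample_frames("surveillance.mp4", skip=5)
```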
## Requirements
- Python 3.8+
- OpenAI API key with access to GPT-4o and GPT-4o-mini models (only needed for OpenAI models)
- For Phi-4: a GPU is recommended but not required (falls back to CPU when no GPU is available)
- For live streams: Webcam or access to an IP camera/RTSP stream
## Installation
```bash
git clone https://github.com/username/video-anomaly-detector.git
cd video-anomaly-detector
pip install -r requirements.txt
```
## Environment Variables
Create a `.env` file in the root directory with your OpenAI API key (only needed for OpenAI models):
```
OPENAI_API_KEY=your_openai_api_key_here
```
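For reference, the conventional way such a key is picked up at runtime uses `python-dotenv`; whether the project wires it exactly like this is an assumption:

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # copies OPENAI_API_KEY from .env into the process environment

# OpenAI() would also read OPENAI_API_KEY implicitly; passing it explicitly
# just makes the dependency visible.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```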
## Usage
### Web Application
Run the Streamlit application:
```bash
streamlit run app.py
```
Your browser will automatically open with the application running at http://localhost:8501
#### Using with Video Files
1. Select "Video File" as the input source
2. Upload a video file
3. Configure the analysis settings
4. Click "Analyze Video"
#### Using with Live Streams
1. Select "Live Stream" as the input source
2. Choose between "Webcam" or "IP Camera / RTSP Stream"
3. For IP cameras, enter the stream URL (e.g., rtsp://username:password@ip_address:port/path)
4. Set the maximum number of frames to process
5. Configure the analysis settings
6. Click "Analyze Video"
### Command Line
#### Single Video Processing
```bash
python example.py --video path/to/video.mp4 --skip 5 --analysis_depth granular --model gpt-4o --prompt "Detect any unusual activities or objects in this frame"
```
Arguments:
- `--video`: Path to the video file (required)
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
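A parser matching the documented flags could look like the sketch below; defaults mirror the list above, but the real `example.py` may differ in detail:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of a parser for the flags documented above; not the actual example.py.
    parser = argparse.ArgumentParser(description="Single-video anomaly detection")
    parser.add_argument("--video", required=True, help="Path to the video file")
    parser.add_argument("--skip", type=int, default=5, help="Frames to skip")
    parser.add_argument("--analysis_depth", choices=["granular", "cumulative"],
                        default="granular")
    parser.add_argument("--model", choices=["gpt-4o", "gpt-4o-mini", "phi-4"],
                        default="gpt-4o")
    parser.add_argument("--prompt", default="Detect any unusual activities or objects")
    parser.add_argument("--api_key", default=None,
                        help="Optional; falls back to OPENAI_API_KEY from .env")
    return parser
```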
#### Live Stream Processing
```bash
python example.py --stream 0 --skip 5 --analysis_depth granular --model gpt-4o --max_frames 30 --prompt "Detect any unusual activities or objects in this frame"
```
Arguments:
- `--stream`: Stream source (0 for webcam, URL for IP camera/RTSP stream)
- `--max_frames`: Maximum number of frames to process (default: 30)
- Other arguments are the same as for video processing
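OpenCV treats both source types uniformly: an integer opens a local webcam, while a string is interpreted as a stream URL. A hedged sketch of that dispatch plus the `--max_frames` cap (names are illustrative):

```python
import cv2

def open_stream(source: str) -> cv2.VideoCapture:
    # "0" -> webcam index 0; anything else is treated as an IP camera/RTSP URL.
    return cv2.VideoCapture(int(source) if source.isdigit() else source)

def grab_frames(source: str, max_frames: int = 30, skip: int = 5) -> list:
    capture = open_stream(source)
    frames, index = [], 0
    while len(frames) < max_frames:
        ok, frame = capture.read()
        if not ok:  # dropped connection or unplugged camera
            break
        if index % (skip + 1) == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```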
#### Batch Processing
```bash
python batch_process.py --videos_dir path/to/videos --output_dir output --skip 5 --analysis_depth cumulative --model gpt-4o-mini
```
Arguments:
- `--videos_dir`: Directory containing video files (required)
- `--output_dir`: Directory to save results (default: 'output')
- `--skip`: Number of frames to skip (default: 5)
- `--analysis_depth`: Analysis depth: 'granular' or 'cumulative' (default: 'granular')
- `--model`: AI model to use: 'gpt-4o', 'gpt-4o-mini', or 'phi-4' (default: 'gpt-4o')
- `--prompt`: Prompt for anomaly detection
- `--api_key`: OpenAI API key (optional if set in .env file, not needed for Phi-4)
- `--extensions`: Comma-separated list of video file extensions (default: '.mp4,.avi,.mov,.mkv')
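The batching itself amounts to scanning a directory for files with matching extensions and running the single-video pipeline on each hit. A minimal sketch under those assumptions (`process_video` is a hypothetical entry point):

```python
from pathlib import Path

def find_videos(videos_dir: str, extensions: str = ".mp4,.avi,.mov,.mkv") -> list:
    """Collect files whose suffix matches the --extensions list."""
    wanted = {ext.strip().lower() for ext in extensions.split(",")}
    return sorted(p for p in Path(videos_dir).iterdir() if p.suffix.lower() in wanted)

# for video in find_videos("path/to/videos"):
#     result = process_video(video)  # hypothetical per-video pipeline
#     (Path("output") / f"{video.stem}.txt").write_text(result)
```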
## Model Options
### GPT-4o
- OpenAI's most powerful multimodal model
- Highest accuracy for anomaly detection
- Requires OpenAI API key
- Recommended for critical applications where accuracy is paramount
### GPT-4o-mini
- Smaller, faster version of GPT-4o
- More cost-effective for processing large videos
- Requires OpenAI API key
- Good balance between performance and cost
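Both OpenAI models accept a frame as a base64-encoded image alongside the text prompt. A minimal sketch of one such call with the official `openai` SDK; the actual wiring in `detector.py` may differ:

```python
import base64

import cv2
from openai import OpenAI

def describe_frame(client: OpenAI, frame, prompt: str,
                   model: str = "gpt-4o-mini") -> str:
    # JPEG-encode the OpenCV frame and embed it as a data URL.
    ok, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode("ascii")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```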
### Phi-4
- Microsoft's multimodal model
- Runs locally using Hugging Face transformers
- No API key required
- First run will download the model (approximately 5GB)
- GPU recommended but not required
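Loading the model locally follows the standard transformers pattern; the repo id below and the dtype/device choices are assumptions, so treat this purely as a shape sketch of what `phi4_detector.py` might do:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-multimodal-instruct"  # assumed repo id

# trust_remote_code is typically required because the model ships custom code.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # GPU if present, otherwise CPU
)
```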
### Phi-3 (Coming Soon)
- Microsoft's earlier multimodal model
- Will provide an alternative option for analysis
## Analysis Depth Options
### Granular - Frame by Frame
- Analyzes each frame individually
- Provides detailed analysis for every processed frame
- Useful for detecting specific moments or events
### Cumulative - All Frames
- Analyzes all frames together to provide an overall summary
- Identifies up to 3 key frames that best represent detected anomalies
- Useful for getting a high-level understanding of anomalies in the video
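Conceptually the two depths differ only in how frames are grouped into model calls, roughly as follows (both helper names are hypothetical):

```python
def analyze(frames: list, prompt: str, depth: str = "granular") -> list:
    """Illustrative dispatch between the two analysis depths."""
    if depth == "granular":
        # One model call per frame: detailed, but more calls and more cost.
        return [analyze_frame(frame, prompt) for frame in frames]
    # Cumulative: one call over all frames, yielding a single summary.
    return [summarize_frames(frames, prompt)]
```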
## Deploying to Hugging Face Spaces
This project is configured for easy deployment to Hugging Face Spaces:
1. Fork this repository to your GitHub account
2. Create a new Space on Hugging Face: https://huggingface.co/spaces/create
3. Select "Streamlit" as the SDK
4. Link your GitHub repository
5. Add your OpenAI API key as a secret in the Space settings (if using OpenAI models)
6. The Space will automatically deploy with the configuration from this repository
Alternatively, you can use the GitHub Actions workflow to automatically sync your repository to Hugging Face Spaces:
1. Create a Hugging Face account and generate an access token
2. Add the following secrets to your GitHub repository:
- `HF_TOKEN`: Your Hugging Face access token
- `HF_USERNAME`: Your Hugging Face username
- `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI models)
3. Push to the main branch to trigger the workflow
## Project Structure
```
video-anomaly-detector/
├── app.py               # Streamlit web application
├── detector.py          # Core video processing and anomaly detection with OpenAI models
├── phi4_detector.py     # Phi-4 model implementation using Hugging Face
├── example.py           # Example script for processing a single video
├── batch_process.py     # Script for batch processing multiple videos
├── requirements.txt     # Python dependencies
├── requirements-hf.txt  # Dependencies for Hugging Face Spaces
├── .env.example         # Template for environment variables
└── .github/             # GitHub Actions workflows
    └── workflows/
        └── sync-to-hub.yml  # Workflow to sync to Hugging Face
```
## Limitations
- Processing time depends on the video length, frame skip rate, and your internet connection
- The OpenAI models require an API key and may incur usage costs
- Phi-4 model requires downloading approximately 5GB of model files on first use
- Higher frame skip values will process fewer frames, making analysis faster but potentially less accurate
- Cumulative analysis may miss some details that would be caught in granular analysis
- Live stream processing may be affected by network latency and camera quality
## License
This project is licensed under the MIT License - see the LICENSE file for details.