---
title: English Accent Classifier
emoji: 🗣️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
---

# English Accent Classifier with Video Analysis

This Gradio application analyzes English accents from audio extracted from video files. You can provide a video either via a direct URL or by uploading a file from your local machine.

## How it Works

1.  **Input Video:** Provide a video URL (MP4, Loom, Dropbox, Google Drive direct links) or upload a video file.
2.  **Video Processing:** The application downloads the video from the URL or reads the uploaded file.
3.  **Audio Extraction:** The full audio and a short segment (15 seconds) are extracted.
4.  **Language Detection:** The 15-second segment is transcribed, and its language is detected.
5.  **Accent Classification (if English):** If the detected language is English, a longer audio segment of adjustable duration is analyzed to classify the accent.
6.  **Results:** The detected language, predicted accent, confidence scores, and an audio player for the full extracted audio are displayed.
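
The gating between steps 4 and 5 can be sketched as below. This is a minimal illustration, not the application's actual code: `analyze_clips`, `AnalysisResult`, and the three callables are hypothetical names, and the set of labels treated as English is an assumption, since the label format returned by the language detection model is not stated here.

```python
from typing import Callable, FrozenSet, Optional, Tuple, TypedDict


class AnalysisResult(TypedDict):
    language: str
    accent: Optional[str]
    confidence: Optional[float]


def analyze_clips(
    short_clip: str,
    long_clip: str,
    transcribe: Callable[[str], str],
    detect_language: Callable[[str], str],
    classify_accent: Callable[[str], Tuple[str, float]],
    # Assumption: which labels count as English depends on the detector.
    english_labels: FrozenSet[str] = frozenset({"en", "english"}),
) -> AnalysisResult:
    """Detect the language first; classify the accent only for English."""
    text = transcribe(short_clip)        # step 4: transcribe the short clip
    language = detect_language(text)     # step 4: detect the language
    if language.strip().lower() not in english_labels:
        # Step 5 is skipped entirely for non-English audio.
        return {"language": language, "accent": None, "confidence": None}
    accent, score = classify_accent(long_clip)  # step 5: longer segment
    return {"language": language, "accent": accent, "confidence": score}


# Usage with stub callables standing in for the real models.
result = analyze_clips(
    "short.wav",
    "long.wav",
    transcribe=lambda path: "hello there",
    detect_language=lambda text: "en",
    classify_accent=lambda path: ("us", 0.91),  # hypothetical label and score
)
```

Passing the models in as callables keeps the control flow testable without downloading any weights.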

## Features

* **English Accent Classification:** Predicts the accent in English audio.
* **Language Detection:** Ensures the audio is English before accent analysis.
* **Flexible Video Input:** Supports URLs and file uploads.
* **Adjustable Analysis Duration:** Users can set the audio analysis length.
* **Audio Playback:** Allows users to listen to the extracted audio.

## Tech Stack

* [Gradio](https://gradio.app/): Interactive web UI.
* [Hugging Face Transformers](https://huggingface.co/transformers/): Pre-trained models and pipelines.
* [Requests](https://requests.readthedocs.io/en/latest/): Downloading video files.
* [MoviePy](https://zulko.github.io/moviepy/): Video editing for audio extraction.
* [PyTorch](https://pytorch.org/): Underlying deep learning framework.
* [Soundfile](https://pysoundfile.readthedocs.io/en/latest/): Audio file handling.

## Models Used

* **Accent Classification:** `dima806/english_accents_classification`
* **Language Detection:** `alexneakameni/language_detection`
* **Automatic Speech Recognition:** `openai/whisper-tiny.en`
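
As a rough sketch, the three models could be loaded through the Transformers `pipeline` helper as shown below. The task names paired with each model ID are assumptions inferred from the roles listed above, and `MODEL_TASKS` and `load_pipelines` are hypothetical names rather than the application's own.

```python
# Assumed task for each model ID; verify against each model card.
MODEL_TASKS = {
    "openai/whisper-tiny.en": "automatic-speech-recognition",
    "alexneakameni/language_detection": "text-classification",
    "dima806/english_accents_classification": "audio-classification",
}


def load_pipelines():
    """Build one pipeline per model; downloads weights on first use."""
    # Imported lazily so this module can be inspected without transformers.
    from transformers import pipeline

    return {
        model_id: pipeline(task, model=model_id)
        for model_id, task in MODEL_TASKS.items()
    }
```

Calling `load_pipelines()` once at startup and reusing the returned objects avoids reloading the models on every request.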

## Usage

You can interact with the application directly in your browser. Provide a video URL or upload a file, adjust the analysis duration, and click "Analyze Video". The results will be displayed below.

### Input Formats

* **Uploaded Video Files:** `.mp4`
* **Video URLs:**
    * Direct MP4 links (ending in `.mp4`)
    * Loom video share links (`https://www.loom.com/share/...`)
    * Dropbox direct download links (MP4 links ending in `?dl=1`)
    * Google Drive direct download links (`https://drive.google.com/uc?id=...&export=download`)

### Unsupported Formats

* Webpages embedding videos (e.g., YouTube, news articles).
* Dropbox shared folder links.
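
The URL rules above can be approximated with a small standard-library check. This is a sketch under assumptions: `classify_video_url` and its return labels are hypothetical, and the host and path matching is a simplified reading of the formats listed, not the application's actual validation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def classify_video_url(url: str) -> Optional[str]:
    """Return a label for a supported URL format, or None if unsupported."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    host = parts.netloc.lower()
    path = parts.path
    query = parse_qs(parts.query)

    if host in ("loom.com", "www.loom.com"):
        return "loom" if path.startswith("/share/") else None
    if host == "drive.google.com":
        # Direct download form: /uc?id=...&export=download
        ok = path == "/uc" and "id" in query and query.get("export") == ["download"]
        return "google_drive" if ok else None
    if host == "dropbox.com" or host.endswith(".dropbox.com"):
        # Only MP4 links with ?dl=1; shared folder links fail this check.
        ok = path.lower().endswith(".mp4") and query.get("dl") == ["1"]
        return "dropbox" if ok else None
    if path.lower().endswith(".mp4"):
        return "direct_mp4"
    return None  # e.g. webpages that merely embed a video
```

Checking the known hosts before the generic `.mp4` rule prevents a Dropbox link without `?dl=1` from being accepted as a direct MP4 link.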

## FFmpeg Requirement

This application requires [FFmpeg](https://ffmpeg.org/) to be installed on your system for audio extraction from video files. Follow the installation instructions for your operating system on the FFmpeg website.
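
To confirm that FFmpeg is reachable before running the app, a check along these lines may help. It is a sketch, not the application's code: the app extracts audio through MoviePy, whereas `build_extract_cmd` shows an equivalent direct FFmpeg invocation, and the mono 16 kHz output settings are assumptions. Both function names are hypothetical.

```python
import shutil
from typing import List, Optional


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_extract_cmd(
    video_path: str, wav_path: str, duration_s: Optional[float] = None
) -> List[str]:
    """Build an ffmpeg command that writes the audio track to a WAV file."""
    # -y overwrites the output, -vn drops the video stream.
    # Assumption: mono (-ac 1) at 16 kHz (-ar 16000) suits the speech models.
    cmd = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000"]
    if duration_s is not None:
        cmd += ["-t", str(duration_s)]  # keep only the first duration_s seconds
    cmd.append(wav_path)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once `ffmpeg_available()` returns `True`.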

## Troubleshooting

* **"Invalid URL"**: Ensure the URL meets the specified format requirements.
* **Audio/Video Processing Errors**: Most likely caused by a missing or misconfigured FFmpeg installation.
* **Transcription Errors**: Audio may be unclear or contain little speech in the initial 15 seconds.
* **Non-English Language Detection**: Accent classification runs only on English audio; if another language is detected, no accent is predicted.

## Citation

If you use this application in your work, please consider citing the original models and the libraries used.

```bibtex
@misc{dima806_english_accents_classification,
    author = {dima806},
    title = {dima806/english\_accents\_classification},
    year = {2024},
    note = {Oct 19, 2024},
    howpublished = {\url{https://huggingface.co/dima806/english_accents_classification}}
}

@misc{alexneakameni_language_detection,
    author = {alexneakameni},
    title = {language\_detection},
    year = {2024},
    note = {Oct 19, 2024},
    howpublished = {\url{https://huggingface.co/alexneakameni/language_detection}}
}
```