---
title: Audio Emotion Analyzer
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# Audio Emotion Analyzer

A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.

## Model

This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, which is a Wav2Vec2 model fine-tuned for speech emotion recognition.
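
If you want to try the model outside this app, the `transformers` audio-classification pipeline is the quickest route. A minimal sketch, assuming `transformers` and `torch` are installed and that `sample.wav` is a placeholder for your own file:

```python
from transformers import pipeline

# Load the emotion recognition model (downloaded on first use).
classifier = pipeline(
    "audio-classification",
    model="superb/wav2vec2-base-superb-er",
)

# "sample.wav" is a placeholder; the model expects 16 kHz mono speech.
for prediction in classifier("sample.wav"):
    print(f"{prediction['label']}: {prediction['score']:.3f}")
```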

## Demo App

[](https://share.streamlit.io/)

## Features

- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis

## Quick Use

You can use this application in two ways:

### Option 1: Run on Hugging Face Spaces

Click the "Spaces" tab on the model page to access the hosted version of this app.

### Option 2: Run Locally

1. Clone this repository
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Download the pre-trained model (see the sketch after these steps):
   ```bash
   python download_model.py
   ```
4. Run the Streamlit app:
   ```bash
   streamlit run app.py
   ```
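
The repository's `download_model.py` is not reproduced here, but a hypothetical sketch of what such a script might do, using the same `transformers` classes the app relies on (the actual script may differ):

```python
# Hypothetical stand-in for download_model.py; the real script may differ.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"

if __name__ == "__main__":
    # from_pretrained fetches the files over the network on first use.
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)

    # Save local copies so the app can load them without network access
    # (writes config.json, preprocessor_config.json, and the weights file:
    # pytorch_model.bin or model.safetensors, depending on the version).
    extractor.save_pretrained(".")
    model.save_pretrained(".")
```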

## Using Audio Files

The application automatically scans for .wav files (as sketched below) in:

- The current directory where the app is running
- Immediate subdirectories (one level deep)

You can:

1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface
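
The exact discovery logic lives in `app.py`; a minimal sketch of the behavior described above, using only the standard library:

```python
from pathlib import Path

def find_wav_files(root: str = ".") -> list:
    """Collect .wav files in root and its immediate subdirectories."""
    base = Path(root)
    # "*.wav" matches the current directory; "*/*.wav" goes one level deep.
    return sorted(base.glob("*.wav")) + sorted(base.glob("*/*.wav"))

print(find_wav_files())
```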

## Supported Emotions

The model can detect 7 different emotions:

- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲

## Technical Details

This application uses:

- The [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- Wav2Vec2ForSequenceClassification for emotion classification
- Wav2Vec2FeatureExtractor for audio feature extraction
- Streamlit for the web interface
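
Putting those pieces together, the core inference step looks roughly like the sketch below (the app's actual code may differ; `librosa` is assumed here for loading and resampling, and `sample.wav` is a placeholder):

```python
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)

# Load the audio and resample to the 16 kHz rate the model was trained on.
speech, _ = librosa.load("sample.wav", sr=16000, mono=True)

inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = int(logits.argmax(dim=-1))
print(model.config.id2label[predicted_id])
```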

## Limitations

- The model works best with clear speech audio in English
- Background noise may affect the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings

## Troubleshooting

If you encounter issues with model loading, try:

1. Running `python download_model.py` again to re-download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16 kHz sample rate (a resampling sketch follows this list)
4. Verifying that the model files (pytorch_model.bin, config.json, preprocessor_config.json) are in your current directory
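
For point 3, a file recorded at a different sample rate can be converted before analysis. A minimal sketch, assuming `librosa` and `soundfile` are installed and `input.wav` is a placeholder:

```python
import librosa
import soundfile as sf

# Resample the placeholder input to 16 kHz mono and write it back as .wav.
audio, _ = librosa.load("input.wav", sr=16000, mono=True)
sf.write("input_16k.wav", audio, 16000)
```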

## Citation

If you use this application or the underlying model in your work, please cite:

```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.