---
title: Audio Emotion Analyzer
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Audio Emotion Analyzer
A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.
## Model
This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, which is a Wav2Vec2 model fine-tuned for speech emotion recognition.
## Demo App
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
## Features
- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis
## Quick Use
You can use this application in two ways:
### Option 1: Run on Hugging Face Spaces
Open the hosted Hugging Face Space for this app in your browser; no local setup is required.
### Option 2: Run Locally
1. Clone this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the pre-trained model:
```bash
python download_model.py
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
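The `download_model.py` script itself is not shown in this README; a minimal version might look like the following sketch, which fetches the model and feature extractor with `transformers` and saves their files into the current directory (the exact script in this repository may differ).

```python
# Hypothetical sketch of download_model.py: fetch the pre-trained model
# and feature extractor, then save their files (weights, config.json,
# preprocessor_config.json) into the current directory.
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor

MODEL_ID = "superb/wav2vec2-base-superb-er"

model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)

model.save_pretrained(".")
extractor.save_pretrained(".")
print("Model files saved to the current directory.")
```

The first run requires an internet connection; subsequent runs reuse the Hugging Face cache.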
## Using Audio Files
The application automatically scans for .wav files in:
- The current directory where the app is running
- Immediate subdirectories (one level deep)
You can:
1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface
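The scan described above (current directory plus immediate subdirectories, one level deep) can be sketched with the standard library; `find_wav_files` is a hypothetical helper name, not necessarily the one used in `app.py`.

```python
import os

def find_wav_files(root="."):
    """Collect .wav files from root and its immediate subdirectories only."""
    wav_files = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # Files directly in the root directory
        if os.path.isfile(path) and name.lower().endswith(".wav"):
            wav_files.append(path)
        # One level deep: look inside immediate subdirectories, but no deeper
        elif os.path.isdir(path):
            for sub_name in os.listdir(path):
                sub_path = os.path.join(path, sub_name)
                if os.path.isfile(sub_path) and sub_name.lower().endswith(".wav"):
                    wav_files.append(sub_path)
    return sorted(wav_files)
```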
## Supported Emotions
The model can detect 7 different emotions:
- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
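The emoji feedback mentioned under Features could be driven by a simple lookup table like the one below. This is a hypothetical sketch matching the list above; the actual label strings come from the model's `config.id2label` and may be spelled differently.

```python
# Hypothetical emotion-to-emoji mapping for the Streamlit UI; the real
# label names come from the model's config.id2label and may differ.
EMOTION_EMOJI = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😢",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label):
    # Fall back to a question mark for labels without a mapped emoji
    return EMOTION_EMOJI.get(label.lower(), "❓")
```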
## Technical Details
This application uses:
- [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- Wav2Vec2ForSequenceClassification for emotion classification
- Wav2Vec2FeatureExtractor for audio feature extraction
- Streamlit for the web interface
## Limitations
- The model works best with clear speech audio in English
- Background noise may affect the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings
## Troubleshooting
If you encounter issues with model loading, try:
1. Running `python download_model.py` again to download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16kHz sample rate
4. Verifying that the model files (pytorch_model.bin, config.json, preprocessor_config.json) are in your current directory
## Citation
If you use this application or the underlying model in your work, please cite:
```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.