---
title: Audio Emotion Analyzer
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Audio Emotion Analyzer
A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.
## Model
This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, which is a Wav2Vec2 model fine-tuned for speech emotion recognition.
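For a quick check outside the app, the model can be loaded with the `transformers` audio-classification pipeline, as shown on the model card. A minimal sketch (the file name `speech.wav` is a placeholder, and `app.py` may load the model differently):

```python
# Minimal sketch: run the emotion-recognition model through the
# transformers audio-classification pipeline.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="superb/wav2vec2-base-superb-er",
)

# Returns a list of {"label": ..., "score": ...} dicts, highest score first.
predictions = classifier("speech.wav")  # placeholder file name
print(predictions[0])
```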
## Demo App
[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
## Features
- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis
## Quick Use
You can use this application in two ways:
### Option 1: Run on Hugging Face Spaces
Open the hosted version of this app on Hugging Face Spaces; it is linked from the "Spaces" tab on the model page.
### Option 2: Run Locally
1. Clone this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the pre-trained model (what this script does is sketched after the steps):
```bash
python download_model.py
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
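`download_model.py` itself is not reproduced here, but given the model files listed under Troubleshooting, a plausible sketch of what such a script does:

```python
# Hypothetical sketch of download_model.py: fetch the pre-trained model
# and feature extractor, then save them locally so the app can load
# them without re-downloading.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"

model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)

# Writes the model weights plus config.json and
# preprocessor_config.json into the current directory.
model.save_pretrained(".")
extractor.save_pretrained(".")
```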
## Using Audio Files
The application automatically scans for .wav files in:
- The current directory where the app is running
- Immediate subdirectories (one level deep)
You can:
1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface
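A sketch of how this one-level scan could be implemented; the actual logic lives in `app.py` and may differ:

```python
# Sketch: collect .wav files from the current directory and its
# immediate subdirectories (one level deep).
from pathlib import Path

def find_wav_files(root: str = ".") -> list[Path]:
    root_path = Path(root)
    files = sorted(root_path.glob("*.wav"))     # current directory
    files += sorted(root_path.glob("*/*.wav"))  # one level of subdirectories
    return files

print(find_wav_files())
```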
## Supported Emotions
The model can detect 7 different emotions:
- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
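The emoji feedback can be as simple as a label-to-emoji lookup. A hypothetical sketch (the label strings here are assumptions; the model's `config.json` defines the actual ones):

```python
# Hypothetical mapping from predicted emotion labels to emoji feedback;
# the exact label strings depend on the model's configuration.
EMOTION_EMOJIS = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😢",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label: str) -> str:
    # Fall back to a question mark for unrecognized labels.
    return EMOTION_EMOJIS.get(label.lower(), "❓")
```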
## Technical Details
This application uses:
- [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- Wav2Vec2ForSequenceClassification for emotion classification
- Wav2Vec2FeatureExtractor for audio feature extraction
- Streamlit for the web interface
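Put together, the classification path might look like the following sketch, assuming `librosa` for audio loading (the app may use a different backend):

```python
# Sketch: classify one 16 kHz audio clip with the feature extractor
# and sequence-classification head named above.
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)

# librosa resamples to 16 kHz on load, the rate the model expects.
speech, _ = librosa.load("speech.wav", sr=16000)  # placeholder file name
inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)
```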
## Limitations
- The model works best with clear speech audio in English
- Background noise may affect the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings
## Troubleshooting
If you encounter issues with model loading, try:
1. Running `python download_model.py` again to download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16 kHz sample rate (see the resampling sketch after this list)
4. Verifying that the model files (pytorch_model.bin, config.json, preprocessor_config.json) are in your current directory
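For item 3, a clip recorded at another sample rate can be resampled before analysis. A sketch assuming `librosa` and `soundfile` are available (neither is confirmed as a dependency of this repo):

```python
# Sketch: resample an arbitrary .wav file to 16 kHz mono so the
# model's feature extractor accepts it.
import librosa
import soundfile as sf

# librosa.load resamples to the requested rate on load.
speech, _ = librosa.load("input.wav", sr=16000, mono=True)  # placeholder file name
sf.write("input_16k.wav", speech, 16000)
```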
## Citation
If you use this application or the underlying model in your work, please cite:
```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.