---
title: Audio Emotion Analyzer
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# Audio Emotion Analyzer

A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.

## Model

This application uses the `superb/wav2vec2-base-superb-er` model from Hugging Face, a Wav2Vec2 model fine-tuned for speech emotion recognition.

## Demo App

Streamlit App

## Features

- Upload your own `.wav` audio files for emotion analysis
- Select from existing `.wav` files in the current directory
- Real-time emotion prediction
- Visual feedback with emojis

## Quick Use

You can use this application in two ways:

### Option 1: Run on Hugging Face Spaces

Click the "Spaces" tab on the model page to access the hosted version of this app.

### Option 2: Run Locally

1. Clone this repository.
2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the pre-trained model:

   ```bash
   python download_model.py
   ```

4. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```
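For reference, the download step in `download_model.py` could be sketched as follows. This is an assumption about how the script works, not its actual source; it uses `snapshot_download` from `huggingface_hub` to pull the model files into the working directory:

```python
# Hypothetical sketch of download_model.py: fetch the model files so the app
# can load them locally. The function name and structure are illustrative.

MODEL_ID = "superb/wav2vec2-base-superb-er"

def download_model(target_dir: str = ".") -> str:
    from huggingface_hub import snapshot_download

    # Fetches config.json, preprocessor_config.json, and the model weights
    # into `target_dir`, returning the local path.
    return snapshot_download(repo_id=MODEL_ID, local_dir=target_dir)
```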

## Using Audio Files

The application automatically scans for `.wav` files in:

- The current directory where the app is running
- Immediate subdirectories (one level deep)
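That two-level scan can be sketched with the standard library's `pathlib`; the function name `find_wav_files` is illustrative, not taken from the app:

```python
from pathlib import Path

def find_wav_files(root: str = ".") -> list[Path]:
    """Collect .wav files in `root` and its immediate subdirectories."""
    root_path = Path(root)
    files = sorted(root_path.glob("*.wav"))     # current directory
    files += sorted(root_path.glob("*/*.wav"))  # one level deep, no deeper
    return files
```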

You can:

1. Place `.wav` files in the same directory as the app
2. Place `.wav` files in subdirectories
3. Upload new `.wav` files directly through the interface

## Supported Emotions

The model can detect 7 different emotions:

- Neutral 😐
- Happy 😊
- Sad 😒
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
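The emoji feedback can be implemented as a simple lookup from predicted label to emoji. The lowercase label strings below are an assumption about the model's output labels, and `emoji_for` is an illustrative helper, not the app's code:

```python
# Assumed mapping from predicted emotion labels to the emojis shown in the UI.
EMOTION_EMOJI = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😒",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label: str) -> str:
    # Fall back to a question mark for labels the table does not cover.
    return EMOTION_EMOJI.get(label.lower(), "❓")
```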

## Technical Details

This application uses:

- The `superb/wav2vec2-base-superb-er` pre-trained model
- `Wav2Vec2ForSequenceClassification` for emotion classification
- `Wav2Vec2FeatureExtractor` for audio feature extraction
- Streamlit for the web interface
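A minimal inference sketch using the two classes named above might look like this. It assumes `transformers`, `torch`, and `librosa` are installed; `predict_emotion` and its structure are illustrative, not the app's actual source:

```python
MODEL_ID = "superb/wav2vec2-base-superb-er"
TARGET_SR = 16_000  # the model expects 16 kHz mono audio

def predict_emotion(wav_path: str) -> str:
    import librosa
    import torch
    from transformers import (
        Wav2Vec2FeatureExtractor,
        Wav2Vec2ForSequenceClassification,
    )

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Load and resample the audio to the rate the model was trained on.
    speech, _ = librosa.load(wav_path, sr=TARGET_SR, mono=True)
    inputs = extractor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Map the highest-scoring class index back to its label string.
    return model.config.id2label[int(logits.argmax(dim=-1))]
```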

## Limitations

- The model works best with clear speech audio in English
- Background noise may reduce the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings

## Troubleshooting

If you encounter issues with model loading, try:

1. Running `python download_model.py` again to re-download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in `.wav` format with a 16 kHz sample rate
4. Verifying that the model files (`pytorch_model.bin`, `config.json`, `preprocessor_config.json`) are in your current directory
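Point 3 can be verified with Python's standard-library `wave` module alone; `check_wav` is an illustrative helper, not part of the app:

```python
import wave

def check_wav(path: str, expected_rate: int = 16_000) -> None:
    """Raise ValueError if `path` is not a WAV file at the expected rate."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
    if rate != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {rate} Hz, expected {expected_rate} Hz"
        )
```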

## Citation

If you use this application or the underlying model in your work, please cite:

```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.