---
title: Audio Emotion Analyzer
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# Audio Emotion Analyzer

A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.

## Model

This application uses the `superb/wav2vec2-base-superb-er` model from Hugging Face, a Wav2Vec2 model fine-tuned for speech emotion recognition.

## Demo App

Streamlit App

## Features

- Upload your own `.wav` audio files for emotion analysis
- Select from existing `.wav` files in the current directory
- Real-time emotion prediction
- Visual feedback with emojis

## Quick Use

You can use this application in two ways:

### Option 1: Run on Hugging Face Spaces

Click the "Spaces" tab on the model page to access the hosted version of this app.

### Option 2: Run Locally

1. Clone this repository.
2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the pre-trained model:

   ```bash
   python download_model.py
   ```

4. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```
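For reference, the download step in `download_model.py` could be sketched as follows. This is an assumption about how the script works, not its actual source; it uses `snapshot_download` from `huggingface_hub` to pull the model files into the working directory:

```python
# Hypothetical sketch of download_model.py: fetch the model files so the app
# can load them locally. The function name and structure are illustrative.

MODEL_ID = "superb/wav2vec2-base-superb-er"

def download_model(target_dir: str = ".") -> str:
    from huggingface_hub import snapshot_download

    # Fetches config.json, preprocessor_config.json, and the model weights
    # into `target_dir`, returning the local path.
    return snapshot_download(repo_id=MODEL_ID, local_dir=target_dir)
```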

## Using Audio Files

The application automatically scans for `.wav` files in:

- The current directory where the app is running
- Immediate subdirectories (one level deep)
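That two-level scan can be sketched with the standard library's `pathlib`; the function name `find_wav_files` is illustrative, not taken from the app:

```python
from pathlib import Path

def find_wav_files(root: str = ".") -> list[Path]:
    """Collect .wav files in `root` and its immediate subdirectories."""
    root_path = Path(root)
    files = sorted(root_path.glob("*.wav"))     # current directory
    files += sorted(root_path.glob("*/*.wav"))  # one level deep, no deeper
    return files
```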

You can:

1. Place `.wav` files in the same directory as the app
2. Place `.wav` files in subdirectories
3. Upload new `.wav` files directly through the interface

## Supported Emotions

The model can detect 7 different emotions:

- Neutral 😐
- Happy 😊
- Sad 😒
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
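The emoji feedback can be implemented as a simple lookup from predicted label to emoji. The lowercase label strings below are an assumption about the model's output labels, and `emoji_for` is an illustrative helper, not the app's code:

```python
# Assumed mapping from predicted emotion labels to the emojis shown in the UI.
EMOTION_EMOJI = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😒",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label: str) -> str:
    # Fall back to a question mark for labels the table does not cover.
    return EMOTION_EMOJI.get(label.lower(), "❓")
```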

## Technical Details

This application uses:

- The `superb/wav2vec2-base-superb-er` pre-trained model
- `Wav2Vec2ForSequenceClassification` for emotion classification
- `Wav2Vec2FeatureExtractor` for audio feature extraction
- Streamlit for the web interface
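A minimal inference sketch using the two classes named above might look like this. It assumes `transformers`, `torch`, and `librosa` are installed; `predict_emotion` and its structure are illustrative, not the app's actual source:

```python
MODEL_ID = "superb/wav2vec2-base-superb-er"
TARGET_SR = 16_000  # the model expects 16 kHz mono audio

def predict_emotion(wav_path: str) -> str:
    import librosa
    import torch
    from transformers import (
        Wav2Vec2FeatureExtractor,
        Wav2Vec2ForSequenceClassification,
    )

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Load and resample the audio to the rate the model was trained on.
    speech, _ = librosa.load(wav_path, sr=TARGET_SR, mono=True)
    inputs = extractor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Map the highest-scoring class index back to its label string.
    return model.config.id2label[int(logits.argmax(dim=-1))]
```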

## Limitations

- The model works best with clear speech audio in English
- Background noise may reduce the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings

## Troubleshooting

If you encounter issues with model loading, try:

1. Running `python download_model.py` again to re-download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in `.wav` format with a 16 kHz sample rate
4. Verifying that the model files (`pytorch_model.bin`, `config.json`, `preprocessor_config.json`) are in your current directory
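Point 3 can be verified with Python's standard-library `wave` module alone; `check_wav` is an illustrative helper, not part of the app:

```python
import wave

def check_wav(path: str, expected_rate: int = 16_000) -> None:
    """Raise ValueError if `path` is not a WAV file at the expected rate."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
    if rate != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {rate} Hz, expected {expected_rate} Hz"
        )
```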

## Citation

If you use this application or the underlying model in your work, please cite:

```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.