---
title: Audio Emotion Analyzer
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Audio Emotion Analyzer
A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.
## Model
This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, which is a Wav2Vec2 model fine-tuned for speech emotion recognition.
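For a quick check outside the app, the model can be loaded with the `transformers` audio-classification pipeline, as shown on the model card. A minimal sketch (the file name `speech.wav` is a placeholder, and `app.py` may load the model differently):

```python
# Minimal sketch: run the emotion-recognition model through the
# transformers audio-classification pipeline.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="superb/wav2vec2-base-superb-er",
)

# Returns a list of {"label": ..., "score": ...} dicts, highest score first.
predictions = classifier("speech.wav")  # placeholder file name
print(predictions[0])
```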
## Demo App
[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
## Features
- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis
## Quick Use
You can use this application in two ways:
### Option 1: Run on Hugging Face Spaces
Open the hosted version of this app on Hugging Face Spaces; it is linked from the "Spaces" tab on the model page.
### Option 2: Run Locally
1. Clone this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the pre-trained model (what this script does is sketched after the steps):
```bash
python download_model.py
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
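`download_model.py` itself is not reproduced here, but given the model files listed under Troubleshooting, a plausible sketch of what such a script does:

```python
# Hypothetical sketch of download_model.py: fetch the pre-trained model
# and feature extractor, then save them locally so the app can load
# them without re-downloading.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"

model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)

# Writes the model weights plus config.json and
# preprocessor_config.json into the current directory.
model.save_pretrained(".")
extractor.save_pretrained(".")
```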
## Using Audio Files
The application automatically scans for .wav files in:
- The current directory where the app is running
- Immediate subdirectories (one level deep)
You can:
1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface
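A sketch of how this one-level scan could be implemented; the actual logic lives in `app.py` and may differ:

```python
# Sketch: collect .wav files from the current directory and its
# immediate subdirectories (one level deep).
from pathlib import Path

def find_wav_files(root: str = ".") -> list[Path]:
    root_path = Path(root)
    files = sorted(root_path.glob("*.wav"))     # current directory
    files += sorted(root_path.glob("*/*.wav"))  # one level of subdirectories
    return files

print(find_wav_files())
```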
## Supported Emotions
The model can detect 7 different emotions:
- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
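The emoji feedback can be as simple as a label-to-emoji lookup. A hypothetical sketch (the label strings here are assumptions; the model's `config.json` defines the actual ones):

```python
# Hypothetical mapping from predicted emotion labels to emoji feedback;
# the exact label strings depend on the model's configuration.
EMOTION_EMOJIS = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😢",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label: str) -> str:
    # Fall back to a question mark for unrecognized labels.
    return EMOTION_EMOJIS.get(label.lower(), "❓")
```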
## Technical Details
This application uses:
- [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- Wav2Vec2ForSequenceClassification for emotion classification
- Wav2Vec2FeatureExtractor for audio feature extraction
- Streamlit for the web interface
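Put together, the classification path might look like the following sketch, assuming `librosa` for audio loading (the app may use a different backend):

```python
# Sketch: classify one 16 kHz audio clip with the feature extractor
# and sequence-classification head named above.
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "superb/wav2vec2-base-superb-er"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)

# librosa resamples to 16 kHz on load, the rate the model expects.
speech, _ = librosa.load("speech.wav", sr=16000)  # placeholder file name
inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)
```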
## Limitations
- The model works best with clear speech audio in English
- Background noise may affect the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings
## Troubleshooting
If you encounter issues with model loading, try:
1. Running `python download_model.py` again to download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16 kHz sample rate (see the resampling sketch after this list)
4. Verifying that the model files (pytorch_model.bin, config.json, preprocessor_config.json) are in your current directory
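For item 3, a clip recorded at another sample rate can be resampled before analysis. A sketch assuming `librosa` and `soundfile` are available (neither is confirmed as a dependency of this repo):

```python
# Sketch: resample an arbitrary .wav file to 16 kHz mono so the
# model's feature extractor accepts it.
import librosa
import soundfile as sf

# librosa.load resamples to the requested rate on load.
speech, _ = librosa.load("input.wav", sr=16000, mono=True)  # placeholder file name
sf.write("input_16k.wav", speech, 16000)
```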
## Citation
If you use this application or the underlying model in your work, please cite:
```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.