---
title: Audio Emotion Analyzer
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# Audio Emotion Analyzer

A Streamlit application that analyzes the emotional tone of speech audio files using a pre-trained Wav2Vec2 model.

## Model

This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, a Wav2Vec2 model fine-tuned for speech emotion recognition.

## Demo App

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)

## Features

- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis

## Quick Start

You can use this application in two ways:

### Option 1: Run on Hugging Face Spaces

Click the "Spaces" tab on the model page to access the hosted version of this app.

### Option 2: Run Locally

1. Clone this repository.
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Download the pre-trained model:
   ```bash
   python download_model.py
   ```
4. Run the Streamlit app:
   ```bash
   streamlit run app.py
   ```

## Using Audio Files

The application automatically scans for .wav files in:

- The current directory where the app is running
- Immediate subdirectories (one level deep)

You can:

1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface

## Supported Emotions

The model distinguishes four emotion classes (the SUPERB emotion-recognition task keeps the four balanced IEMOCAP classes):

- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠

## Technical Details

This application uses:

- The [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- `Wav2Vec2ForSequenceClassification` for emotion classification
- `Wav2Vec2FeatureExtractor` for audio feature extraction
- Streamlit for the web interface

## Limitations

- The model works best with clear speech audio in English
- Background noise may reduce the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings

## Troubleshooting

If you encounter issues with model loading, try:

1. Running `python download_model.py` again to re-download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16 kHz sample rate
4. Verifying that the model files (`pytorch_model.bin`, `config.json`, `preprocessor_config.json`) are in your current directory

## Citation

If you use this application or the underlying model in your work, please cite:

```bibtex
@misc{superb2021,
  author = {SUPERB Team},
  title = {SUPERB: Speech processing Universal PERformance Benchmark},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
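## Appendix: File Discovery Sketch

The directory scan described under "Using Audio Files" (current directory plus immediate subdirectories, one level deep) can be sketched roughly as follows. This is a minimal illustration, not the actual `app.py` code, and the function name `find_wav_files` is hypothetical:

```python
from pathlib import Path

def find_wav_files(root="."):
    """Collect .wav files from root and its immediate subdirectories."""
    root = Path(root)
    wavs = sorted(root.glob("*.wav"))     # files in the current directory
    wavs += sorted(root.glob("*/*.wav"))  # files exactly one level deep
    return [str(p) for p in wavs]
```

Because `glob` patterns here never go deeper than `*/*.wav`, files nested two or more levels down are ignored, matching the "one level deep" behavior described above.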