---
title: Audio Emotion Analyzer
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# Audio Emotion Analyzer
A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.
## Model
This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, which is a Wav2Vec2 model fine-tuned for speech emotion recognition.
## Demo App
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
## Features
- Upload your own .wav audio files for emotion analysis
- Select from existing .wav files in your current directory
- Real-time emotion prediction
- Visual feedback with emojis
## Quick Use
You can use this application in two ways:
### Option 1: Run on Hugging Face Spaces
Open the hosted Hugging Face Space for this app in your browser; no local setup is required.
### Option 2: Run Locally
1. Clone this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the pre-trained model:
```bash
python download_model.py
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
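The `download_model.py` script itself is not shown in this README; a minimal version might look like the following sketch, which fetches the model and feature extractor with `transformers` and saves their files into the current directory (the exact script in this repository may differ).

```python
# Hypothetical sketch of download_model.py: fetch the pre-trained model
# and feature extractor, then save their files (weights, config.json,
# preprocessor_config.json) into the current directory.
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor

MODEL_ID = "superb/wav2vec2-base-superb-er"

model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)

model.save_pretrained(".")
extractor.save_pretrained(".")
print("Model files saved to the current directory.")
```

The first run requires an internet connection; subsequent runs reuse the Hugging Face cache.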
## Using Audio Files
The application automatically scans for .wav files in:
- The current directory where the app is running
- Immediate subdirectories (one level deep)
You can:
1. Place .wav files in the same directory as the app
2. Place .wav files in subdirectories
3. Upload new .wav files directly through the interface
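The scan described above (current directory plus immediate subdirectories, one level deep) can be sketched with the standard library; `find_wav_files` is a hypothetical helper name, not necessarily the one used in `app.py`.

```python
import os

def find_wav_files(root="."):
    """Collect .wav files from root and its immediate subdirectories only."""
    wav_files = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # Files directly in the root directory
        if os.path.isfile(path) and name.lower().endswith(".wav"):
            wav_files.append(path)
        # One level deep: look inside immediate subdirectories, but no deeper
        elif os.path.isdir(path):
            for sub_name in os.listdir(path):
                sub_path = os.path.join(path, sub_name)
                if os.path.isfile(sub_path) and sub_name.lower().endswith(".wav"):
                    wav_files.append(sub_path)
    return sorted(wav_files)
```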
## Supported Emotions
The model can detect 7 different emotions:
- Neutral 😐
- Happy 😊
- Sad 😢
- Angry 😠
- Fearful 😨
- Disgusted 🤢
- Surprised 😲
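The emoji feedback mentioned under Features could be driven by a simple lookup table like the one below. This is a hypothetical sketch matching the list above; the actual label strings come from the model's `config.id2label` and may be spelled differently.

```python
# Hypothetical emotion-to-emoji mapping for the Streamlit UI; the real
# label names come from the model's config.id2label and may differ.
EMOTION_EMOJI = {
    "neutral": "😐",
    "happy": "😊",
    "sad": "😢",
    "angry": "😠",
    "fearful": "😨",
    "disgusted": "🤢",
    "surprised": "😲",
}

def emoji_for(label):
    # Fall back to a question mark for labels without a mapped emoji
    return EMOTION_EMOJI.get(label.lower(), "❓")
```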
## Technical Details
This application uses:
- [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
- Wav2Vec2ForSequenceClassification for emotion classification
- Wav2Vec2FeatureExtractor for audio feature extraction
- Streamlit for the web interface
## Limitations
- The model works best with clear speech audio in English
- Background noise may affect the accuracy of emotion detection
- Short audio clips (1-5 seconds) tend to work better than longer recordings
## Troubleshooting
If you encounter issues with model loading, try:
1. Running `python download_model.py` again to download the model files
2. Ensuring you have a stable internet connection for the initial model download
3. Checking that your audio files are in .wav format with a 16kHz sample rate
4. Verifying that the model files (pytorch_model.bin, config.json, preprocessor_config.json) are in your current directory
## Citation
If you use this application or the underlying model in your work, please cite:
```bibtex
@misc{superb2021,
  author       = {SUPERB Team},
  title        = {SUPERB: Speech processing Universal PERformance Benchmark},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/s3prl/s3prl}},
}
```
## License
This project is licensed under the MIT License - see the LICENSE file for details.