kedar-bhumkar committed · verified
Commit ae55e39 · 1 Parent(s): 6264ff0

Upload 12 files

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ OAF_youth_happy.wav filter=lfs diff=lfs merge=lfs -text
+ YAF_youth_disgust.wav filter=lfs diff=lfs merge=lfs -text
+ YAF_youth_sad.wav filter=lfs diff=lfs merge=lfs -text
OAF_youth_angry.wav ADDED
Binary file (77.3 kB).
 
OAF_youth_happy.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2abac0dc8317e1ce831cad5897d69533dd380e55ee2b6cfeceadad953acabb4
+ size 100888
README.md CHANGED
@@ -1,14 +1,116 @@
- ---
- title: Audio Emotion Detector
- emoji: 🌍
- colorFrom: green
- colorTo: pink
- sdk: streamlit
- sdk_version: 1.43.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Detect emotion within a .wav sound file
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Audio Emotion Analyzer
+ emoji: 🎵
+ colorFrom: blue
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.31.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # Audio Emotion Analyzer
+
+ A Streamlit application that analyzes the emotional tone in speech audio files using a pre-trained Wav2Vec2 model.
+
+ ## Model
+
+ This application uses the [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) model from Hugging Face, a Wav2Vec2 model fine-tuned for speech emotion recognition.
+
+ ## Demo App
+
+ [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
+
+ ## Features
+
+ - Upload your own .wav audio files for emotion analysis
+ - Select from existing .wav files in your current directory
+ - Real-time emotion prediction
+ - Visual feedback with emojis
+
+ ## Quick Use
+
+ You can use this application in two ways:
+
+ ### Option 1: Run on Hugging Face Spaces
+ Click the "Spaces" tab on the model page to access the hosted version of this app.
+
+ ### Option 2: Run Locally
+
+ 1. Clone this repository
+ 2. Install the required dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 3. Download the pre-trained model:
+ ```bash
+ python download_model.py
+ ```
+ 4. Run the Streamlit app:
+ ```bash
+ streamlit run app.py
+ ```
+
+ ## Using Audio Files
+
+ The application automatically scans for .wav files in:
+ - The current directory where the app is running
+ - Immediate subdirectories (one level deep)
+
+ You can:
+ 1. Place .wav files in the same directory as the app
+ 2. Place .wav files in subdirectories
+ 3. Upload new .wav files directly through the interface
+
+ ## Supported Emotions
+
+ The model distinguishes four emotions, matching the `id2label` mapping in config.json (see the check below):
+ - Neutral 😐
+ - Happy 😊
+ - Angry 😠
+ - Sad 😢
+
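+ A minimal way to confirm this mapping yourself (a sketch, assuming the model files were already saved to the current directory by `download_model.py`):
+
+ ```python
+ from transformers import Wav2Vec2ForSequenceClassification
+
+ # Inspect the class-index-to-label mapping shipped in config.json
+ model = Wav2Vec2ForSequenceClassification.from_pretrained(".")
+ print(model.config.id2label)  # {0: 'neu', 1: 'hap', 2: 'ang', 3: 'sad'}
+ ```
+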
+ ## Technical Details
+
+ This application uses the following pieces, combined in the sketch below:
+ - [superb/wav2vec2-base-superb-er](https://huggingface.co/superb/wav2vec2-base-superb-er) pre-trained model
+ - Wav2Vec2ForSequenceClassification for emotion classification
+ - Wav2Vec2FeatureExtractor for audio feature extraction
+ - Streamlit for the web interface
+
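+ A condensed sketch of the inference path that backend.py implements (the filename is just one of the sample clips in this repository; any speech .wav works):
+
+ ```python
+ import librosa
+ import torch
+ from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification
+
+ MODEL_NAME = "superb/wav2vec2-base-superb-er"
+ feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
+ model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_NAME)
+
+ # librosa resamples to the 16 kHz rate the feature extractor expects
+ speech, sr = librosa.load("OAF_youth_angry.wav", sr=16000)
+ inputs = feature_extractor(speech, sampling_rate=sr, return_tensors="pt")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ predicted_id = logits.argmax(dim=-1).item()
+ print(model.config.id2label[predicted_id])  # e.g. 'ang'
+ ```
+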
+ ## Limitations
+
+ - The model works best with clear speech audio in English
+ - Background noise may affect the accuracy of emotion detection
+ - Short audio clips (1-5 seconds) tend to work better than longer recordings
+
+ ## Troubleshooting
+
+ If you encounter issues with model loading, try:
+ 1. Running `python download_model.py` again to download the model files
+ 2. Ensuring you have a stable internet connection for the initial model download
+ 3. Checking that your audio files are in .wav format, ideally at a 16kHz sample rate (see the snippet after this list)
+ 4. Verifying that the model files (model.safetensors, config.json, preprocessor_config.json) are in your current directory
+
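+ To check a file's sample rate, a quick sketch using the soundfile package already listed in requirements.txt:
+
+ ```python
+ import soundfile as sf
+
+ # Print the sample rate, channel count, and container format of a local file
+ info = sf.info("OAF_youth_happy.wav")
+ print(info.samplerate, info.channels, info.format)  # e.g. 16000 1 WAV
+ ```
+
+ Other sample rates are not fatal: backend.py resamples with `librosa.load(audio_path, sr=16000)` before feature extraction.
+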
+ ## Citation
+
+ If you use this application or the underlying model in your work, please cite:
+
+ ```bibtex
+ @misc{superb2021,
+   author = {SUPERB Team},
+   title = {SUPERB: Speech processing Universal PERformance Benchmark},
+   year = {2021},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/s3prl/s3prl}},
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
YAF_youth_disgust.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b240172f5784d6f62dd9210e1f50b08f705960659b3260307ce95d3ed776eb5
+ size 130620
YAF_youth_sad.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eddb450d5e5acb32f968ae6da9224bcb10c65326e060c4d87d9cd3c4c29399be
+ size 116028
app.py ADDED
@@ -0,0 +1,110 @@
+ import streamlit as st
+ import os
+ import tempfile
+ from backend import predict_emotion
+
+ # Set page configuration
+ st.set_page_config(
+     page_title="Audio Emotion Analyzer",
+     page_icon="🎵",
+     layout="centered"
+ )
+
+ # Title and description
+ st.title("🎵 Audio Emotion Analyzer")
+ st.markdown("Upload a .wav file or select an existing audio file to analyze the emotion in the speech.")
+
+ # Function to load audio files from the current directory and subdirectories
+ def get_audio_files():
+     audio_files = []
+     # Scan the current directory and immediate subdirectories
+     for root, dirs, files in os.walk('.', topdown=True):
+         # Limit depth to the current directory and immediate subdirectories
+         if root.count(os.sep) <= 1:
+             for file in files:
+                 if file.lower().endswith('.wav'):
+                     rel_path = os.path.join(root, file)
+                     # Remove leading ./ or .\ from the path
+                     if rel_path.startswith('./') or rel_path.startswith('.\\'):
+                         rel_path = rel_path[2:]
+                     audio_files.append(rel_path)
+     return sorted(audio_files)
+
+ # Get audio files
+ audio_files = get_audio_files()
+
+ # Create two columns for upload and file selection
+ col1, col2 = st.columns(2)
+
+ with col1:
+     st.subheader("Upload your audio")
+     uploaded_file = st.file_uploader("Choose a .wav file", type=["wav"])
+
+ with col2:
+     st.subheader("Or select an existing file")
+     selected_file = None
+     if audio_files:
+         selected_file = st.selectbox("Choose an audio file", ["None"] + audio_files)
+     else:
+         st.info("No .wav files found in the current directory or immediate subdirectories.")
+
+ # Determine which file to use
+ audio_file = None
+ file_path = None
+
+ if uploaded_file is not None:
+     # Save the uploaded file to a temporary file
+     with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+         tmp_file.write(uploaded_file.getvalue())
+         file_path = tmp_file.name
+     audio_file = uploaded_file.name
+     st.audio(uploaded_file, format="audio/wav")
+
+ elif selected_file is not None and selected_file != "None":
+     file_path = selected_file
+     audio_file = selected_file
+     st.audio(file_path, format="audio/wav")
+
+ # Submit button
+ if st.button("Analyze Emotion", disabled=(file_path is None)):
+     if file_path:
+         with st.spinner("Analyzing audio..."):
+             # Call the backend function to predict emotion
+             emotion = predict_emotion(file_path)
+
+         # Display the result
+         st.success("Analysis complete!")
+         st.markdown(f"## Predicted Emotion: **{emotion}**")
+
+         # Display an emoji based on the emotion; the 4-class model only
+         # produces Neutral, Happy, Angry, and Sad, so the extra entries
+         # here are unused fallbacks
+         emoji_map = {
+             "Neutral": "😐",
+             "Happy": "😊",
+             "Sad": "😢",
+             "Angry": "😠",
+             "Fearful": "😨",
+             "Disgusted": "🤢",
+             "Surprised": "😲"
+         }
+
+         emoji = emoji_map.get(emotion, "🤔")
+         st.markdown(f"# {emoji}")
+
+         # Clean up the temporary file if one was created
+         if uploaded_file is not None:
+             os.unlink(file_path)
+     else:
+         st.warning("Please upload a file or select an existing file first.")
+
+ # Add some information about the app
+ st.markdown("---")
+ st.markdown("""
+ ### About this app
+ This application uses a pre-trained Wav2Vec2 model to analyze the emotional tone in speech audio.
+ The model can detect four emotions: Neutral, Happy, Angry, and Sad.
+
+ ### How to use
+ 1. Upload a .wav file or select an existing audio file
+ 2. Click the "Analyze Emotion" button
+ 3. View the predicted emotion result
+ """)
backend.py ADDED
@@ -0,0 +1,81 @@
+ import os
+ import torch
+ import librosa
+ from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
+
+ # Display names for the model's emotion classes, in the order of the
+ # id2label mapping in config.json: 0 -> neu, 1 -> hap, 2 -> ang, 3 -> sad
+ EMOTION_LABELS = [
+     "Neutral", "Happy", "Angry", "Sad"
+ ]
+
+ # Model paths
+ MODEL_NAME = "superb/wav2vec2-base-superb-er"
+ # Look for model files directly in the current directory
+ LOCAL_MODEL_DIR = "."
+ LOCAL_FEATURE_EXTRACTOR_DIR = "."
+
+ def load_model():
+     """Load the emotion recognition model and feature extractor"""
+     try:
+         # Check whether the weights (model.safetensors or pytorch_model.bin),
+         # model config, and feature extractor config exist in the current directory
+         model_files_exist = any(
+             f.startswith("pytorch_model") or f == "model.safetensors"
+             for f in os.listdir(LOCAL_MODEL_DIR)
+         )
+         config_file_exists = os.path.exists(os.path.join(LOCAL_MODEL_DIR, "config.json"))
+         feature_extractor_exists = os.path.exists(os.path.join(LOCAL_FEATURE_EXTRACTOR_DIR, "preprocessor_config.json"))
+
+         if model_files_exist and config_file_exists and feature_extractor_exists:
+             print("Loading model and feature extractor from current directory...")
+             feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(LOCAL_FEATURE_EXTRACTOR_DIR)
+             model = Wav2Vec2ForSequenceClassification.from_pretrained(LOCAL_MODEL_DIR)
+         else:
+             print("Local model files not found. Loading from Hugging Face...")
+             feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
+             model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_NAME)
+
+         return model, feature_extractor
+     except Exception as e:
+         print(f"Error loading model: {e}")
+         # Fall back to the Auto classes if the specific classes fail
+         from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
+         feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_NAME)
+         model = AutoModelForAudioClassification.from_pretrained(MODEL_NAME)
+         return model, feature_extractor
+
+ def predict_emotion(audio_path):
+     """Predict emotion from an audio file"""
+     try:
+         # Load the model and feature extractor
+         model, feature_extractor = load_model()
+
+         # Load the audio, resampling to the 16 kHz rate the model expects
+         speech_array, sampling_rate = librosa.load(audio_path, sr=16000)
+
+         # Extract input features
+         inputs = feature_extractor(speech_array, sampling_rate=sampling_rate, return_tensors="pt")
+
+         # Predict emotion
+         with torch.no_grad():
+             logits = model(**inputs).logits
+
+         # Get the emotion label
+         predicted_class_id = torch.argmax(logits, dim=-1).item()
+
+         # Return the predicted emotion
+         return EMOTION_LABELS[predicted_class_id]
+
+     except Exception as e:
+         print(f"Error predicting emotion: {e}")
+         return "Error: Could not predict emotion"
+
+ # For testing
+ if __name__ == "__main__":
+     # Test with a file from the current directory
+     current_dir = "."
+     wav_files = [f for f in os.listdir(current_dir) if f.endswith(".wav")]
+
+     if wav_files:
+         test_file = wav_files[0]
+         print(f"Testing with file: {test_file}")
+         emotion = predict_emotion(test_file)
+         print(f"Predicted emotion: {emotion}")
+     else:
+         print("No .wav files found in current directory for testing.")
config.json ADDED
@@ -0,0 +1,130 @@
+ {
+   "_name_or_path": "superb/wav2vec2-base-superb-er",
+   "activation_dropout": 0.0,
+   "adapter_attn_dim": null,
+   "adapter_kernel_size": 3,
+   "adapter_stride": 2,
+   "add_adapter": false,
+   "apply_spec_augment": true,
+   "architectures": [
+     "Wav2Vec2ForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 1,
+   "classifier_proj_size": 256,
+   "codevector_dim": 256,
+   "contrastive_logits_temperature": 0.1,
+   "conv_bias": false,
+   "conv_dim": [
+     512,
+     512,
+     512,
+     512,
+     512,
+     512,
+     512
+   ],
+   "conv_kernel": [
+     10,
+     3,
+     3,
+     3,
+     3,
+     2,
+     2
+   ],
+   "conv_stride": [
+     5,
+     2,
+     2,
+     2,
+     2,
+     2,
+     2
+   ],
+   "ctc_loss_reduction": "sum",
+   "ctc_zero_infinity": false,
+   "diversity_loss_weight": 0.1,
+   "do_stable_layer_norm": false,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_norm": "group",
+   "feat_proj_dropout": 0.1,
+   "feat_quantizer_dropout": 0.0,
+   "final_dropout": 0.0,
+   "freeze_feat_extract_train": true,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "neu",
+     "1": "hap",
+     "2": "ang",
+     "3": "sad"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "ang": 2,
+     "hap": 1,
+     "neu": 0,
+     "sad": 3
+   },
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.05,
+   "mask_channel_length": 10,
+   "mask_channel_min_space": 1,
+   "mask_channel_other": 0.0,
+   "mask_channel_prob": 0.0,
+   "mask_channel_selection": "static",
+   "mask_feature_length": 10,
+   "mask_feature_min_masks": 0,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_masks": 2,
+   "mask_time_min_space": 1,
+   "mask_time_other": 0.0,
+   "mask_time_prob": 0.05,
+   "mask_time_selection": "static",
+   "model_type": "wav2vec2",
+   "no_mask_channel_overlap": false,
+   "no_mask_time_overlap": false,
+   "num_adapter_layers": 3,
+   "num_attention_heads": 12,
+   "num_codevector_groups": 2,
+   "num_codevectors_per_group": 320,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 12,
+   "num_negatives": 100,
+   "output_hidden_size": 768,
+   "pad_token_id": 0,
+   "proj_codevector_dim": 256,
+   "tdnn_dilation": [
+     1,
+     2,
+     3,
+     1,
+     1
+   ],
+   "tdnn_dim": [
+     512,
+     512,
+     512,
+     512,
+     1500
+   ],
+   "tdnn_kernel": [
+     5,
+     3,
+     3,
+     1,
+     1
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.49.0",
+   "use_weighted_layer_sum": true,
+   "vocab_size": 32,
+   "xvector_output_dim": 512
+ }
download_model.py ADDED
@@ -0,0 +1,44 @@
+ from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
+
+ # Define the model name
+ MODEL_NAME = "superb/wav2vec2-base-superb-er"
+ OUTPUT_DIR = "."  # Save directly to the current directory
+
+ print(f"Downloading model: {MODEL_NAME}")
+ print("This may take a few minutes depending on your internet connection...")
+
+ try:
+     # Download and save the feature extractor
+     print("Downloading feature extractor...")
+     feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
+     feature_extractor.save_pretrained(OUTPUT_DIR)
+     print("Feature extractor saved to current directory")
+
+     # Download and save the model
+     print("Downloading model...")
+     model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_NAME)
+     model.save_pretrained(OUTPUT_DIR)
+     print("Model saved to current directory")
+
+     print("\nModel and feature extractor downloaded successfully!")
+     print("You can now use them in your application by loading from the current directory.")
+
+ except Exception as e:
+     print(f"Error downloading model: {e}")
+     print("\nTrying alternative approach...")
+
+     # If the direct download fails, download with the Auto classes instead
+     from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
+
+     # Download with Auto classes
+     feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_NAME)
+     model = AutoModelForAudioClassification.from_pretrained(MODEL_NAME)
+
+     # Save to the same location as the primary path
+     feature_extractor.save_pretrained(OUTPUT_DIR)
+     model.save_pretrained(OUTPUT_DIR)
+
+     print("\nModel and feature extractor downloaded successfully using alternative approach!")
+     print("You can now use them in your application by loading from the current directory.")
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee79804e2abf203994cf7626ff439d44a67daf3878c9ed3ee7b139ce1f36ba1b
+ size 378304548
preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "do_normalize": false,
+   "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": true,
+   "sampling_rate": 16000
+ }
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ streamlit==1.31.0
+ torch==2.1.0
+ librosa==0.10.1
+ transformers==4.35.0
+ numpy==1.24.3
+ soundfile==0.12.1