Commit · 3dfc212
Parent(s): 0969d04
Update README.md
README.md CHANGED
@@ -1,10 +1,10 @@
-This model is fine tuned on the
+This model is fine-tuned on the IEMOCAP dataset. We applied volume normalization and data augmentation (noise injection, pitch shift, and audio stretching). It is also a speaker-independent model: we use Ses05F of the IEMOCAP dataset as the validation speaker and Ses05M as the test speaker.
 
-The initial pre-trained model is facebook/wav2vec2-base. The fine tune dataset only contains 4 common emotions of IEMOCAP (happy, angry, sad, neutral),
+The initial pre-trained model is facebook/wav2vec2-base. The fine-tuning dataset contains only the 4 common emotions of IEMOCAP (happy, angry, sad, neutral), *without frustration*. The audio clips are either padded or trimmed to 8 seconds before fine-tuning.
 
 After **10** epochs of training, the validation accuracy is around **67%**.
 
-In order to impliment this model: run the following code in a python script:
+To use this model, please run the following code in a Python script:
 
 ```
 from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
@@ -12,8 +12,7 @@ import librosa
 import torch
 
 target_sampling_rate = 16000
-model_name = 'canlinzhang/
-my_token = my_token
+model_name = 'canlinzhang/wav2vec2_speech_emotion_recognition_trained_on_IEMOCAP'
 audio_path = your_audio_path
 
 #build id and label dicts
@@ -22,7 +21,7 @@ label2id = {'neu':0, 'ang':1, 'sad':2, 'hap':3}
 
 feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
 
-model = AutoModelForAudioClassification.from_pretrained(model_name
+model = AutoModelForAudioClassification.from_pretrained(model_name)
 
 y_ini, sr_ini = librosa.load(audio_path, sr=target_sampling_rate)
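The committed snippet stops at `librosa.load`, so here is a minimal end-to-end inference sketch for reference. It follows the README's stated setup (16 kHz audio, 8-second pad/trim, and the `label2id` mapping visible in the hunk context); the audio path is a placeholder, and the exact pre-processing the author used is an assumption.

```
import numpy as np
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

target_sampling_rate = 16000
max_samples = 8 * target_sampling_rate  # README: clips are padded/trimmed to 8 s

model_name = 'canlinzhang/wav2vec2_speech_emotion_recognition_trained_on_IEMOCAP'
audio_path = 'example.wav'  # placeholder: point at a real audio file

# Inverse of the label2id dict shown in the diff context
id2label = {0: 'neu', 1: 'ang', 2: 'sad', 3: 'hap'}

feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForAudioClassification.from_pretrained(model_name)
model.eval()

# Load and resample to 16 kHz mono
y, _ = librosa.load(audio_path, sr=target_sampling_rate)

# Zero-pad or trim to exactly 8 seconds, mirroring the fine-tuning setup
if len(y) < max_samples:
    y = np.pad(y, (0, max_samples - len(y)))
else:
    y = y[:max_samples]

inputs = feature_extractor(y, sampling_rate=target_sampling_rate, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

print(id2label[int(logits.argmax(dim=-1))])  # e.g. 'hap'
```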
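The first paragraph of the new README mentions noise injection, pitch shift, and audio stretching, but the commit does not show how these were applied. The sketch below is only an illustration of one common librosa/numpy recipe for such augmentations; the function name and all parameter values (noise scale, semitone shift, stretch rate) are assumptions, not the author's settings.

```
import numpy as np
import librosa

def augment_variants(y, sr):
    # Hypothetical augmentation pass: returns three modified copies of y
    noisy = y + 0.005 * np.random.randn(len(y))                 # noise injection
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch shift
    stretched = librosa.effects.time_stretch(y, rate=0.9)       # audio stretching
    return [noisy, shifted, stretched]
```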