Update README.md
README.md
@@ -64,7 +64,9 @@ nt_extractor.eval()

Before (old lines 64-70):

````diff
 ```
 
 You may have noticed that the model produces normalized vectors as embeddings.
-
 
 ```
 import torchaudio
````
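The README says the embeddings are normalized vectors. For such vectors, cosine similarity reduces to a plain dot product. The minimal sketch below shows this in plain Python. The helper names and toy vectors are hypothetical stand-ins for real model outputs. With PyTorch tensors, `torch.nn.functional.cosine_similarity` gives the same score.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for two (hypothetical) speaker embeddings.
emb_a = l2_normalize([0.3, -1.2, 2.0])
emb_b = l2_normalize([0.5, -0.7, 1.1])

# For unit-norm vectors both scores agree up to floating-point rounding.
print(cosine_similarity(emb_a, emb_b), dot(emb_a, emb_b))
```

As a result, ranking or thresholding these embeddings by dot product is equivalent to doing so by cosine similarity.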
@@ -99,6 +101,8 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me

Before (old lines 99-104):

````diff
 (with a decision threshold of **0.467**). This value can be interpreted as the ability to identify speakers only with non-timbral cues. A discussion about this interpretation can be
 found in the paper mentioned hereabove, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
 
 # Limitations
 The fine tuning data used to produce this model (VoxCeleb, VCTK) are mostly in english, which may affect the performance on other languages.
 
````
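The EER is the operating point where the false acceptance rate equals the false rejection rate. A decision threshold such as the 0.467 quoted above is the score cut-off at that point. The sketch below estimates an EER and its threshold from trial scores, assuming higher scores mean "same speaker". It is not the evaluation code behind the reported figure, and `compute_eer` and the toy scores are hypothetical.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate (EER) and the threshold where it occurs.

    A trial is accepted as "same speaker" when its score >= threshold.
    Every observed score is tried as a candidate threshold, and the one where
    the false acceptance and false rejection rates are closest is kept.
    Returns (eer, threshold).
    """
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, not from VoxCeleb1-clean).
targets = [0.9, 0.8, 0.7, 0.45]     # same-speaker trials
nontargets = [0.1, 0.2, 0.3, 0.5]   # different-speaker trials
eer, threshold = compute_eer(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.5
```

Some evaluation tools interpolate along the ROC curve instead of picking one observed score as the threshold. Their results can therefore differ slightly from this coarse estimate.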
After (new lines 64-72):

````diff
 ```
 
 You may have noticed that the model produces normalized vectors as embeddings.
+
+Next, we define a function that extracts the non-timbral embedding from an audio signal. In this tutorial version, the audio file is expected to be sampled at 16kHz.
+Depending on the available memory (cpu or gpu), you may change the value of MAX_SIZE, which is used to truncate the long audio signals.
 
 ```
 import torchaudio
````
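The added sentences describe a preprocessing step: the audio must be sampled at 16 kHz, and signals longer than MAX_SIZE are truncated to save memory. Below is a minimal sketch of that step. It assumes MAX_SIZE counts samples and uses 30 seconds as the limit, the cap mentioned in this commit's EER note. `prepare_signal` is a hypothetical helper, not the function defined in the README. For files recorded at another rate, `torchaudio.functional.resample(waveform, orig_freq, new_freq)` can convert them to 16 kHz first.

```python
SAMPLE_RATE = 16_000  # the tutorial expects audio sampled at 16 kHz
MAX_SECONDS = 30      # assumption: mirrors the "max 30 seconds" mentioned in the README
MAX_SIZE = MAX_SECONDS * SAMPLE_RATE  # assumption: MAX_SIZE counts samples -> 480,000


def prepare_signal(signal, sample_rate, max_size=MAX_SIZE):
    """Hypothetical helper: check the sample rate, then truncate to max_size samples.

    `signal` is any 1-D sequence of samples (a list here; a 1-D tensor slices the same way).
    """
    if sample_rate != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {sample_rate} Hz")
    # Keep only the first max_size samples; shorter signals are returned unchanged.
    return signal[:max_size]


long_clip = [0.0] * (45 * SAMPLE_RATE)   # 45 s of silence
short_clip = [0.0] * (5 * SAMPLE_RATE)   # 5 s of silence
print(len(prepare_signal(long_clip, 16_000)))   # 480000
print(len(prepare_signal(short_clip, 16_000)))  # 80000
```

Lowering MAX_SIZE reduces the memory needed per signal, at the cost of discarding the end of long recordings.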
After (new lines 101-108):

````diff
 (with a decision threshold of **0.467**). This value can be interpreted as the ability to identify speakers only with non-timbral cues. A discussion about this interpretation can be
 found in the paper mentioned hereabove, as well as other experiments showing correlations between these embeddings and non-timbral voice attributes.
 
+Please note that the EER value can vary a little depending on the MAX_SIZE defined to reduce long audios (max 30 seconds in our case).
+
 # Limitations
 The fine tuning data used to produce this model (VoxCeleb, VCTK) are mostly in english, which may affect the performance on other languages.
 
````
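The note ties the EER to the MAX_SIZE setting, so the reported 0.467 threshold is best treated as specific to that setup. The sketch below applies it as a verification rule. It assumes the scores are computed the same way as in the README's evaluation, which this diff does not show. It also assumes a score at or above the threshold means "same speaker". `same_speaker` and the example scores are hypothetical.

```python
# Threshold at the EER operating point reported in the README for VoxCeleb1-clean
# (audio truncated to at most 30 s); other data or MAX_SIZE values may call for recalibration.
EER_THRESHOLD = 0.467


def same_speaker(score, threshold=EER_THRESHOLD):
    """Accept a trial as 'same speaker' when its similarity score reaches the threshold."""
    return score >= threshold


# Hypothetical similarity scores for three pairs of recordings.
scores = {"pair_1": 0.82, "pair_2": 0.31, "pair_3": 0.467}
decisions = {pair: same_speaker(s) for pair, s in scores.items()}
print(decisions)  # {'pair_1': True, 'pair_2': False, 'pair_3': True}
```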