doc updt
README.md
CHANGED
@@ -25,8 +25,8 @@ Its weights are then downloaded from this repository.
 from spk_embeddings import EmbeddingsModel, compute_embedding
 import torch
 
-
-
+model = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-id")
+model.eval()
 ```
 
 The model produces normalized vectors as embeddings.
@@ -42,8 +42,8 @@ finally, we can compute two embeddings from two different files and compare them
 wav1 = "/voxceleb1_2019/test/wav/id10270/x6uYqmx31kE/00001.wav"
 wav2 = "/voxceleb1_2019/test/wav/id10270/8jEAjG6SegY/00008.wav"
 
-e1 = compute_embedding(wav1,
-e2 = compute_embedding(wav2,
+e1 = compute_embedding(wav1, model)
+e2 = compute_embedding(wav2, model)
 sim = float(torch.matmul(e1,e2.t()))
 
 print(sim) #
@@ -51,8 +51,8 @@ print(sim) #
 
 # Evaluations
 The model has been evaluated on the standard ASV [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt).
-It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **
-(with a decision threshold of **0.
+It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **0.98%**
+(with a decision threshold of **0.37**).
 
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
 
@@ -65,7 +65,7 @@ This model was used as a baseline in the context of voice characterization (pros
 
 In this paper the model is denoted as W-SPK. The other two models used in this study can also be found on HuggingFace :
 - [W-TBR](https://huggingface.co/Orange/Speaker-wavLM-tbr) for timber related embeddings
-- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-
+- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-pro) for non-timbral embeddings
 
 
 ### Citation
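
Note on the figures in this diff: the README scores a pair of files with the dot product of two normalized embeddings and reports an EER of 0.98% at a decision threshold of 0.37. The README does not say how that EER was computed, so the following is only a minimal plain-Python sketch of one common way to turn similarity scores into an accept/reject decision and an EER estimate. The helper names `is_same_speaker` and `equal_error_rate` are hypothetical and are not part of `spk_embeddings`, and the scores in the demo are made up.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; they are not part of spk_embeddings.


def is_same_speaker(sim: float, threshold: float = 0.37) -> bool:
    """Accept the trial when the similarity reaches the threshold.

    0.37 is the decision threshold quoted in the README for its EER figure.
    """
    return sim >= threshold


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for trial scores and 0/1 labels (1 = same speaker).

    Every observed score is tried as a threshold. The one where the false
    acceptance rate (FAR) and false rejection rate (FRR) are closest is kept,
    and the EER is reported as their average at that point.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best = None  # (|FAR - FRR|, eer, threshold)
    for t in sorted(set(scores)):
        far = sum(s >= t for s in nontargets) / len(nontargets)
        frr = sum(s < t for s in targets) / len(targets)
        candidate = (abs(far - frr), (far + frr) / 2, t)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up scores, only to exercise the functions.
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    eer, thr = equal_error_rate(scores, labels)
    print(f"EER = {eer:.2%} at threshold {thr}")
    print(is_same_speaker(0.62))
```

In practice `sim` would be the value printed by the README snippet, and the scores passed to `equal_error_rate` would come from running that snippet over every pair in the trial list. Published evaluations often interpolate the ROC curve rather than sweeping observed scores, so small differences from a reported EER are to be expected.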