Orange/Speaker-wavLM-id

🇪🇺 Region: EU

ggmbr committed on Feb 10

Commit 2078a43 · 1 Parent(s): 1c62d17

doc updt

Files changed (1)

README.md +7 -7

@@ -25,8 +25,8 @@ Its weights are then downloaded from this repository.
 from spk_embeddings import EmbeddingsModel, compute_embedding
 import torch
-nt_extractor = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-id")
-nt_extractor.eval()
 ```
 The model produces normalized vectors as embeddings.
@@ -42,8 +42,8 @@ finally, we can compute two embeddings from two different files and compare them
 wav1 = "/voxceleb1_2019/test/wav/id10270/x6uYqmx31kE/00001.wav"
 wav2 = "/voxceleb1_2019/test/wav/id10270/8jEAjG6SegY/00008.wav"
-e1 = compute_embedding(wav1, nt_extractor)
-e2 = compute_embedding(wav2, nt_extractor)
 sim = float(torch.matmul(e1,e2.t()))
 print(sim) #
@@ -51,8 +51,8 @@ print(sim) #
 # Evaluations
 The model has been evaluated on the standard ASV [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt).
-It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **10.681%**
-(with a decision threshold of **0.467**).
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
@@ -65,7 +65,7 @@ This model was used as a baseline in the context of voice characterization (pros
 In this paper the model is denoted as W-SPK. The other two models used in this study can also be found on HuggingFace :
 - [W-TBR](https://huggingface.co/Orange/Speaker-wavLM-tbr) for timber related embeddings
-- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-id) for non-timbral embeddings
 ### Citation

 from spk_embeddings import EmbeddingsModel, compute_embedding
 import torch
+model = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-id")
+model.eval()
 ```
 The model produces normalized vectors as embeddings.
 wav1 = "/voxceleb1_2019/test/wav/id10270/x6uYqmx31kE/00001.wav"
 wav2 = "/voxceleb1_2019/test/wav/id10270/8jEAjG6SegY/00008.wav"
+e1 = compute_embedding(wav1, model)
+e2 = compute_embedding(wav2, model)
 sim = float(torch.matmul(e1,e2.t()))
 print(sim) #
 # Evaluations
 The model has been evaluated on the standard ASV [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt).
+It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **0.98%**
+(with a decision threshold of **0.37**).
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
 In this paper the model is denoted as W-SPK. The other two models used in this study can also be found on HuggingFace :
 - [W-TBR](https://huggingface.co/Orange/Speaker-wavLM-tbr) for timber related embeddings
+- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-pro) for non-timbral embeddings
 ### Citation
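
Note on the changed figures: the README states that the embeddings are normalized, so the `torch.matmul(e1, e2.t())` call in the snippet is a cosine similarity, and the reported EER is the operating point where the false acceptance and false rejection rates meet. The following is only a minimal sketch of those two ideas in plain Python. It is not the authors' evaluation script; the function names (`l2_normalize`, `similarity`, `equal_error_rate`) and the toy scores are hypothetical and chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not part of spk_embeddings)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity(e1, e2):
    """Dot product of two vectors; equals cosine similarity when both have unit length."""
    return sum(a * b for a, b in zip(e1, e2))


def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for same-speaker and different-speaker trial scores.

    Every observed score is tried as a threshold; a trial is accepted when its
    score is >= the threshold. The threshold where the false acceptance rate
    and the false rejection rate are closest is kept, and the EER is reported
    as the mean of the two rates at that point.
    """
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy scores, made up for illustration only.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
```

On real trial lists the scores would come from `compute_embedding` pairs, and a finer threshold sweep or interpolation between neighbouring thresholds may give slightly different values, in the same way the README notes that the EER varies a little with `max_size`.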