ggmbr committed 2078a43 (1 parent: 1c62d17)
Files changed (1): README.md (+7 -7)
@@ -25,8 +25,8 @@ Its weights are then downloaded from this repository.
 from spk_embeddings import EmbeddingsModel, compute_embedding
 import torch
 
-nt_extractor = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-id")
-nt_extractor.eval()
+model = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-id")
+model.eval()
 ```
 
 The model produces normalized vectors as embeddings.
@@ -42,8 +42,8 @@ finally, we can compute two embeddings from two different files and compare them
 wav1 = "/voxceleb1_2019/test/wav/id10270/x6uYqmx31kE/00001.wav"
 wav2 = "/voxceleb1_2019/test/wav/id10270/8jEAjG6SegY/00008.wav"
 
-e1 = compute_embedding(wav1, nt_extractor)
-e2 = compute_embedding(wav2, nt_extractor)
+e1 = compute_embedding(wav1, model)
+e2 = compute_embedding(wav2, model)
 sim = float(torch.matmul(e1,e2.t()))
 
 print(sim) #
@@ -51,8 +51,8 @@ print(sim) #
 
 # Evaluations
 The model has been evaluated on the standard ASV [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt).
-It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **10.681%**
-(with a decision threshold of **0.467**).
+It results in an Equal Error Rate (EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **0.98%**
+(with a decision threshold of **0.37**).
 
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
 
@@ -65,7 +65,7 @@ This model was used as a baseline in the context of voice characterization (pros
 
 In this paper the model is denoted as W-SPK. The other two models used in this study can also be found on HuggingFace :
 - [W-TBR](https://huggingface.co/Orange/Speaker-wavLM-tbr) for timber related embeddings
-- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-id) for non-timbral embeddings
+- [W-PRO](https://huggingface.co/Orange/Speaker-wavLM-pro) for non-timbral embeddings
 
 
 ### Citation
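Since the README states that the model produces normalized embeddings, the `torch.matmul(e1, e2.t())` comparison in the snippet reduces to a cosine similarity, which is then compared against the decision threshold reported in the evaluation (0.37 after this commit). The sketch below illustrates that logic only; the toy vectors and helper names are hypothetical stand-ins for real `compute_embedding` outputs, and plain Python replaces torch for self-containment:

```python
import math

def normalize(v):
    # L2-normalize a vector, as the embedding model does internally
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_similarity(e1, e2):
    # For L2-normalized vectors, the dot product equals cosine similarity,
    # matching float(torch.matmul(e1, e2.t())) in the README snippet
    return sum(a * b for a, b in zip(e1, e2))

# Toy embeddings standing in for compute_embedding(wav, model) outputs
e1 = normalize([0.2, 0.9, 0.4])
e2 = normalize([0.25, 0.85, 0.45])

sim = cosine_similarity(e1, e2)

THRESHOLD = 0.37  # decision threshold reported in the updated README
same_speaker = sim >= THRESHOLD
```

A similarity above the threshold is taken as a same-speaker decision; the README notes the exact operating point can shift slightly with the `max_size` used to truncate long audio.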