hbredin committed on
Commit 33c6fa0 · verified · 1 Parent(s): 2dd2cc5

feat: update with latest premium pipeline (WIP)

  1. README.md +13 -13
README.md CHANGED
@@ -13,22 +13,22 @@ Pretrained pipelines reach state-of-the-art performance on most academic benchma
 
  | Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
  | ---------------------- | ------ | ------ | --------- |
- | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
- | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
- | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
- | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
- | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
- | [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
- | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
- | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
+ | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.1 |
+ | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 20.3 |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 15.0 |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 18.9 |
+ | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 42.7 |
+ | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 21.7 |
+ | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.1 |
+ | [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.1 |
+ | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 46.0 |
+ | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.3 |
  | [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
- | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
- | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |
+ | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 8.3 |
+ | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.5 |
  [Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
 
  Using one Nvidia Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU,
  * [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
  * [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
- * [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) takes less than 45s to process 1h of audio
+ * [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) takes less than 35s to process 1h of audio
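
The effect of this hunk on the Premium column can be summarized with a short script. This is only a sketch and not part of the commit: the names `PREMIUM_DER`, `der_changes`, and `real_time_factor` are hypothetical, and the numbers are copied by hand from the removed and added rows.

```python
# Premium DER (%) as (before, after) commit 33c6fa0, copied from the diff.
# All names here are illustrative assumptions, not part of the repository.
PREMIUM_DER = {
    "AISHELL-4": (11.9, 11.1),
    "AliMeeting (channel 1)": (22.5, 20.3),
    "AMI (IHM)": (16.6, 15.0),
    "AMI (SDM)": (20.9, 18.9),
    "AVA-AVD": (39.8, 42.7),
    "CALLHOME (part 2)": (22.2, 21.7),
    "DIHARD 3 (full)": (17.2, 17.1),
    "Earnings21": (9.0, 9.1),
    "Ego4D (dev.)": (43.8, 46.0),
    "MSDWild": (19.8, 19.3),
    "RAMC": (18.4, 18.4),
    "REPERE (phase2)": (7.6, 8.3),
    "VoxConverse (v0.3)": (9.4, 9.5),
}


def der_changes(table):
    """Return {benchmark: after - before} in absolute DER points (0.1 precision)."""
    return {name: round(after - before, 1) for name, (before, after) in table.items()}


def real_time_factor(processing_seconds, audio_seconds=3600):
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    changes = der_changes(PREMIUM_DER)
    # Sort from largest improvement (most negative) to largest regression.
    for name, delta in sorted(changes.items(), key=lambda item: item[1]):
        print(f"{name:<24} {delta:+.1f}")
    # "less than 45s" and "less than 35s" are upper bounds, so these are too.
    print(f"RTF bound before: {real_time_factor(45):.4f}")
    print(f"RTF bound after:  {real_time_factor(35):.4f}")
```

Read this way, the updated Premium pipeline lowers the error rate on seven benchmarks, leaves RAMC unchanged, and raises it on five (AVA-AVD, Ego4D, REPERE, Earnings21, VoxConverse). On REPERE the new value (8.3) is also above the v3.1 figure (7.8). The stated processing-time bound drops from 45s to 35s per hour of audio.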