---
title: README
emoji: π
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---
|
|
|
[**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization, that is, determining who spoke when in an audio recording.
|
|
|
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used [in production by dozens of companies](https://herve.niderb.fr/consulting.html).
|
|
|
| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
| ---------------------- | ------ | ------ | --------- |
| AISHELL-4 | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | 27.1 | 22.4 | 20.9 |
| AVA-AVD | 66.3 | 50.0 | 39.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
| DIHARD 3 (full) | 26.9 | 21.7 | 17.2 |
| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.4 |
|
[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, lower is better)
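Diarization error rate (DER) adds up three kinds of error (missed speech, false alarm, and speaker confusion) and divides them by the total duration of reference speech, after finding the best one-to-one mapping between hypothesis and reference speaker labels. The sketch below illustrates that definition as a simplified frame-based approximation; it is not the pyannote.metrics implementation (which works on segments and offers options such as a forgiveness collar). The function name `diarization_error_rate`, the `(start, end, label)` segment format, and the 10 ms `step` are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical segment format: (start, end, speaker label), times in seconds
Segment = tuple[float, float, str]


def _active_speakers(segments: list[Segment], n_frames: int, step: float) -> list[set[str]]:
    """For each frame of `step` seconds, return the set of speakers active in it."""
    frames: list[set[str]] = [set() for _ in range(n_frames)]
    for start, end, label in segments:
        for i in range(round(start / step), round(end / step)):
            frames[i].add(label)
    return frames


def _mappings(hyp_labels: list[str], ref_labels: list[str]):
    """Yield every one-to-one mapping from hypothesis labels to reference labels."""
    if len(hyp_labels) <= len(ref_labels):
        for refs in permutations(ref_labels, len(hyp_labels)):
            yield dict(zip(hyp_labels, refs))
    else:
        for hyps in permutations(hyp_labels, len(ref_labels)):
            yield dict(zip(hyps, ref_labels))


def diarization_error_rate(reference: list[Segment], hypothesis: list[Segment], step: float = 0.01) -> float:
    """Frame-based DER = (missed + false alarm + confusion) / reference speech."""
    if not reference:
        raise ValueError("reference contains no speech")
    n_frames = round(max(end for _, end, _ in reference + hypothesis) / step)
    ref = _active_speakers(reference, n_frames, step)
    hyp = _active_speakers(hypothesis, n_frames, step)

    # Count the frames in which reference speaker r and hypothesis speaker h overlap
    ref_labels = sorted({label for _, _, label in reference})
    hyp_labels = sorted({label for _, _, label in hypothesis})
    overlap = {(r, h): 0 for r in ref_labels for h in hyp_labels}
    for ref_set, hyp_set in zip(ref, hyp):
        for r in ref_set:
            for h in hyp_set:
                overlap[r, h] += 1

    # Keep the mapping that maximizes matched frames. Brute force is only practical
    # for a handful of speakers; a production tool would use the Hungarian algorithm.
    mapping = max(
        _mappings(hyp_labels, ref_labels),
        key=lambda m: sum(overlap[r, h] for h, r in m.items()),
    )

    total = missed = false_alarm = confusion = 0
    for ref_set, hyp_set in zip(ref, hyp):
        n_ref, n_hyp = len(ref_set), len(hyp_set)
        # Hypothesis speakers whose mapped reference speaker is actually active here
        correct = sum(1 for h in hyp_set if mapping.get(h) in ref_set)
        total += n_ref
        missed += max(0, n_ref - n_hyp)
        false_alarm += max(0, n_hyp - n_ref)
        confusion += min(n_ref, n_hyp) - correct
    return (missed + false_alarm + confusion) / total
```

For example, with two reference speakers talking for 10 s each, a hypothesis that attributes the second half of speaker B's turn to the speaker mapped to A yields 5 s of confusion over 20 s of reference speech, i.e. a DER of 0.25 (25%).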
|
|
|
Using one NVIDIA Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU:

* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) takes less than 1m00s to process 1h of audio
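These timings can be expressed as a real-time factor (RTF), i.e. processing time divided by audio duration. The short sketch below does that arithmetic; the helper name `real_time_factor` is hypothetical, and the inputs are the approximate figures quoted above, so the results are only rough.

```python
ONE_HOUR = 3600.0  # seconds of audio


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time as a fraction of audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


# Approximate processing times per hour of audio, as quoted above
timings = {"v2.1": 90.0, "v3.1": 80.0}

for version, seconds in timings.items():
    rtf = real_time_factor(seconds, ONE_HOUR)
    print(f"{version}: RTF ~ {rtf:.1%}, about {1 / rtf:.0f}x faster than real time")
```

With these numbers, v2.1 runs at roughly 2.5% of real time (about 40x faster) and v3.1 at roughly 2.2% (about 45x faster). The Premium figure of under 1m00s per hour corresponds to an RTF below about 1.7%, i.e. more than 60x faster than real time.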
|
|