---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

[**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used [in production by dozens of companies](https://herve.niderb.fr/consulting.html).

| Benchmark | v1.1 | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | <a href="mailto:herve-at-niderb-dot-fr?subject=Premium pyannote.audio pipeline&body=Looks like I got your attention! Drop me an email for more details. Hervé.">Premium</a> |
| ---------------------- | ---- | ------ | ------ | --------- |
| AISHELL-4 | - | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | - | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 29.7 | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | - | 27.1 | 22.4 | 20.9 |
| AVA-AVD | - | - | 50.0 | 39.8 |
| DIHARD 3 (full) | 29.2 | 26.9 | 21.7 | 17.2 |
| MSDWild | - | - | 25.3 | 19.8 |
| REPERE (phase2) | - | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 21.5 | 11.2 | 11.3 | 9.4 |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
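
For readers unfamiliar with the metric, here is a minimal sketch of how a diarization error rate is obtained from its three error components (false alarm, missed detection, speaker confusion), each divided by the total duration of reference speech. The function name and the durations are hypothetical and chosen only for illustration. They are not taken from any benchmark in the table, and this is not the evaluation code behind the reported numbers.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds:
    - false_alarm: non-speech labelled as speech
    - missed_detection: speech that was not detected
    - confusion: speech attributed to the wrong speaker
    - total: total duration of speech in the reference annotation
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations (seconds), for illustration only.
der = diarization_error_rate(
    false_alarm=30.0, missed_detection=60.0, confusion=45.0, total=1000.0
)
print(f"DER = {der:.1f}%")
```

Lower values are better. Because false alarms count as errors but are not part of the reference speech duration, the rate can exceed 100%.
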

Using one Nvidia Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU,
* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* <a href="mailto:herve-at-niderb-dot-fr?subject=Premium pyannote.audio pipeline&body=Looks like I got your attention! Drop me an email for more details. Hervé.">Premium</a> takes less than 1m00s to process 1h of audio
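
The timings above can also be read as real-time factors (processing time divided by audio duration). The short sketch below does that arithmetic for the two figures that are stated exactly. The helper name `real_time_factor` is hypothetical, and the Premium pipeline is left out because only an upper bound ("less than 1m00s") is given.

```python
def real_time_factor(processing_seconds, audio_seconds=3600.0):
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Approximate timings quoted above, converted to seconds per hour of audio.
timings = {"v2.1": 90.0, "v3.1": 80.0}

for name, seconds in timings.items():
    rtf = real_time_factor(seconds)
    speedup = 1.0 / rtf  # how many times faster than real time
    print(f"{name}: real-time factor of about {rtf:.3f} ({speedup:.0f}x real time)")
```

These ratios only hold for the hardware listed above. Other GPUs or CPUs will give different figures.
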