---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---
[**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used [in production by dozens of companies](https://herve.niderb.fr/consulting.html).
| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
| ---------------------- | ------ | ------ | --------- |
| AISHELL-4 | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | 27.1 | 22.4 | 20.9 |
| AVA-AVD | 66.3 | 50.0 | 39.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
| DIHARD 3 (full) | 26.9 | 21.7 | 17.2 |
| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.4 |

All values are [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %); lower is better.
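
To make the metric concrete, here is a minimal sketch of the standard diarization error rate definition: the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The function name and the durations below are hypothetical and chosen for illustration only; they are not taken from any benchmark in the table, and this is not the `pyannote.metrics` implementation.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds:
      false_alarm      -- non-speech labeled as speech
      missed_detection -- speech labeled as non-speech
      confusion        -- speech attributed to the wrong speaker
      total            -- total reference speech duration
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations (seconds), for illustration only.
der = diarization_error_rate(
    false_alarm=12.0, missed_detection=30.0, confusion=18.0, total=600.0
)
print(f"DER = {der:.1f}%")  # DER = 10.0%
```

Because the three error terms are summed before normalization, the rate can exceed 100% when a system produces a lot of false alarm speech.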
Using one Nvidia Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU,
* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) takes less than 1m00s to process 1h of audio
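
These timings can be restated as a real-time factor (processing time divided by audio duration). The sketch below only does that arithmetic on the figures quoted above; the function names are hypothetical, and the Premium entry uses 60 s as an upper bound because the source says "less than 1m00s".

```python
AUDIO_SECONDS = 3600.0  # one hour of audio, as in the timings above


def real_time_factor(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


def speedup(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """How many times faster than real time the processing runs."""
    return audio_seconds / processing_seconds


# Approximate timings quoted above, in seconds per hour of audio.
# The Premium value is an upper bound ("less than 1m00s").
timings = {"v2.1": 90.0, "v3.1": 80.0, "Premium": 60.0}

for name, seconds in timings.items():
    print(
        f"{name}: RTF = {real_time_factor(seconds):.3f}, "
        f"about {speedup(seconds):.0f}x faster than real time"
    )
```

On the stated hardware this works out to roughly 40x, 45x, and at least 60x faster than real time, respectively.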