---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---
[**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used [in production by dozens of companies](https://herve.niderb.fr/consulting.html).
| Benchmark | v1.1 | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) |
| ---------------------- | ---- | ------ | ------ | --------- |
| AISHELL-4 | - | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | - | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 29.7 | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | - | 27.1 | 22.4 | 20.9 |
| AVA-AVD | - | - | 50.0 | 39.8 |
| DIHARD 3 (full) | 29.2 | 26.9 | 21.7 | 17.2 |
| Ego4D (dev.) | - | 61.5 | 51.2 | 43.8 |
| MSDWild | - | - | 25.3 | 19.8 |
| REPERE (phase2) | - | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 21.5 | 11.2 | 11.3 | 9.4 |

All values are [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) in % (lower is better).
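
For readers new to the metric, here is a minimal sketch of how a diarization error rate is usually computed: the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The function name and the durations below are made up for illustration; this is not the code used to produce the table above.

```python
def diarization_error_rate(
    false_alarm: float,
    missed_detection: float,
    confusion: float,
    total: float,
) -> float:
    """Return the diarization error rate in %.

    All arguments are durations in seconds. `total` is the duration of
    speech in the reference annotation.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations, chosen only to illustrate the arithmetic.
der = diarization_error_rate(
    false_alarm=12.0,
    missed_detection=30.0,
    confusion=18.0,
    total=600.0,
)
print(f"DER = {der:.1f}%")
```

Because false alarms are not bounded by the amount of reference speech, the rate can exceed 100%.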
On one NVIDIA Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU:
* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* [Premium](https://forms.gle/eKhn7H2zTa68sMMx8) takes less than 1m00s to process 1h of audio
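
The timings above can be restated as real-time factors (processing time divided by audio duration). The sketch below only redoes that arithmetic on the approximate figures quoted in the list; the helper names are illustrative, and the Premium figure is an upper bound since the list says "less than 1m00s".

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


def speedup(processing_seconds: float, audio_seconds: float) -> float:
    """How many times faster than real time the processing runs."""
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 60 * 60  # 1h of audio

# Approximate processing times from the list above, in seconds.
PROCESSING_SECONDS = {
    "v2.1": 90,     # around 1m30s
    "v3.1": 80,     # around 1m20s
    "Premium": 60,  # less than 1m00s, so this is an upper bound
}

for name, seconds in PROCESSING_SECONDS.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    factor = speedup(seconds, AUDIO_SECONDS)
    print(f"{name}: real-time factor {rtf:.3f}, about {factor:.0f}x real time")
```

On this hardware, that works out to roughly 40x real time for v2.1, 45x for v3.1, and at least 60x for Premium.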