---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

pyannote.audio is an open-source toolkit for speaker diarization.

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used in production by dozens of companies.

| Benchmark | v2.1 | v3.1 | Premium |
| --- | --- | --- | --- |
| AISHELL-4 | 14.1 | 12.2 | 11.9 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 22.5 |
| AMI (IHM) | 18.9 | 18.8 | 16.6 |
| AMI (SDM) | 27.1 | 22.4 | 20.9 |
| AVA-AVD | 66.3 | 50.0 | 39.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 22.2 |
| DIHARD 3 (full) | 26.9 | 21.7 | 17.2 |
| Ego4D (dev.) | 61.5 | 51.2 | 43.8 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| RAMC | 22.5 | 22.2 | 18.4 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.4 |

Diarization error rate (in %)
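
For readers unfamiliar with the metric, diarization error rate is conventionally the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The sketch below only illustrates that formula. The function name and the durations are hypothetical, and it is not the evaluation code used to produce the numbers reported here.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds; `total` is the total
    duration of reference speech.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations (seconds), not taken from any benchmark listed here.
der = diarization_error_rate(
    false_alarm=30.0,
    missed_detection=60.0,
    confusion=90.0,
    total=1000.0,
)
print(f"DER = {der:.1f}%")
```

Because false alarms count as errors but do not increase the denominator, the rate can exceed 100% on very hard recordings.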

Using one Nvidia Tesla V100 SXM2 GPU and one Intel Cascade Lake 6248 CPU:

- v2.1 takes around 1m30s to process 1h of audio
- v3.1 takes around 1m20s to process 1h of audio
- Premium takes less than 45s to process 1h of audio
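
These timings can be restated as a real-time factor (processing time divided by audio duration). The following is a minimal sketch of that arithmetic using the approximate figures quoted here; the helper name is an assumption, and the Premium value is an upper bound since it is only stated as "less than 45s".

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 3600  # 1h of audio

# Approximate processing times in seconds for 1h of audio.
# The Premium figure is an upper bound ("less than 45s").
processing_seconds = {"v2.1": 90, "v3.1": 80, "Premium": 45}

for name, seconds in processing_seconds.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    speedup = AUDIO_SECONDS / seconds
    print(f"{name}: real-time factor {rtf:.4f}, {speedup:.0f}x faster than real time")
```

On this hardware that works out to roughly 40x real time for v2.1, 45x for v3.1, and at least 80x for Premium.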