hbredin

Update README.md

66d96b7 verified 12 months ago

---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

[pyannote.audio](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks and are used [in production by dozens of companies](https://www.pyannote.ai).

| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
| --- | --- | --- | --- |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.2 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 19.3 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 15.8 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 19.3 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 44.8 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 19.8 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 16.8 |
| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.1 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 44.0 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 11.1 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.8 |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %); lower is better.
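
For readers unfamiliar with the metric in the table above, here is a minimal sketch of how a diarization error rate is usually computed: the sum of false alarm, missed detection and speaker confusion durations, divided by the total duration of reference speech. The function name and the example durations are hypothetical and chosen only for illustration; the benchmark figures above were not produced with this snippet, and the evaluation setup behind them (e.g. collar, handling of overlapped speech) is not described here.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds:
      false_alarm      -- non-speech labelled as speech
      missed_detection -- reference speech that was not detected
      confusion        -- speech attributed to the wrong speaker
      total            -- total duration of reference speech
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical example: 30 minutes of reference speech with 3 minutes of errors.
der = diarization_error_rate(
    false_alarm=30.0, missed_detection=60.0, confusion=90.0, total=1800.0
)
print(f"DER = {der:.1f}%")  # prints "DER = 10.0%"
```

Because false alarms count as errors but do not add to the reference duration, the rate can exceed 100% on very hard recordings.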

On high-end NVIDIA hardware:

* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* On-premise [pyannoteAI](https://www.pyannote.ai) takes less than 30s to process 1h of audio
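
The timings above can also be read as speed-ups relative to real time. The sketch below only redoes that arithmetic; the helper name is hypothetical, and the pyannoteAI entry uses the 30s upper bound quoted above, so its actual speed-up is higher than the value printed.

```python
def realtime_speedup(audio_seconds, processing_seconds):
    """Return how many times faster than real time the audio is processed."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


ONE_HOUR = 3600  # seconds of audio

# Processing times (in seconds) for 1h of audio, taken from the list above.
timings = {
    "v2.1": 90,                      # around 1m30s
    "v3.1": 80,                      # around 1m20s
    "pyannoteAI (on-premise)": 30,   # upper bound: "less than 30s"
}

for name, seconds in timings.items():
    print(f"{name}: {realtime_speedup(ONE_HOUR, seconds):.0f}x real time")
# v2.1: 40x real time
# v3.1: 45x real time
# pyannoteAI (on-premise): 120x real time
```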
