diff --git a/segmentation/segmentation/LICENSE b/LICENSE similarity index 100% rename from segmentation/segmentation/LICENSE rename to LICENSE diff --git a/README.md b/README.md index 7be5fc7f47d5db027d120b8024982df93db95b74..5ff4ab94f198cf2882995ff4c1cb29e7db61e5c9 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,144 @@ ---- -license: mit ---- +--- +tags: +- pyannote +- pyannote-audio +- pyannote-audio-model +- audio +- voice +- speech +- speaker +- speaker-segmentation +- voice-activity-detection +- overlapped-speech-detection +- resegmentation +license: mit +inference: false +extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening." +extra_gated_fields: + Company/university: text + Website: text + I plan to use this model for (task, type of audio data, etc): text +--- + +Using this open-source model in production? +Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options. + + +# 🎹 Speaker segmentation + +[Paper](http://arxiv.org/abs/2104.04045) | [Demo](https://huggingface.co/spaces/pyannote/pretrained-pipelines) | [Blog post](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all) + +![Example](example.png) + +## Usage + +Relies on pyannote.audio 2.1.1: see [installation instructions](https://github.com/pyannote/pyannote-audio). + +```python +# 1. visit hf.co/pyannote/segmentation and accept user conditions +# 2. visit hf.co/settings/tokens to create an access token +# 3. 
instantiate pretrained model +from pyannote.audio import Model +model = Model.from_pretrained("pyannote/segmentation", + use_auth_token="ACCESS_TOKEN_GOES_HERE") +``` + +### Voice activity detection + +```python +from pyannote.audio.pipelines import VoiceActivityDetection +pipeline = VoiceActivityDetection(segmentation=model) +HYPER_PARAMETERS = { + # onset/offset activation thresholds + "onset": 0.5, "offset": 0.5, + # remove speech regions shorter than that many seconds. + "min_duration_on": 0.0, + # fill non-speech regions shorter than that many seconds. + "min_duration_off": 0.0 +} +pipeline.instantiate(HYPER_PARAMETERS) +vad = pipeline("audio.wav") +# `vad` is a pyannote.core.Annotation instance containing speech regions +``` + +### Overlapped speech detection + +```python +from pyannote.audio.pipelines import OverlappedSpeechDetection +pipeline = OverlappedSpeechDetection(segmentation=model) +pipeline.instantiate(HYPER_PARAMETERS) +osd = pipeline("audio.wav") +# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions +``` + +### Resegmentation + +```python +from pyannote.audio.pipelines import Resegmentation +pipeline = Resegmentation(segmentation=model, + diarization="baseline") +pipeline.instantiate(HYPER_PARAMETERS) +resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline}) +# where `baseline` should be provided as a pyannote.core.Annotation instance +``` + +### Raw scores + +```python +from pyannote.audio import Inference +inference = Inference(model) +segmentation = inference("audio.wav") +# `segmentation` is a pyannote.core.SlidingWindowFeature +# instance containing raw segmentation scores like the +# one pictured above (output) +``` + + +## Citation + +```bibtex +@inproceedings{Bredin2021, + Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}}, + Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine}, + Booktitle = {Proc. 
Interspeech 2021}, + Address = {Brno, Czech Republic}, + Month = {August}, + Year = {2021}, +``` + +```bibtex +@inproceedings{Bredin2020, + Title = {{pyannote.audio: neural building blocks for speaker diarization}}, + Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe}, + Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing}, + Address = {Barcelona, Spain}, + Month = {May}, + Year = {2020}, +} +``` + +## Reproducible research + +In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation +"](https://arxiv.org/abs/2104.04045), use `pyannote/segmentation@Interspeech2021` with the following hyper-parameters: + +| Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off` | +| ------------------------ | ------- | -------- | ----------------- | ------------------ | +| AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037 | +| DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067 | +| VoxConverse | 0.767 | 0.713 | 0.182 | 0.501 | + +| Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off` | +| --------------------------- | ------- | -------- | ----------------- | ------------------ | +| AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187 | +| DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144 | +| VoxConverse | 0.587 | 0.426 | 0.337 | 0.112 | + +| Resegmentation of VBx | `onset` | `offset` | `min_duration_on` | `min_duration_off` | +| --------------------- | ------- | -------- | ----------------- | ------------------ | +| AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705 | +| DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182 | +| VoxConverse | 0.537 | 0.724 | 0.410 | 0.563 | + +Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories. 
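The tables above can be kept in one small lookup so that the right values are passed to `pipeline.instantiate(...)` for each task and dataset. The following is a minimal sketch: the values are copied from the tables, while the dictionary layout, the `hyper_parameters` helper, and the short task keys (`vad`, `osd`, `rsg`, borrowed from the `expected_outputs` sub-directory names) are illustrative assumptions and not part of pyannote.audio.

```python
# Hyper-parameters copied from the "Reproducible research" tables.
# NOTE: the layout and helper below are hypothetical conveniences,
# not an API provided by pyannote.audio.
FIELDS = ("onset", "offset", "min_duration_on", "min_duration_off")

_TABLES = {
    "vad": {  # voice activity detection
        "AMI": (0.684, 0.577, 0.181, 0.037),
        "DIHARD": (0.767, 0.377, 0.136, 0.067),
        "VoxConverse": (0.767, 0.713, 0.182, 0.501),
    },
    "osd": {  # overlapped speech detection
        "AMI": (0.448, 0.362, 0.116, 0.187),
        "DIHARD": (0.430, 0.320, 0.091, 0.144),
        "VoxConverse": (0.587, 0.426, 0.337, 0.112),
    },
    "rsg": {  # resegmentation of the VBx baseline
        "AMI": (0.542, 0.527, 0.044, 0.705),
        "DIHARD": (0.592, 0.489, 0.163, 0.182),
        "VoxConverse": (0.537, 0.724, 0.410, 0.563),
    },
}


def hyper_parameters(task: str, dataset: str) -> dict:
    """Return a fresh dict of the paper's hyper-parameters for (task, dataset)."""
    try:
        values = _TABLES[task][dataset]
    except KeyError:
        raise KeyError(f"no published hyper-parameters for {task!r} on {dataset!r}")
    return dict(zip(FIELDS, values))
```

With a pipeline built as in the Usage section, something like `pipeline.instantiate(hyper_parameters("vad", "AMI"))` would then apply the AMI Mix-Headset voice activity detection settings.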
+ diff --git a/segmentation/segmentation/config.yaml b/config.yaml similarity index 100% rename from segmentation/segmentation/config.yaml rename to config.yaml diff --git a/segmentation/segmentation/example.png b/example.png similarity index 100% rename from segmentation/segmentation/example.png rename to example.png diff --git a/original.zip b/original.zip new file mode 100644 index 0000000000000000000000000000000000000000..f906bfc3cdef6100047d26f3540f9431294bb507 --- /dev/null +++ b/original.zip @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b37ccf4b649e52e813551f4441e9dc9fb5f321f677f45fbba5cff77a80dd5167 +size 57867411 diff --git a/segmentation/segmentation/pytorch_model.bin b/pytorch_model.bin similarity index 100% rename from segmentation/segmentation/pytorch_model.bin rename to pytorch_model.bin diff --git a/segmentation/segmentation/reproducible_research/dihard3_custom_split/development.txt b/reproducible_research/dihard3_custom_split/development.txt similarity index 100% rename from segmentation/segmentation/reproducible_research/dihard3_custom_split/development.txt rename to reproducible_research/dihard3_custom_split/development.txt diff --git a/segmentation/segmentation/reproducible_research/dihard3_custom_split/train.txt b/reproducible_research/dihard3_custom_split/train.txt similarity index 100% rename from segmentation/segmentation/reproducible_research/dihard3_custom_split/train.txt rename to reproducible_research/dihard3_custom_split/train.txt diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/AMI.development.rttm b/reproducible_research/expected_outputs/osd/AMI.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/AMI.development.rttm rename to reproducible_research/expected_outputs/osd/AMI.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/AMI.test.rttm 
b/reproducible_research/expected_outputs/osd/AMI.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/AMI.test.rttm rename to reproducible_research/expected_outputs/osd/AMI.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/DIHARD.development.rttm b/reproducible_research/expected_outputs/osd/DIHARD.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/DIHARD.development.rttm rename to reproducible_research/expected_outputs/osd/DIHARD.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/DIHARD.test.rttm b/reproducible_research/expected_outputs/osd/DIHARD.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/DIHARD.test.rttm rename to reproducible_research/expected_outputs/osd/DIHARD.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm b/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm rename to reproducible_research/expected_outputs/osd/VoxConverse.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm b/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm rename to reproducible_research/expected_outputs/osd/VoxConverse.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/rsg/AMI.development.rttm b/reproducible_research/expected_outputs/rsg/AMI.development.rttm similarity index 100% rename from 
segmentation/segmentation/reproducible_research/expected_outputs/rsg/AMI.development.rttm rename to reproducible_research/expected_outputs/rsg/AMI.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/rsg/AMI.test.rttm b/reproducible_research/expected_outputs/rsg/AMI.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/rsg/AMI.test.rttm rename to reproducible_research/expected_outputs/rsg/AMI.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm b/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm rename to reproducible_research/expected_outputs/rsg/DIHARD.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm b/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm rename to reproducible_research/expected_outputs/rsg/DIHARD.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm b/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm rename to reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/AMI.development.rttm b/reproducible_research/expected_outputs/vad/AMI.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/AMI.development.rttm rename to reproducible_research/expected_outputs/vad/AMI.development.rttm 
diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/AMI.test.rttm b/reproducible_research/expected_outputs/vad/AMI.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/AMI.test.rttm rename to reproducible_research/expected_outputs/vad/AMI.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/DIHARD.development.rttm b/reproducible_research/expected_outputs/vad/DIHARD.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/DIHARD.development.rttm rename to reproducible_research/expected_outputs/vad/DIHARD.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/DIHARD.test.rttm b/reproducible_research/expected_outputs/vad/DIHARD.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/DIHARD.test.rttm rename to reproducible_research/expected_outputs/vad/DIHARD.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm b/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm rename to reproducible_research/expected_outputs/vad/VoxConverse.development.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm b/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm rename to reproducible_research/expected_outputs/vad/VoxConverse.test.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vbx/AMI.rttm b/reproducible_research/expected_outputs/vbx/AMI.rttm similarity 
index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vbx/AMI.rttm rename to reproducible_research/expected_outputs/vbx/AMI.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vbx/DIHARD.rttm b/reproducible_research/expected_outputs/vbx/DIHARD.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vbx/DIHARD.rttm rename to reproducible_research/expected_outputs/vbx/DIHARD.rttm diff --git a/segmentation/segmentation/reproducible_research/expected_outputs/vbx/VoxConverse.rttm b/reproducible_research/expected_outputs/vbx/VoxConverse.rttm similarity index 100% rename from segmentation/segmentation/reproducible_research/expected_outputs/vbx/VoxConverse.rttm rename to reproducible_research/expected_outputs/vbx/VoxConverse.rttm diff --git a/segmentation/segmentation/.cache/huggingface/.gitignore b/segmentation/segmentation/.cache/huggingface/.gitignore deleted file mode 100644 index f59ec20aabf5842d237244ece8c81ab184faeac1..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/.gitignore +++ /dev/null @@ -1 +0,0 @@ -* \ No newline at end of file diff --git a/segmentation/segmentation/.cache/huggingface/download/.gitattributes.lock b/segmentation/segmentation/.cache/huggingface/download/.gitattributes.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/.gitattributes.metadata b/segmentation/segmentation/.cache/huggingface/download/.gitattributes.metadata deleted file mode 100644 index c6be902384b0496eb28b8f096dcb8d35eef3865c..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/.gitattributes.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -12a25d60d1e877a9273c14f7336a9812664a06ab -1752243556.0054998 diff --git 
a/segmentation/segmentation/.cache/huggingface/download/LICENSE.lock b/segmentation/segmentation/.cache/huggingface/download/LICENSE.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/LICENSE.metadata b/segmentation/segmentation/.cache/huggingface/download/LICENSE.metadata deleted file mode 100644 index b4ffaf14c269f1dc3352d9d1a52ca96192658245..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/LICENSE.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -e5e0c2daded4524693e062d3e4fd016bbfb9a308 -1752243555.817023 diff --git a/segmentation/segmentation/.cache/huggingface/download/README.md.lock b/segmentation/segmentation/.cache/huggingface/download/README.md.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/README.md.metadata b/segmentation/segmentation/.cache/huggingface/download/README.md.metadata deleted file mode 100644 index 9226300ce0091da171a6efe1b86e1123cac052fd..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/README.md.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -5ff4ab94f198cf2882995ff4c1cb29e7db61e5c9 -1752243555.994926 diff --git a/segmentation/segmentation/.cache/huggingface/download/config.yaml.lock b/segmentation/segmentation/.cache/huggingface/download/config.yaml.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/config.yaml.metadata b/segmentation/segmentation/.cache/huggingface/download/config.yaml.metadata deleted file mode 100644 index 
136a39dfe88ca175c14d9813cfa46869c1f91534..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/config.yaml.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -3ad7b756881851b5ed1e58231cd3fac2b35bbfd8 -1752243555.996554 diff --git a/segmentation/segmentation/.cache/huggingface/download/example.png.lock b/segmentation/segmentation/.cache/huggingface/download/example.png.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/example.png.metadata b/segmentation/segmentation/.cache/huggingface/download/example.png.metadata deleted file mode 100644 index 3b78b02cb5654e36f07865220cf7a0a167ba64d4..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/example.png.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -be04026868f792563eccd82bd22c91719c828e71 -1752243555.977073 diff --git a/segmentation/segmentation/.cache/huggingface/download/pytorch_model.bin.lock b/segmentation/segmentation/.cache/huggingface/download/pytorch_model.bin.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/pytorch_model.bin.metadata b/segmentation/segmentation/.cache/huggingface/download/pytorch_model.bin.metadata deleted file mode 100644 index d5584f3599ac77337b6f8748802c771a1aa095f0..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/pytorch_model.bin.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea -1752243556.188376 diff --git 
a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/development.txt.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/development.txt.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/development.txt.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/development.txt.metadata deleted file mode 100644 index 9cb20971b15e3e66d9836c858013fb806350fdf9..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/development.txt.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -daaf8d50c108a254f1b4fc60ab1216977d9da274 -1752243555.946626 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/train.txt.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/train.txt.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/train.txt.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/train.txt.metadata deleted file mode 100644 index d823cf9ef5e076c50b29b9316826f68821843b41..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/dihard3_custom_split/train.txt.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -5949c6a6131ff986ef8f8e0cf70c6c94e4d696be -1752243555.810355 diff --git 
a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.development.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.development.rttm.metadata deleted file mode 100644 index 4c201388b3c35a52573ef361a731da45945c35bf..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.development.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -3b135d9ac6e91a0891d4ed3e1339147fe7b7d1fa -1752243555.999325 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.test.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.test.rttm.metadata deleted file mode 100644 index 3b22688dfd2f5d095d0f7a4e5d086932e703b13c..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/AMI.test.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -da691e28da586cd4ff3be9ba22d0985ffc8548b9 
-1752243555.9985962 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.development.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.development.rttm.metadata deleted file mode 100644 index e5363dcf28e14b860607127d3a556a7508331977..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.development.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -2ecae3d98781c8f2a864fda0e375318c82c39263 -1752243556.199577 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.test.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.test.rttm.metadata deleted file mode 100644 index 8fdb72901fdede26c0fc1ab9f35888c7b866c670..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/DIHARD.test.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ 
-7cc8981665aa696b79c11c0339e73602af4c7350 -7c683bc25af6ba594726c9365f2714e12ae60571 -1752243556.2304158 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm.metadata deleted file mode 100644 index c78e36ce14b608faee705efae45fa470027e604e..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.development.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -664a9483eb437fe36c76425ee44fc6c3023d8a1c -1752243556.184423 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm.metadata deleted file mode 100644 index 57f90febc712142d8e18def643f7b448ed82b6f4..0000000000000000000000000000000000000000 --- 
a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/osd/VoxConverse.test.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -bd363c590918e6a58b7183cd3a967e79a79ea39e -1752243556.254667 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.development.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.development.rttm.metadata deleted file mode 100644 index d917bad1882e07ede3c912f945a946fa88faa47d..0000000000000000000000000000000000000000 --- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.development.rttm.metadata +++ /dev/null @@ -1,3 +0,0 @@ -7cc8981665aa696b79c11c0339e73602af4c7350 -e319d7fd601256a2930e520e044a99441b44a258 -1752243556.1886358 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.test.rttm.lock deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.test.rttm.metadata deleted file mode 100644 index 
aa66f89844609db9af3e6a1510069c212d101b7f..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/AMI.test.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-d2df5c0d2e1d33d12ce9fc967d69942ac6c6c84a
-1752243556.209785
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm.metadata
deleted file mode 100644
index 532478bdc078eaad354192126dbddf27918c1cc8..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.development.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-35eec434301f0ee5ca45933cc74968d564442aa2
-1752243556.3079848
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm.metadata
deleted file mode 100644
index 409565a8982004195683bc0aec968b8b985937ae..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/DIHARD.test.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-1156bf05d52fdfa964728819660835743f98478d
-1752243556.425746
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm.metadata
deleted file mode 100644
index 89ca035a2590297ebe78f8ce9700d5f17119df71..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-bab1b8b55082cd53bf6b019da815c6270cd814be
-1752243556.328424
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.development.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.development.rttm.metadata
deleted file mode 100644
index d0cc5dd7077f7fbf37cb227b25f1de81f2b921d0..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.development.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-7270b9f875f54e5b8ca5320afb88a48ade892d46
-1752243556.367011
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.test.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.test.rttm.metadata
deleted file mode 100644
index 853d24638d55ca4f7b808a6de92bb8d87ec08bd7..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/AMI.test.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-dd0043dfded1c7978f2263b272e7a72abf5158c3
-1752243556.341415
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.development.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.development.rttm.metadata
deleted file mode 100644
index 960245d172892c52fcc2be88db466d51dc177a0a..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.development.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-104d38c3aaf3cd3a81909c1624ec353c24fc7999
-1752243556.422061
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.test.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.test.rttm.metadata
deleted file mode 100644
index 5c8ef1f34f0f847c5edf6d6c01b1f0bb9c630816..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/DIHARD.test.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-93d37d0c598cd1bd999ad5d7c4902a1d7fa01d99
-1752243556.454438
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm.metadata
deleted file mode 100644
index d3b986048c22f1163cab61245d3f1a10787ffc45..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.development.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-afa6d3027e9900ed23cad368c8730dd9a9b82b9f
-1752243556.386183
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm.metadata
deleted file mode 100644
index a33a90a64787864d85afc4c33380528f7d294d53..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vad/VoxConverse.test.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-589af7d31726e64d0583ef2560f9f6e363295174
-1752243556.472178
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/AMI.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/AMI.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/AMI.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/AMI.rttm.metadata
deleted file mode 100644
index 17e168b53c94d9d7c9ad98a16288ce772f488bef..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/AMI.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-529c6e4bc6c4147f05a95057970196f6e7777838
-1752243556.495061
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/DIHARD.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/DIHARD.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/DIHARD.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/DIHARD.rttm.metadata
deleted file mode 100644
index 7ac902a205f6ad2930ee7a0efa2168c3c4319370..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/DIHARD.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-f1453997160e1b59e5af7288fc2329bb10441079
-1752243556.746528
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/VoxConverse.rttm.lock b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/VoxConverse.rttm.lock
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/VoxConverse.rttm.metadata b/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/VoxConverse.rttm.metadata
deleted file mode 100644
index 427a3c6d0904e70e4af336af93d8264011ae78a5..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.cache/huggingface/download/reproducible_research/expected_outputs/vbx/VoxConverse.rttm.metadata
+++ /dev/null
@@ -1,3 +0,0 @@
-7cc8981665aa696b79c11c0339e73602af4c7350
-2eca9112190100122c168cd71f0cd0db06706e16
-1752243556.5911732
diff --git a/segmentation/segmentation/.gitattributes b/segmentation/segmentation/.gitattributes
deleted file mode 100644
index 12a25d60d1e877a9273c14f7336a9812664a06ab..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/.gitattributes
+++ /dev/null
@@ -1,17 +0,0 @@
-*.bin.* filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tar.gz filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.pdf filter=lfs diff=lfs merge=lfs -text
diff --git a/segmentation/segmentation/README.md b/segmentation/segmentation/README.md
deleted file mode 100644
index 5ff4ab94f198cf2882995ff4c1cb29e7db61e5c9..0000000000000000000000000000000000000000
--- a/segmentation/segmentation/README.md
+++ /dev/null
@@ -1,144 +0,0 @@
----
-tags:
-- pyannote
-- pyannote-audio
-- pyannote-audio-model
-- audio
-- voice
-- speech
-- speaker
-- speaker-segmentation
-- voice-activity-detection
-- overlapped-speech-detection
-- resegmentation
-license: mit
-inference: false
-extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening."
-extra_gated_fields:
-  Company/university: text
-  Website: text
-  I plan to use this model for (task, type of audio data, etc): text
----
-
-Using this open-source model in production?
-Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.
-
-
-# 🎹 Speaker segmentation
-
-[Paper](http://arxiv.org/abs/2104.04045) | [Demo](https://huggingface.co/spaces/pyannote/pretrained-pipelines) | [Blog post](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
-
-![Example](example.png)
-
-## Usage
-
-Relies on pyannote.audio 2.1.1: see [installation instructions](https://github.com/pyannote/pyannote-audio).
-
-```python
-# 1. visit hf.co/pyannote/segmentation and accept user conditions
-# 2. visit hf.co/settings/tokens to create an access token
-# 3. instantiate pretrained model
-from pyannote.audio import Model
-model = Model.from_pretrained("pyannote/segmentation",
-                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
-```
-
-### Voice activity detection
-
-```python
-from pyannote.audio.pipelines import VoiceActivityDetection
-pipeline = VoiceActivityDetection(segmentation=model)
-HYPER_PARAMETERS = {
-  # onset/offset activation thresholds
-  "onset": 0.5, "offset": 0.5,
-  # remove speech regions shorter than that many seconds.
-  "min_duration_on": 0.0,
-  # fill non-speech regions shorter than that many seconds.
-  "min_duration_off": 0.0
-}
-pipeline.instantiate(HYPER_PARAMETERS)
-vad = pipeline("audio.wav")
-# `vad` is a pyannote.core.Annotation instance containing speech regions
-```
-
-### Overlapped speech detection
-
-```python
-from pyannote.audio.pipelines import OverlappedSpeechDetection
-pipeline = OverlappedSpeechDetection(segmentation=model)
-pipeline.instantiate(HYPER_PARAMETERS)
-osd = pipeline("audio.wav")
-# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
-```
-
-### Resegmentation
-
-```python
-from pyannote.audio.pipelines import Resegmentation
-pipeline = Resegmentation(segmentation=model,
-                          diarization="baseline")
-pipeline.instantiate(HYPER_PARAMETERS)
-resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
-# where `baseline` should be provided as a pyannote.core.Annotation instance
-```
-
-### Raw scores
-
-```python
-from pyannote.audio import Inference
-inference = Inference(model)
-segmentation = inference("audio.wav")
-# `segmentation` is a pyannote.core.SlidingWindowFeature
-# instance containing raw segmentation scores like the
-# one pictured above (output)
-```
-
-
-## Citation
-
-```bibtex
-@inproceedings{Bredin2021,
-  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
-  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
-  Booktitle = {Proc. Interspeech 2021},
-  Address = {Brno, Czech Republic},
-  Month = {August},
-  Year = {2021},
-```
-
-```bibtex
-@inproceedings{Bredin2020,
-  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
-  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
-  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
-  Address = {Barcelona, Spain},
-  Month = {May},
-  Year = {2020},
-}
-```
-
-## Reproducible research
-
-In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
-"](https://arxiv.org/abs/2104.04045), use `pyannote/segmentation@Interspeech2021` with the following hyper-parameters:
-
-| Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off` |
-| ------------------------ | ------- | -------- | ----------------- | ------------------ |
-| AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037 |
-| DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067 |
-| VoxConverse | 0.767 | 0.713 | 0.182 | 0.501 |
-
-| Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off` |
-| --------------------------- | ------- | -------- | ----------------- | ------------------ |
-| AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187 |
-| DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144 |
-| VoxConverse | 0.587 | 0.426 | 0.337 | 0.112 |
-
-| Resegmentation of VBx | `onset` | `offset` | `min_duration_on` | `min_duration_off` |
-| --------------------- | ------- | -------- | ----------------- | ------------------ |
-| AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705 |
-| DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182 |
-| VoxConverse | 0.537 | 0.724 | 0.410 | 0.563 |
-
-Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.
-