Hervé Bredin committed · Commit df706b9 · Parent(s): f47dcce

fix: fix README

README.md CHANGED

@@ -17,7 +17,7 @@ license: mit
 inference: false
 ---
 
-#
+# pyannote.audio // speaker segmentation
 
 This model relies on `pyannote.audio` 2.0 (which is still in development):
 
@@ -29,7 +29,7 @@ $ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
 
 ```python
 >>> from pyannote.audio import Inference
->>> inference = Inference("pyannote/
+>>> inference = Inference("pyannote/segmentation")
 >>> segmentation = inference("audio.wav")
 ```
 
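
For context, the fixed `Inference` snippet above produces frame-level segmentation scores. A minimal sketch of inspecting that output, assuming the `pyannote.audio` 2.0 development API this README targets (`audio.wav` is a placeholder path):

```python
>>> from pyannote.audio import Inference
>>> inference = Inference("pyannote/segmentation")
>>> segmentation = inference("audio.wav")
>>> # in the 2.0 development branch, the result is expected to be a
>>> # pyannote.core SlidingWindowFeature: a (num_frames, num_speakers)
>>> # array of activation scores plus its temporal sliding window
>>> segmentation.data.shape
>>> segmentation.sliding_window.duration, segmentation.sliding_window.step
```
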
@@ -40,25 +40,30 @@ $ pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
 ```python
 >>> from pyannote.audio.pipelines import VoiceActivityDetection
 >>> HYPER_PARAMETERS = {"onset": 0.5, "offset": 0.5, "min_duration_on": 0.0, "min_duration_off": 0.0}
->>> pipeline = VoiceActivityDetection(segmentation="pyannote/
+>>> pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> vad = pipeline("audio.wav")
 ```
 
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
 DIHARD3 | TODO | TODO | TODO | TODO
 VoxConverse | TODO | TODO | TODO | TODO
 
-
 ### Overlapped speech detection
 
 ```python
 >>> from pyannote.audio.pipelines import OverlappedSpeechDetection
->>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/
+>>> pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> osd = pipeline("audio.wav")
 ```
 
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
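
Once instantiated, both fixed pipelines above return a `pyannote.core.Annotation`. A short sketch of consuming the voice activity output; this relies only on the standard `pyannote.core` API, not on anything introduced by this commit:

```python
>>> # merge overlapping speech segments into a clean timeline
>>> speech = vad.get_timeline().support()
>>> for segment in speech:
...     print(f"speech from {segment.start:.1f}s to {segment.end:.1f}s")
>>> speech.duration()  # total speech duration, in seconds
```
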
@@ -70,9 +75,12 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ```python
 >>> from pyannote.audio.pipelines import Segmentation
->>> pipeline = Segmentation(segmentation="pyannote/
+>>> pipeline = Segmentation(segmentation="pyannote/segmentation")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
 >>> seg = pipeline("audio.wav")
 ```
+In order to reproduce results of the paper, one should use the following hyper-parameter values:
+
 
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
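
The same `HYPER_PARAMETERS` dictionary drives all of these pipelines: `onset` and `offset` are the thresholds for switching detection on and off, while `min_duration_on` and `min_duration_off` remove short active regions and fill short gaps. A sketch of re-instantiating with stricter values (the numbers are illustrative only; they are not the TODO values from the tables):

```python
>>> pipeline.instantiate({
...     "onset": 0.7,            # higher score needed to switch on
...     "offset": 0.6,           # score below which detection switches off
...     "min_duration_on": 0.1,  # drop active regions shorter than 100 ms
...     "min_duration_off": 0.1, # fill inactive gaps shorter than 100 ms
... })
>>> seg = pipeline("audio.wav")
```
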
@@ -84,11 +92,22 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ```python
 >>> from pyannote.audio.pipelines import Resegmentation
->>> pipeline = Resegmentation(segmentation="pyannote/
-
->>>
+>>> pipeline = Resegmentation(segmentation="pyannote/segmentation",
+...                           diarization="baseline")
+>>> pipeline.instantiate(HYPER_PARAMETERS)
+```
+
+VBx RTTM files are also provided in this repository for convenience:
+
+```python
+>>> from pyannote.database.utils import load_rttm
+>>> vbx = load_rttm("/path/to/vbx.rttm")
+>>> resegmented_vbx = pipeline({"audio": "DH_EVAL_000.wav",
+...                             "baseline": vbx["DH_EVAL_000"]})
 ```
 
+In order to reproduce (VBx) results of the paper, one should use the following hyper-parameter values:
+
 Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
 AMI Mix-Headset | TODO | TODO | TODO | TODO
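
The resegmented output is again an `Annotation`, so it can be saved back to RTTM for scoring. A sketch, assuming `pyannote.core`'s `write_rttm` helper (the output path is hypothetical):

```python
>>> # serialize the resegmented diarization back to RTTM
>>> with open("DH_EVAL_000.reseg.rttm", "w") as f:
...     resegmented_vbx.write_rttm(f)
```
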
@@ -97,7 +116,6 @@ VoxConverse | TODO | TODO | TODO | TODO
 
 ## Citations
 
-
 ```bibtex
 @inproceedings{Bredin2020,
   Title = {{pyannote.audio: neural building blocks for speaker diarization}},