Hervé Bredin
committed on
Commit
·
2dbbe55
1
Parent(s):
52200fc
doc: update README
README.md
CHANGED
@@ -31,44 +31,24 @@ Relies on pyannote.audio 2.0 currently in development: see [installation instruc
|
|
31 |
For commercial enquiries and scientific consulting, please contact [me](mailto:[email protected]).
|
32 |
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.
|
33 |
|
34 |
-
##
|
35 |
|
36 |
-
|
37 |
-
from pyannote.audio import Inference
|
38 |
-
inference = Inference("pyannote/segmentation")
|
39 |
-
segmentation = inference("audio.wav")
|
40 |
-
# `segmentation` is a pyannote.core.SlidingWindowFeature
|
41 |
-
# instance containing raw segmentation scores like the
|
42 |
-
# one pictured above (output)
|
43 |
|
44 |
-
|
45 |
-
|
|
|
46 |
HYPER_PARAMETERS = {
|
47 |
# onset/offset activation thresholds
|
48 |
"onset": 0.5, "offset": 0.5,
|
49 |
-
# remove
|
50 |
"min_duration_on": 0.0,
|
51 |
-
# fill
|
52 |
"min_duration_off": 0.0
|
53 |
}
|
54 |
-
|
55 |
-
pipeline.instantiate(HYPER_PARAMETERS)
|
56 |
-
segmentation = pipeline("audio.wav")
|
57 |
-
# `segmentation` now is a pyannote.core.Annotation
|
58 |
-
# instance containing a hard binary segmentation
|
59 |
-
# like the one pictured above (reference)
|
60 |
-
```
|
61 |
-
|
62 |
-
|
63 |
-
## Advanced usage
|
64 |
-
|
65 |
-
### Voice activity detection
|
66 |
-
|
67 |
-
```python
|
68 |
-
from pyannote.audio.pipelines import VoiceActivityDetection
|
69 |
-
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
|
70 |
pipeline.instantiate(HYPER_PARAMETERS)
|
71 |
vad = pipeline("audio.wav")
|
|
|
72 |
```
|
73 |
|
74 |
### Overlapped speech detection
|
@@ -78,6 +58,7 @@ from pyannote.audio.pipelines import OverlappedSpeechDetection
|
|
78 |
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
|
79 |
pipeline.instantiate(HYPER_PARAMETERS)
|
80 |
osd = pipeline("audio.wav")
|
|
|
81 |
```
|
82 |
|
83 |
### Resegmentation
|
@@ -91,6 +72,17 @@ resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
|
|
91 |
# where `baseline` should be provided as a pyannote.core.Annotation instance
|
92 |
```
|
93 |
|
94 |
## Reproducible research
|
95 |
|
96 |
In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
|
@@ -118,6 +110,16 @@ Expected outputs (and VBx baseline) are also provided in the `/reproducible_rese
|
|
118 |
|
119 |
## Citation
|
120 |
|
121 |
```bibtex
|
122 |
@inproceedings{Bredin2020,
|
123 |
Title = {{pyannote.audio: neural building blocks for speaker diarization}},
|
|
|
31 |
For commercial enquiries and scientific consulting, please contact [me](mailto:[email protected]).
|
32 |
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check the [pyannote.audio](https://github.com/pyannote/pyannote-audio) GitHub repository.
|
33 |
|
34 |
+
## Usage
|
35 |
|
36 |
+
### Voice activity detection
|
37 |
|
38 |
+
```python
|
39 |
+
from pyannote.audio.pipelines import VoiceActivityDetection
|
40 |
+
pipeline = VoiceActivityDetection(segmentation="pyannote/segmentation")
|
41 |
HYPER_PARAMETERS = {
|
42 |
# onset/offset activation thresholds
|
43 |
"onset": 0.5, "offset": 0.5,
|
44 |
+
# remove speech regions shorter than that many seconds.
|
45 |
"min_duration_on": 0.0,
|
46 |
+
# fill non-speech regions shorter than that many seconds.
|
47 |
"min_duration_off": 0.0
|
48 |
}
|
49 |
pipeline.instantiate(HYPER_PARAMETERS)
|
50 |
vad = pipeline("audio.wav")
|
51 |
+
# `vad` is a pyannote.core.Annotation instance containing speech regions
|
52 |
```
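
To make the four hyper-parameters more concrete, here is a minimal pure-Python sketch of hysteresis thresholding followed by the two duration-based clean-up steps. It is an illustration, not the pyannote.audio implementation: the function name `binarize`, the `frame_step` argument, and the order of the clean-up steps (fill short gaps first, then drop short regions) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def binarize(
    scores: Sequence[float],
    frame_step: float,
    onset: float = 0.5,
    offset: float = 0.5,
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Tuple[float, float]]:
    """Turn per-frame scores into (start, end) regions, in seconds.

    Hypothetical helper written for illustration only.
    """
    regions: List[Tuple[float, float]] = []
    active = False
    start = 0.0

    # Hysteresis thresholding: a region opens when the score rises
    # above `onset` and closes when it falls below `offset`.
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            regions.append((start, t))
            active = False
    if active:
        regions.append((start, len(scores) * frame_step))

    # Fill gaps (inactive regions) shorter than `min_duration_off` seconds.
    merged: List[Tuple[float, float]] = []
    for seg_start, seg_end in regions:
        if merged and seg_start - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg_end)
        else:
            merged.append((seg_start, seg_end))

    # Remove active regions shorter than `min_duration_on` seconds.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


scores = [0.1, 0.9, 0.9, 0.2, 0.8, 0.8, 0.8, 0.1, 0.1, 0.1, 0.1, 0.7, 0.1]
raw = binarize(scores, frame_step=1.0)
cleaned = binarize(scores, frame_step=1.0, min_duration_on=2.0, min_duration_off=2.0)
```

With the default values of `0.0`, no region is filled or removed; raising `min_duration_off` merges regions separated by short pauses, and raising `min_duration_on` discards short isolated detections.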
|
53 |
|
54 |
### Overlapped speech detection
|
|
|
58 |
pipeline = OverlappedSpeechDetection(segmentation="pyannote/segmentation")
|
59 |
pipeline.instantiate(HYPER_PARAMETERS)
|
60 |
osd = pipeline("audio.wav")
|
61 |
+
# `osd` is a pyannote.core.Annotation instance containing overlapped speech regions
|
62 |
```
|
63 |
|
64 |
### Resegmentation
|
|
|
72 |
# where `baseline` should be provided as a pyannote.core.Annotation instance
|
73 |
```
|
74 |
|
75 |
+
### Raw scores
|
76 |
+
|
77 |
+
```python
|
78 |
+
from pyannote.audio import Inference
|
79 |
+
inference = Inference("pyannote/segmentation")
|
80 |
+
segmentation = inference("audio.wav")
|
81 |
+
# `segmentation` is a pyannote.core.SlidingWindowFeature
|
82 |
+
# instance containing raw segmentation scores like the
|
83 |
+
# one pictured above (output)
|
84 |
+
```
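
The raw scores contain one activation per speaker and per frame. As a rough sketch of how such scores can be reduced to a single detection score per frame, the example below takes the largest activation as a voice activity score and the second largest as an overlapped speech score. The helper name `speech_scores` and the plain nested-list input are assumptions made for illustration; the pipelines above perform their own aggregation internally.

```python
from typing import List, Sequence, Tuple


def speech_scores(
    frames: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Reduce per-speaker activations to per-frame detection scores.

    Hypothetical helper: `frames[i][k]` is the activation of
    speaker `k` at frame `i`.
    """
    vad: List[float] = []
    osd: List[float] = []
    for activations in frames:
        ranked = sorted(activations, reverse=True)
        # Speech is likely if at least one speaker is active.
        vad.append(ranked[0])
        # Overlap is likely if at least two speakers are active.
        osd.append(ranked[1] if len(ranked) > 1 else 0.0)
    return vad, osd


frames = [
    [0.9, 0.1, 0.0],   # one active speaker
    [0.8, 0.7, 0.1],   # two active speakers
    [0.1, 0.2, 0.05],  # nobody clearly active
]
vad_scores, osd_scores = speech_scores(frames)
```

Either list can then be thresholded with `onset` and `offset` values like the ones in `HYPER_PARAMETERS`.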
|
85 |
+
|
86 |
## Reproducible research
|
87 |
|
88 |
In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
|
|
|
110 |
|
111 |
## Citation
|
112 |
|
113 |
+
```bibtex
|
114 |
+
@inproceedings{Bredin2021,
|
115 |
+
Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
|
116 |
+
Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
|
117 |
+
Booktitle = {Proc. Interspeech 2021},
|
118 |
+
Address = {Brno, Czech Republic},
|
119 |
+
Month = {August},
|
120 |
+
Year = {2021},
|
121 |
+
```
|
122 |
+
|
123 |
```bibtex
|
124 |
@inproceedings{Bredin2020,
|
125 |
Title = {{pyannote.audio: neural building blocks for speaker diarization}},
|