Update README.md
README.md
CHANGED
````diff
@@ -17,8 +17,12 @@ This model predicts prosodic units on speech.
 For each 20ms frame the model predicts 1 or 0, indicating whether there is a prosodic unit in
 this frame or not.
 
+This frame-level output can be grouped into events with the frames_to_intervals function provided in the
+code snippets below.
 
-
+It is known that the model is unreliable if the audio starts or ends within a prosodic unit. This can be somewhat
+circumvented by 1) using the largest possible chunks that will fit your machine and 2) use overlapping chunks
+and combining results smartly.
 
 
 
@@ -31,7 +35,7 @@ this frame or not.
 
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
-- **Developed by:** Peter Rupnik, Nikola
+- **Developed by:** Peter Rupnik, Nikola Ljubešić, Darinka Verdonik, Simona Majheničy
 - **Funded by:** MEZZANINE project
 - **Model type:** Wav2Vec2Bert for Audio Frame Classification
 - **Language(s) (NLP):** Trained and tested on Slovenian, ATM unclear if usable cross-lingually
@@ -259,7 +263,6 @@ final_intervals = merge_events(ds["prosodic_units"], ds["chunk_centroid_s"])
 print(final_intervals)
 # Outputs: [[3.14, 4.96], [5.6, 8.4], [8.62, 9.32], [10.12, 10.7], [11.72, 13.1],....
 ```
-## Bias, Risks, and Limitations
 
 ## Training Details
 
````
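The README text added in this commit refers to a frames_to_intervals function in the README's code snippets, which this diff does not show. As a rough illustration of the idea only, the sketch below groups a 0/1 frame sequence into `[start, end]` intervals in seconds, assuming the 20 ms frame length stated in the README. The name `frames_to_intervals_sketch` and its `frame_s` parameter are hypothetical and may differ from the README's own implementation.

```python
from typing import List, Optional


def frames_to_intervals_sketch(
    frames: List[int], frame_s: float = 0.02
) -> List[List[float]]:
    """Group runs of 1s in a frame-level 0/1 sequence into [start_s, end_s] pairs.

    Assumption (not from the README): frame i covers the time span
    [i * frame_s, (i + 1) * frame_s).
    """
    intervals: List[List[float]] = []
    start: Optional[int] = None  # index of the first frame of the open run, if any
    for i, value in enumerate(frames):
        if value == 1 and start is None:
            start = i  # a run of 1s begins here
        elif value != 1 and start is not None:
            # the run ended on the previous frame, so close it at the start of frame i
            intervals.append([round(start * frame_s, 2), round(i * frame_s, 2)])
            start = None
    if start is not None:
        # the sequence ended inside a run, so close it at the end of the audio
        intervals.append([round(start * frame_s, 2), round(len(frames) * frame_s, 2)])
    return intervals


print(frames_to_intervals_sketch([0, 1, 1, 1, 0, 0, 1, 1]))
# prints [[0.02, 0.08], [0.12, 0.16]]
```

The last interval in this example ends at the end of the input, which is exactly the case the README flags as unreliable: a unit that is cut off by the chunk boundary.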
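The commit also recommends overlapping chunks but does not say how to combine their results. One possible strategy, shown here purely as an assumption, is to keep only the central frames of each window and discard a margin at both edges, so that every kept frame is predicted with context on both sides. In this sketch `predict_chunk` is a hypothetical stand-in for the model call that returns one 0/1 label per frame in `[start, end)`; `chunk` and `margin` are illustrative frame counts, not values from the README.

```python
from typing import Callable, List, Sequence


def predict_with_overlap(
    n_frames: int,
    predict_chunk: Callable[[int, int], Sequence[int]],
    chunk: int = 1000,
    margin: int = 100,
) -> List[int]:
    """Merge frame predictions from overlapping windows.

    Each window spans `chunk` frames and consecutive windows overlap by
    2 * margin frames. The first and last `margin` frames of a window are
    dropped, except at the true start and end of the audio.
    """
    if margin < 0 or chunk <= 2 * margin:
        raise ValueError("chunk must be larger than 2 * margin, and margin >= 0")
    step = chunk - 2 * margin  # distance between consecutive window starts
    merged: List[int] = []
    start = 0
    while len(merged) < n_frames:
        end = min(start + chunk, n_frames)
        preds = list(predict_chunk(start, end))
        keep_from = len(merged)  # first frame not yet covered by earlier windows
        is_last = end == n_frames
        keep_to = end if is_last else end - margin  # trim the trailing edge
        merged.extend(preds[keep_from - start : keep_to - start])
        start += step
    return merged
```

With the defaults, a 2500-frame recording would be processed as three windows starting at frames 0, 800 and 1600. Other merging rules, such as averaging logits in the overlap, are equally plausible readings of "combining results smartly".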