5roop/Wav2Vec2BertProsodicUnitsFrameClassifier

audio-frame-classification


5roop committed on Jan 8

Commit d87b2eb · verified · 1 Parent(s): 8428e77

Update README.md

Files changed (1)

README.md +6 -3


@@ -17,8 +17,12 @@ This model predicts prosodic units on speech.
 For each 20ms frame the model predicts 1 or 0, indicating whether there is a prosodic unit in
 this frame or not.
+This frame-level output can be grouped into events with the frames_to_intervals function provided in the
+code snippets below.
+The model is known to be unreliable if the audio starts or ends within a prosodic unit. This can be partly
+mitigated by 1) using the largest chunks that fit on your machine and 2) using overlapping chunks
+and combining their results.
@@ -31,7 +35,7 @@ this frame or not.
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** Peter Rupnik, Nikola Ljubešić
+- **Developed by:** Peter Rupnik, Nikola Ljubešić, Darinka Verdonik, Simona Majheničy
 - **Funded by:** MEZZANINE project
 - **Model type:** Wav2Vec2Bert for Audio Frame Classification
 - **Language(s) (NLP):** Trained and tested on Slovenian, ATM unclear if usable cross-lingually
@@ -259,7 +263,6 @@ final_intervals = merge_events(ds["prosodic_units"], ds["chunk_centroid_s"])
 print(final_intervals)
 # Outputs: [[3.14, 4.96], [5.6, 8.4], [8.62, 9.32], [10.12, 10.7], [11.72, 13.1],....
 ```
-## Bias, Risks, and Limitations
 ## Training Details
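
The added README lines mention grouping frame-level predictions into events and combining results from overlapping chunks, but this diff does not show how either is done. The sketch below illustrates one possible approach under stated assumptions. It is not the model card's `frames_to_intervals` or `merge_events`. The function names `frames_to_intervals_sketch` and `merge_overlapping_sketch`, the `offset_s` parameter, and the choice of a plain interval union are all hypothetical. Only the 20 ms frame length comes from the README.

```python
FRAME_S = 0.02  # 20 ms per frame, as stated in the README


def frames_to_intervals_sketch(frames, frame_s=FRAME_S, offset_s=0.0):
    """Group runs of 1s in a 0/1 frame sequence into [start_s, end_s] intervals.

    offset_s (hypothetical) is the start time of the chunk within the full
    recording, so intervals from different chunks share one time axis.
    """
    intervals = []
    start = None  # index of the first frame of the current run, if any
    for i, label in enumerate(frames):
        if label == 1 and start is None:
            start = i
        elif label != 1 and start is not None:
            intervals.append([round(offset_s + start * frame_s, 3),
                              round(offset_s + i * frame_s, 3)])
            start = None
    if start is not None:  # a run that reaches the end of the chunk
        intervals.append([round(offset_s + start * frame_s, 3),
                          round(offset_s + len(frames) * frame_s, 3)])
    return intervals


def merge_overlapping_sketch(intervals):
    """Union intervals that overlap or touch (assumed combination strategy)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# Two overlapping chunks: A covers 0.0-0.2 s, B starts at 0.1 s.
chunk_a = frames_to_intervals_sketch([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
chunk_b = frames_to_intervals_sketch([1, 1, 1, 1, 1, 1, 0, 0, 0, 0], offset_s=0.1)
print(chunk_a)  # [[0.06, 0.2]]
print(chunk_b)  # [[0.1, 0.22]]
print(merge_overlapping_sketch(chunk_a + chunk_b))  # [[0.06, 0.22]]
```

In this example the prosodic unit is cut off at the end of chunk A and at the start of chunk B, and the union restores a single interval. A plain union keeps every detection, including spurious ones near chunk edges, so a more careful strategy might discard predictions close to the chunk boundaries before merging.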