## Data used for training

The model is trained on around 60 hours of Macedonian speech.

In training the model, we used the following data sources:
1. Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
2. Audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
5. Macedonian version of the Mozilla Common Voice (version 18).

## Model description

This model is a fine-tuned version of the Whisper large-v3 model. During fine-tuning, the encoder was kept frozen and only the decoder was optimized.
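
As a rough illustration of this setup (a minimal sketch, not this project's training code; it uses the Hugging Face `transformers` Whisper classes rather than the SpeechBrain wrapper used here), freezing the encoder looks like this:

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# Freeze the encoder: its weights receive no gradient updates,
# so only the decoder is optimized during fine-tuning.
for param in model.model.encoder.parameters():
    param.requires_grad = False

# Pass only the trainable (decoder) parameters to the optimizer;
# the learning rate is illustrative, not the project's actual value.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```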

## Usage

The model is developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, you need to install SpeechBrain with:
```
pip install speechbrain
```
SpeechBrain relies on the Transformers library, so you also need to install it:
```
pip install transformers
```

An external Python module, `custom_interface.py`, provides the predictor class for this HF repository; it is passed to `foreign_class` via the `pymodule_file` argument. The `foreign_class` function from `speechbrain.inference.interfaces` allows you to load a custom model.

```python
import torch
from speechbrain.inference.interfaces import foreign_class

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Repo id truncated in the original; classname is an assumption.
asr_classifier = foreign_class(source="Macedonian-ASR/whisper-large-v3-macedonia",
                               pymodule_file="custom_interface.py",
                               classname="ASR")
asr_classifier = asr_classifier.to(device)
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
```

## Training

To fine-tune this model, you need to run:
```
python train.py hyperparams.yaml
```

The `train.py` file contains the functions necessary for training the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training the model, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.
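
For orientation, a SpeechBrain recipe of this kind typically follows the pattern sketched below. This is a minimal sketch, not this repository's actual `train.py`; the `modules`, `opt_class`, `epoch_counter`, `train_data`, and `valid_data` keys are assumptions about what `hyperparams.yaml` defines.

```python
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml


class ASRBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Run the (frozen-encoder) Whisper model to get decoder token logits.
        ...

    def compute_objectives(self, predictions, batch, stage):
        # Loss over the target transcription tokens (e.g., NLL).
        ...


if __name__ == "__main__":
    # Standard SpeechBrain entry point: `python train.py hyperparams.yaml`
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
    with open(hparams_file) as fin:
        hparams = load_hyperpyyaml(fin, overrides)

    brain = ASRBrain(
        modules=hparams["modules"],      # assumed hyperparams.yaml key
        opt_class=hparams["opt_class"],  # assumed hyperparams.yaml key
        hparams=hparams,
        run_opts=run_opts,
    )
    brain.fit(hparams["epoch_counter"],  # assumed hyperparams.yaml key
              hparams["train_data"],     # assumed hyperparams.yaml key
              hparams["valid_data"])     # assumed hyperparams.yaml key
```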