---
license: apache-2.0
language:
- mk
base_model:
- openai/whisper-large-v3
---

# Fine-tuned whisper-large-v3 model for speech recognition in Macedonian

Authors:
1. Dejan Porjazovski
2. Ilina Jakimovska
3. Ordan Chukaliev
4. Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.

## Data used for training

The model is trained on around 60 hours of Macedonian speech.

To train the model, we used the following data sources:
1. The Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
2. The audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
3. The podcast "Обични луѓе" by Ilina Jakimovska.
4. The scientific videos from the series "Наука за деца" by the KANTAROT foundation.
5. The Macedonian part of Mozilla Common Voice (version 18).


## Model description
This model is a fine-tuned version of OpenAI's Whisper large-v3. During fine-tuning, the encoder was kept frozen and only the decoder was optimized.
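
For illustration only, the following is a minimal sketch of how encoder freezing can be expressed with the Hugging Face `WhisperForConditionalGeneration` class. The actual training in this repository is done with SpeechBrain (see the Training section below), so this snippet is an assumption about the setup, not the repository's training code:
```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# Freeze the encoder so that only the decoder receives gradient updates
for param in model.model.encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable (decoder) parameters: {trainable:,}")
```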


## Results

The results are reported on all the test sets combined.

WER: 10.51 \
CER: 4.43
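
For reference, WER (word error rate) and CER (character error rate) can be computed with the `jiwer` package; this is only an illustration of the metrics, not the evaluation script that produced the numbers above:
```python
import jiwer

reference = "ова е пример реченица"
hypothesis = "ова е пример реченца"  # one misrecognized word

print("WER:", jiwer.wer(reference, hypothesis))  # word error rate
print("CER:", jiwer.cer(reference, hypothesis))  # character error rate
```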


## Usage

The model was developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, first install SpeechBrain:
```
pip install speechbrain
```
SpeechBrain relies on the Transformers library, so you also need to install it:
```
pip install transformers
```

Inference relies on a custom predictor class defined in `custom_interface.py` in this repository. It is loaded with the `foreign_class` function from `speechbrain.inference.interfaces`, which lets you load a model through a custom interface.

```python
import torch
from speechbrain.inference.interfaces import foreign_class

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the custom ASR interface defined in custom_interface.py
asr_classifier = foreign_class(source="Macedonian-ASR/whisper-large-v3-macedonian-asr", pymodule_file="custom_interface.py", classname="ASR")
asr_classifier = asr_classifier.to(device)
# Transcribe an audio file
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
```
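
Whisper models expect 16 kHz mono input. If your recordings use a different sample rate and `custom_interface.py` does not resample internally (an assumption worth verifying), you can resample beforehand, for example with torchaudio:
```python
import torchaudio

# Resample to 16 kHz and downmix to mono before transcription (hypothetical file names)
waveform, sr = torchaudio.load("audio_file.wav")
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)
torchaudio.save("audio_file_16k.wav", waveform, 16000)
```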

## Training

To fine-tune this model, you need to run:
```
python train.py hyperparams.yaml
```

The `train.py` file contains the functions needed to train the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training the model, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.
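
If the recipe follows the standard SpeechBrain pattern, values defined in `hyperparams.yaml` can typically also be overridden from the command line via HyperPyYAML; the parameter names below are purely illustrative, so check `hyperparams.yaml` for the actual keys:
```
python train.py hyperparams.yaml --number_of_epochs=10 --batch_size=4
```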