---
base_model: facebook/w2v-bert-2.0
library_name: transformers
language: 
  - uk
license: "apache-2.0"
task_categories:
- automatic-speech-recognition
tags:
- audio
datasets:
  - Yehor/openstt-uk
metrics:
  - wer
model-index:
  - name: w2v-bert-uk-v2.1
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_10_0
          type: common_voice_10_0
          config: uk
          split: test
          args: uk
        metrics:
          - name: WER
            type: wer
            value: 17.34
          - name: CER
            type: cer
            value: 3.33
---

# w2v-bert-uk `v2.1`


## Community

- **Discord**: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk

See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk

## Overview

This model is the successor to https://huggingface.co/Yehor/w2v-bert-uk


## Metrics

- Acoustic model (AM) in float16 (F16), evaluated on the Common Voice 10.0 Ukrainian test split:
  - WER: 0.1734 (17.34%)
  - CER: 0.0333 (3.33%)
  - Word accuracy: 82.66%
  - Character accuracy: 96.67%
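
The word and character accuracy figures above equal 100% minus the corresponding error rate. The card does not say which tool produced the numbers, so the following is only a minimal sketch of the standard WER/CER definitions (edit distance divided by reference length). The sentence pair is a hypothetical illustration, not data from the evaluation set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example pair (one misspelled word)
reference = 'добрий день усім присутнім'
hypothesis = 'добрий день усим присутнім'

print(f'WER: {wer(reference, hypothesis):.4f}')
print(f'CER: {cer(reference, hypothesis):.4f}')

# Accuracy as reported in this card: 100% minus the error rate
print(f'Word accuracy: {100 - 17.34:.2f}%')
print(f'Character accuracy: {100 - 3.33:.2f}%')
```

Real evaluations usually normalize text (case, punctuation) before scoring; that step is omitted here.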

## Demo

Try the model on your own audio files in the demo Space: https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo

## Usage

```python
# pip install -U torch soundfile transformers

import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

# Config
model_name = 'Yehor/w2v-bert-uk-v2.1'
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

# Load the model
asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

paths = [
  'sample1.wav',
]

# Read audio (sf.read does not resample; files are expected to be 16 kHz, ideally mono)
audio_inputs = []
for path in paths:
  audio_input, _ = sf.read(path)
  audio_inputs.append(audio_input)

# Transcribe the audio
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
# Cast the features to the model's dtype so float16 weights get float16 inputs
features = torch.tensor(inputs).to(device, dtype=torch_dtype)

with torch.inference_mode():
  logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

# Log results
print('Predictions:')
print(predictions)
```
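
The last two steps of the snippet above (`torch.argmax` followed by `batch_decode`) amount to greedy CTC decoding. The sketch below illustrates the idea in plain Python. The toy vocabulary, the blank id, and the `|` word delimiter are assumptions made for illustration; the real values come from the model's tokenizer.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one ships with the model's tokenizer
vocab = {0: '<pad>', 1: '|', 2: 'т', 3: 'а', 4: 'к'}
BLANK_ID = 0          # assumed: the padding token doubles as the CTC blank
WORD_DELIMITER = '|'  # assumed word separator token


def greedy_ctc_decode(ids):
    """Collapse repeated ids, drop blanks, and join the rest into text."""
    # 1. Collapse consecutive repeats of the same id
    collapsed = [token_id for token_id, _ in groupby(ids)]
    # 2. Drop blanks, which only separate genuinely repeated characters
    tokens = [vocab[i] for i in collapsed if i != BLANK_ID]
    # 3. Turn word delimiters into spaces
    return ''.join(tokens).replace(WORD_DELIMITER, ' ').strip()


# Frame-level argmax ids for a hypothetical utterance
frame_ids = [0, 2, 2, 0, 3, 3, 3, 4, 0, 0, 1, 2, 0, 3, 4, 4, 0]
print(greedy_ctc_decode(frame_ids))
```

In practice `processor.batch_decode` does this for you; the sketch only shows why repeated frame predictions do not produce repeated letters.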

## Cite this work

```
@misc {smoliakov_2025,
	author       = { {Smoliakov} },
	title        = { w2v-bert-uk-v2.1 (Revision 094c59d) },
	year         = 2025,
	url          = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
	doi          = { 10.57967/hf/4554 },
	publisher    = { Hugging Face }
}
```