|
---
language: gl
datasets:
- openslr
- mozilla-foundation/common_voice_8_0
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: Galician wav2vec2-large-xlsr-galician
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: OpenSLR 77 + Common Voice (gl)
      type: openslr
      args: gl
    metrics:
    - name: Test WER
      type: wer
      value: 7.12
|
--- |
|
|
|
# wav2vec2-large-xlsr-galician
|
|
|
wav2vec2 model fine-tuned for automatic speech recognition in the Galician language.
|
|
|
Based on the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) self-supervised model.

Fine-tuned with labelled audio from [OpenSLR](https://openslr.org/77/) and Mozilla [Common Voice](https://commonvoice.mozilla.org/gl) (both datasets previously refined).
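
The refinement applied to the transcripts is not documented in this card. As a rough illustration of the kind of normalisation typically applied before fine-tuning XLSR models (lower-casing and stripping punctuation so the CTC vocabulary contains only letters and spaces), here is a minimal sketch; the character list and the example sentence are assumptions, not the exact ones used:

```python
import re

# Illustrative punctuation set to strip from transcripts (assumed, not the exact list used)
chars_to_remove = r'[,?.!;:"“”‘%�-]'

def normalise(sentence: str) -> str:
    # Lower-case and drop punctuation so labels contain only letters and spaces
    return re.sub(chars_to_remove, "", sentence).lower().strip()

print(normalise("Ola, como estás?"))  # -> ola como estás
```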
|
|
|
Check the training metrics to see the results.
|
|
|
# Testing |
|
|
|
Make sure that the speech input is sampled at 16 kHz (mono).
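
Note that `librosa.load` in the example below already performs this conversion. For other loading paths, a clip recorded in stereo or at a different rate can be converted first; a minimal sketch using torchaudio (the file name `my_clip.wav` is just a placeholder):

```python
import torchaudio

# Load the clip; returns a (channels, samples) tensor and its sample rate
waveform, sample_rate = torchaudio.load("my_clip.wav")

# Resample to the 16 kHz expected by the model, if necessary
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

# Downmix stereo to mono by averaging the channels
waveform = waveform.mean(dim=0)
```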
|
|
|
|
|
```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("ifrz/wav2vec2-large-xlsr-galician")
processor = Wav2Vec2Processor.from_pretrained("ifrz/wav2vec2-large-xlsr-galician")

# Read the audio clip (librosa resamples to 16 kHz and downmixes to mono)
audio, rate = librosa.load("./gl_test_1.wav", sr=16_000)

# Convert the raw waveform into model input values
input_values = processor(audio, sampling_rate=16_000, return_tensors="pt", padding="longest").input_values

# Compute logits (non-normalised prediction scores) without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# Take the most likely token id at each time step
prediction = torch.argmax(logits, dim=-1)

# Decode the predicted ids with the tokenizer to get the transcription
transcription = processor.batch_decode(prediction)[0]
print(transcription)
```
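
To check the word error rate on your own labelled clips, one option is the `jiwer` package (not used elsewhere in this card); a minimal sketch, where the reference transcript is purely illustrative:

```python
from jiwer import wer

# Hypothetical ground-truth transcript for ./gl_test_1.wav (illustrative only)
reference = "exemplo de transcrición de referencia"

# `transcription` is the model output from the snippet above
error_rate = wer(reference, transcription)
print(f"WER: {error_rate:.2%}")
```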