update model card README.md
README.md
CHANGED
@@ -1,62 +1,51 @@
Previous version:

---
datasets:
- common_voice_11_0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# Swahili ASR

The Swahili ASR is an end-to-end automatic speech recognition system fine-tuned on the Common Voice Corpus 11.0 Swahili dataset. This repository provides the tools needed to perform ASR with this model, enabling high-quality speech-to-text conversion in Swahili.

## Performance

| Loss | WER | Eval runtime (s) | Samples/s | Steps/s | Epoch |
| --- | --- | --- | --- | --- | --- |
| 0.345414400100708 | 0.2602372795622284 | 578.4006 | 17.701 | 2.213 | 4.17 |
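
As context for the WER metric, word error rate is the word-level edit distance between reference and hypothesis divided by the number of reference words. A minimal sketch (illustrative only, not the evaluation code used for this model; the Swahili example phrases are made up):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("habari ya asubuhi", "habari za asubuhi"))  # 1 substitution / 3 words ≈ 0.333
```

Production evaluations typically use a library implementation (e.g. `jiwer`) rather than hand-rolled code, but the definition is the same.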

## Intended Use

This model is intended for any application requiring Swahili speech-to-text conversion, including but not limited to transcription services, voice assistants, and accessibility technology. It can be particularly useful in contexts where demographic metadata (age, sex, accent) is significant, as these features were taken into account during training.

## Training Data

The model was trained on the Common Voice Corpus 11.0 Swahili dataset, which consists of unique MP3 files with corresponding text files, totaling 16,413 validated hours. Much of the dataset also includes valuable demographic metadata, such as age, sex, and accent, contributing to a more accurate and contextually aware ASR model.

## Training

The ASR system has two interconnected stages: the tokenizer (unigram) and the acoustic model (wav2vec 2.0 + CTC).
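
To illustrate the CTC side of the acoustic model: greedy CTC decoding takes the most likely token per frame, collapses consecutive repeats, and removes the blank symbol. A minimal sketch (the token IDs and blank index below are invented for illustration, not this model's vocabulary):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame-level predictions and drop CTC blank tokens."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Frames: blank, 7, 7, blank, 4, 4, blank  ->  tokens 7 then 4
print(ctc_greedy_decode([0, 7, 7, 0, 4, 4, 0]))  # [7, 4]
```

Note that the blank between two identical tokens is what allows CTC to emit genuine repeats (e.g. `[5, 5, 0, 5]` decodes to `[5, 5]`).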

### Compute Infrastructure

The training was performed using the following compute infrastructure:

| [Compute](https://instances.vantage.sh/aws/ec2/g5.8xlarge#Compute) | Value |
| --- | --- |
| vCPUs | 32 |
| Memory (GiB) | 128.0 |
| Memory per vCPU (GiB) | 4.0 |
| Physical Processor | AMD EPYC 7R32 |
| Clock Speed (GHz) | 2.8 |
| CPU Architecture | x86_64 |
| GPUs | 1 |
| GPU Architecture | NVIDIA A10G |
| Video Memory (GiB) | 24 |
| GPU Compute Capability [(?)](https://handbook.vantage.sh/aws/reference/aws-gpu-instances/) | 7.5 |
| FPGAs | 0 |

## About THiNK

THiNK is a technology initiative driven by a community of innovators and businesses. It provides a collaborative platform offering services that assist businesses across all sectors, particularly in their digital transformation journey.

New version:

---
tags:
- generated_from_trainer
datasets:
- common_voice_11_0
model-index:
- name: wav2vec2-large-xls-r-300m-sw
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-large-xls-r-300m-sw

This model is a fine-tuned version of wav2vec2-large-xls-r-300m on the common_voice_11_0 dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
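
The batch-size arithmetic and the linear-warmup schedule above can be sketched as follows. This is illustrative only: `total_steps` is a hypothetical value, not taken from this training run, and the function mimics (rather than reuses) the Trainer's linear scheduler.

```python
def linear_warmup_lr(step, base_lr=3e-4, warmup_steps=500, total_steps=10000):
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device train batch * gradient accumulation steps.
effective_batch = 16 * 2
print(effective_batch)            # 32, matching total_train_batch_size
print(linear_warmup_lr(250))      # halfway through warmup, roughly 1.5e-4
print(linear_warmup_lr(500))      # peak learning rate, 3e-4
```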

### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1
- Datasets 2.13.1
- Tokenizers 0.13.3
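
A matching environment could be installed with pip, assuming the standard PyPI package names for the versions listed above:

```shell
pip install transformers==4.31.0 torch==2.0.1 datasets==2.13.1 tokenizers==0.13.3
```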