---
library_name: transformers
language:
- en
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- wft
- whisper
- automatic-speech-recognition
- audio
- speech
- generated_from_trainer
datasets:
- ntnu-smil/ami-1s-ft
metrics:
- wer
model-index:
- name: whisper-large-v3-ami-1
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: ntnu-smil/ami-1s-ft
      type: ntnu-smil/ami-1s-ft
    metrics:
    - type: wer
      value: 73.28296703296702
      name: Wer
---
# whisper-large-v3-ami-1
This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the ntnu-smil/ami-1s-ft dataset.
It achieves the following results on the evaluation set:
- Loss: 3.6457
- Wer: 73.2830
- Cer: 65.1890
- Decode Runtime: 3.7197
- Wer Runtime: 0.0090
- Cer Runtime: 0.0152
## Model description

A fine-tune of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) for English automatic speech recognition on the ntnu-smil/ami-1s-ft dataset. Judging from the `wft` tag and the PEFT entry under Framework versions, training used parameter-efficient (PEFT) adapters on top of the frozen base model rather than a full fine-tune.
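A minimal inference sketch, assuming the adapter lives at the hypothetical repo id `ntnu-smil/whisper-large-v3-ami-1` (substitute the actual repo id) and that the dataset exposes a standard `audio` column:

```python
import torch
from datasets import load_dataset
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the frozen base model, then attach the fine-tuned PEFT adapter on top of it.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3", torch_dtype=dtype)
model = PeftModel.from_pretrained(base, "ntnu-smil/whisper-large-v3-ami-1").to(device)  # hypothetical repo id
model.eval()

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

# Transcribe one example from the fine-tuning dataset (split/column names are assumptions).
sample = load_dataset("ntnu-smil/ami-1s-ft", split="test")[0]["audio"]
inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")

with torch.no_grad():
    ids = model.generate(input_features=inputs.input_features.to(device, dtype))
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```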
## Intended uses & limitations

The adapter targets short-segment English meeting speech (the dataset name suggests one-second clips derived from the AMI corpus). Note the high final error rates (WER 73.28, CER 65.19 on the evaluation set): this checkpoint is best treated as an experimental fine-tune rather than a production ASR model.
## Training and evaluation data

The model was fine-tuned and evaluated on [ntnu-smil/ami-1s-ft](https://huggingface.co/datasets/ntnu-smil/ami-1s-ft); see that dataset card for split definitions and preprocessing details.
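The reported error rates are WER/CER as percentages. A hedged sketch of reproducing them with the `evaluate` library, assuming a `test` split, a `text` transcript column, and a hypothetical `transcribe_all` helper that batches the inference sketch above:

```python
import evaluate
from datasets import load_dataset

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

ds = load_dataset("ntnu-smil/ami-1s-ft", split="test")  # split name is an assumption

references = ds["text"]           # transcript column name is an assumption
predictions = transcribe_all(ds)  # hypothetical helper: batch version of the inference sketch

# evaluate returns error rates as fractions; the card reports percentages.
print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER:", 100 * cer_metric.compute(predictions=predictions, references=references))
```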
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of the equivalent `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 7e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 1024
- optimizer: adamw_torch with betas=(0.9, 0.98) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 130
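A hedged sketch of these settings as `transformers` `Seq2SeqTrainingArguments`; `output_dir` is illustrative, and the actual training script (the card carries a `wft` tag) is not shown here:

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the list above: 128 per-device batch x 8 accumulation steps
# yields the total train batch size of 1024 on a single device.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-ami-1",  # illustrative
    learning_rate=7e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=130,
)
```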
### Training results

CER values above 100% in the early steps are possible because character error rate counts insertions: a model that over-generates text relative to the short references can accumulate more edits than there are reference characters.
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer | Decode Runtime | Wer Runtime | Cer Runtime |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|:--------------:|:-----------:|:-----------:|
| 2.2365 | 0.0769 | 10 | 3.2101 | 71.2225 | 305.1720 | 5.7416 | 0.0099 | 0.0322 |
| 1.9464 | 0.1538 | 20 | 3.1678 | 81.2843 | 319.6875 | 5.8313 | 0.0098 | 0.0337 |
| 1.5994 | 0.2308 | 30 | 3.0765 | 106.4904 | 341.3692 | 5.8220 | 0.0105 | 0.0351 |
| 1.1357 | 0.3077 | 40 | 3.2982 | 129.5330 | 214.6070 | 5.6144 | 0.0102 | 0.0259 |
| 0.4404 | 0.3846 | 50 | 3.4638 | 72.2871 | 98.6465 | 3.8830 | 0.0093 | 0.0179 |
| 0.3252 | 0.4615 | 60 | 3.3927 | 65.1099 | 80.9729 | 3.7645 | 0.0091 | 0.0167 |
| 0.3713 | 1.0231 | 70 | 3.4800 | 58.9629 | 49.3854 | 3.4950 | 0.0090 | 0.0142 |
| 0.2562 | 1.1 | 80 | 3.5965 | 54.0522 | 31.3522 | 3.3013 | 0.0089 | 0.0130 |
| 0.1821 | 1.1769 | 90 | 3.6241 | 70.4327 | 56.6693 | 3.6241 | 0.0089 | 0.0146 |
| 0.1847 | 1.2538 | 100 | 3.6725 | 66.2775 | 50.4512 | 3.6175 | 0.0090 | 0.2387 |
| 0.2257 | 1.3308 | 110 | 3.6518 | 64.8695 | 50.6408 | 3.5330 | 0.0090 | 0.0141 |
| 0.2672 | 1.4077 | 120 | 3.6463 | 69.7802 | 59.8928 | 3.6917 | 0.0090 | 0.0146 |
| 0.2578 | 1.4846 | 130 | 3.6457 | 73.2830 | 65.1890 | 3.7197 | 0.0090 | 0.0152 |
### Framework versions
- PEFT 0.14.0
- Transformers 4.48.0
- Pytorch 2.5.1
- Datasets 3.2.0
- Tokenizers 0.21.0