mahmoudmamdouh13 committed
Commit db2ddb2 · verified · 1 Parent(s): 7698bcc

Update README.md

Files changed (1): README.md +2 -23
README.md CHANGED
@@ -42,12 +42,7 @@ model-index:
 - **Base pre-trained checkpoint:** MIT AST fine-tuned on Google Speech Commands v0.02
 - **Fine-tuning dataset:** Custom dataset drawn from MLCommons Multilingual Spoken Words corpus, augmented with `_silence_` and `_unknown_` categories sampled from Google Speech Commands v0.02
 - **License:** Apache 2.0
- - **Framework:** PyTorch
 
- ## Use Case
- - **Primary use case:** Keyword spotting and spoken-word classification in multilingual voice interfaces
- - **Territory:** Real-time small-vocabulary speech recognition for embedded and mobile devices
- - **Out of scope:** Large-vocabulary continuous speech recognition, speaker identification, emotion recognition
 
 ## Model Inputs and Outputs
 - **Input:** 16 kHz mono audio, 1-second clips (or padded/truncated to 1 sec), converted to log-mel spectrograms with 128 mel bins and 10 ms hop length
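The fixed-length input contract above (16 kHz mono, one-second clips, zero-padded or truncated) can be sketched as a small preprocessing helper. This is an illustrative sketch only — `fix_length` and the constants are assumptions for the example, not code from this repository:

```python
import numpy as np

SAMPLE_RATE = 16_000        # model expects 16 kHz mono audio
CLIP_SAMPLES = SAMPLE_RATE  # fixed one-second window
HOP_SAMPLES = 160           # 10 ms hop at 16 kHz, per the spectrogram spec

def fix_length(waveform: np.ndarray) -> np.ndarray:
    """Zero-pad or crop a mono waveform to exactly one second."""
    if len(waveform) >= CLIP_SAMPLES:
        return waveform[:CLIP_SAMPLES]   # crop clips longer than 1 s
    pad = CLIP_SAMPLES - len(waveform)
    return np.pad(waveform, (0, pad))    # zero-pad clips shorter than 1 s

# A 0.6 s clip is padded up; a 1.4 s clip is cropped down.
short = fix_length(np.ones(int(0.6 * SAMPLE_RATE)))
long_clip = fix_length(np.ones(int(1.4 * SAMPLE_RATE)))
assert short.shape == long_clip.shape == (CLIP_SAMPLES,)
```

The mel-spectrogram step itself (128 bins, 10 ms hop) would run on the output of this helper, e.g. via a feature extractor matching the base checkpoint.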
@@ -65,7 +60,7 @@ model-index:
 
 ## Training Data
 
- * Total samples: \~XX,XXX utterances
+ * Total samples: \~145,005 utterances
 * **Sources:**
 
 * MLCommons Multilingual Spoken Words corpus (covering 40+ languages)
@@ -74,11 +69,10 @@ model-index:
 
 * Resampling to 16 kHz
 * Fixed-length one-second windows with zero-padding or cropping
- * Data augmentation: time shift (±100 ms), additive background noise (SNR 10–20 dB)
 
 ## Evaluation Results
 
- * **Test split:** Held-out 20% of the combined dataset (stratified across classes)
+ * **Test split:** Held-out 10% of the combined dataset (stratified across classes)
 
 | Metric | Value |
 | --------- | ------ |
@@ -99,21 +93,6 @@ model-index:
 * `_unknown_` class may not cover all out-of-vocabulary words; false positives possible
 * Performance may degrade on dialects or languages underrepresented in training
 
- ## Recommendations for Use
-
- * **On-device deployment:** Convert to `safetensors` format to reduce size and improve loading speed
- * **Runtime:** \~20M parameters; inference latency \~30 ms on mobile SoC
- * **Performance tips:**
-
-   * Fine-tune threshold per class for high-recall vs. high-precision scenarios
-   * Use simple VAD front-end to suppress silent frames
-
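The per-class threshold tip above can be sketched as a small post-processing step on the model's class probabilities. The label set and threshold values below are illustrative assumptions for the example, not values from this repository:

```python
import numpy as np

LABELS = ["yes", "no", "_silence_", "_unknown_"]  # illustrative label set
# Hypothetical per-class thresholds: raise a class's threshold for higher
# precision, lower it for higher recall.
THRESHOLDS = {"yes": 0.6, "no": 0.6, "_silence_": 0.5, "_unknown_": 0.0}

def decide(probs: np.ndarray) -> str:
    """Return the top label if it clears its threshold, else `_unknown_`."""
    top = int(np.argmax(probs))
    label = LABELS[top]
    return label if probs[top] >= THRESHOLDS[label] else "_unknown_"

assert decide(np.array([0.9, 0.05, 0.03, 0.02])) == "yes"
# Top class "yes" at 0.4 falls below its 0.6 threshold:
assert decide(np.array([0.4, 0.3, 0.2, 0.1])) == "_unknown_"
```

A VAD front-end, as suggested above, would sit before this step and simply skip frames with no speech energy.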
111
- ## Ethical Considerations and Bias
112
-
113
- * Data covers several languages but is unbalanced: some languages underrepresented
114
- * Potential for misrecognition in low-resource languages or non-standard accents
115
- * Not intended for security-sensitive applications (e.g., authentication)
116
-
117
  ## Citation
118
 
119
  ```bibtex
 