mahmoudmamdouh13 committed on
Commit
7ce9db8
·
verified ·
1 Parent(s): 249f6c9

Update README.md

Files changed (1)
  1. README.md +72 -1
README.md CHANGED
@@ -1,3 +1,4 @@
 
1
  # Audio Spectrogram Transformer (AST) Fine-Tuned on MLCommons Multilingual Spoken Words + Google Speech Commands
2
 
3
  ## Model Details
@@ -25,4 +26,74 @@
25
  "9": "cake",
26
  "10": "car",
27
  // ... up to 79: "zoo"
28
- }
 
1
+ ````markdown
2
  # Audio Spectrogram Transformer (AST) Fine-Tuned on MLCommons Multilingual Spoken Words + Google Speech Commands
3
 
4
  ## Model Details
 
26
  "9": "cake",
27
  "10": "car",
28
  // ... up to 79: "zoo"
29
+ }
30
+ ````
31
+
32
+ ## Training Data
33
+
34
+ * Total samples: \~XX,XXX utterances
35
+ * **Sources:**
36
+
37
+ * MLCommons Multilingual Spoken Words corpus (covering 40+ languages)
38
+ * Google Speech Commands v0.02 for silence and unknown categories
39
+ * **Preprocessing:**
40
+
41
+ * Resampling to 16 kHz
42
+ * Fixed-length one-second windows with zero-padding or cropping
43
+ * Data augmentation: time shift (±100 ms), additive background noise (SNR 10–20 dB)
44
+
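The preprocessing steps listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the pipeline actually used for training: the function names (`fix_length`, `time_shift`, `add_noise`) are hypothetical, the input is assumed to be a mono waveform already resampled to 16 kHz, and the noise clip is assumed to be the same length as the signal.

```python
import math
import random

SAMPLE_RATE = 16_000   # 16 kHz, as stated in the card
WINDOW = SAMPLE_RATE   # one-second window = 16,000 samples


def fix_length(samples, length=WINDOW):
    """Crop or zero-pad a mono waveform to exactly `length` samples."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


def time_shift(samples, max_shift_ms=100, rng=random):
    """Shift by a random offset within +/- max_shift_ms, filling gaps with zeros."""
    max_shift = SAMPLE_RATE * max_shift_ms // 1000
    shift = rng.randint(-max_shift, max_shift)
    n = len(samples)
    if shift > 0:   # delay: zeros at the start, tail dropped
        return [0.0] * shift + list(samples[: n - shift])
    if shift < 0:   # advance: head dropped, zeros at the end
        return list(samples[-shift:]) + [0.0] * (-shift)
    return list(samples)


def add_noise(samples, noise, snr_db):
    """Mix `noise` into `samples`, scaled to reach the requested SNR in dB."""
    sig_power = sum(x * x for x in samples) / len(samples)
    noise_power = sum(x * x for x in noise) / len(noise)
    if sig_power == 0 or noise_power == 0:
        return list(samples)   # nothing meaningful to scale against
    scale = math.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(samples, noise)]
```

In practice a library such as NumPy or torchaudio would likely replace the list arithmetic; the SNR value would be drawn from the 10–20 dB range given in the card.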
45
+ ## Evaluation Results
46
+
47
+ * **Test split:** Held-out 20% of the combined dataset (stratified across classes)
48
+
49
+ | Metric | Value |
50
+ | --------- | ------ |
51
+ | Loss | 0.0685 |
52
+ | Precision | 0.9862 |
53
+ | Recall | 0.9862 |
54
+ | F1-score | 0.9861 |
55
+
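For readers who want to reproduce metrics of this kind, here is a small sketch of macro-averaged precision, recall, and F1 computed from label lists. The card does not state which averaging scheme produced the numbers above, so macro averaging is an assumption here, and `macro_prf` is a hypothetical helper name.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return (precision, recall, F1), each averaged equally over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Weighted averaging (by class support) would give different values on an unbalanced test split, which is worth keeping in mind when comparing against the table.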
56
+ ## Intended Uses and Limitations
57
+
58
+ * **Suitable for:**
59
+
60
+ * Real-time keyword spotting on-device
61
+ * Low-latency voice command detection in noisy environments
62
+ * **Limitations:**
63
+
64
+ * May misclassify under unseen noise conditions or heavy accents
65
+ * The `_unknown_` class may not cover all out-of-vocabulary words, so false positives are possible
66
+ * Performance may degrade on dialects or languages underrepresented in training
67
+
68
+ ## Recommendations for Use
69
+
70
+ * **On-device deployment:** Convert to `safetensors` format for safer and faster weight loading; the format itself does not reduce model size
71
+ * **Runtime:** \~20M parameters; inference latency \~30 ms on mobile SoC
72
+ * **Performance tips:**
73
+
74
+ * Tune the decision threshold per class for high-recall vs. high-precision scenarios
75
+ * Use a simple voice activity detection (VAD) front end to suppress silent frames
76
+
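The two performance tips above might look like this in code. It is only a sketch under stated assumptions: `is_speech` is a hypothetical energy-based gate (one of many possible VAD designs), `decide` is a hypothetical helper, the RMS threshold and default class threshold are placeholder values, and `probs` is assumed to be a mapping from label to softmax probability.

```python
import math


def is_speech(frame, rms_threshold=0.01):
    """Energy-based VAD: True if the frame's RMS amplitude reaches the threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms >= rms_threshold


def decide(probs, thresholds, default_threshold=0.5, fallback="_unknown_"):
    """Accept the top class only if its probability meets that class's threshold."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= thresholds.get(label, default_threshold):
        return label
    return fallback
```

Raising a class's threshold trades recall for precision on that class; lowering it does the opposite. Skipping inference entirely when `is_speech` returns False avoids spending compute on silent frames.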
77
+ ## Ethical Considerations and Bias
78
+
79
+ * Data covers several languages but is unbalanced: some languages are underrepresented
80
+ * Potential for misrecognition in low-resource languages or non-standard accents
81
+ * Not intended for security-sensitive applications (e.g., authentication)
82
+
83
+ ## Citation
84
+
85
+ ```bibtex
86
+ @inproceedings{gong2021ast,
87
+ title={AST: Audio Spectrogram Transformer},
88
+ author={Gong, Yuan and Chung, Yu-An and Glass, James},
89
+ booktitle={Proc. Interspeech},
90
+ year={2021}
91
+ }
92
+ ```
93
+
94
+ ---
95
+
96
+ *This model card was automatically generated.*
97
+
98
+ ```
99
+ ```