Sin2pi / FAMOpimizer


Sin2pi committed 12 days ago

Commit 6cae1c4 (verified)

1 Parent(s): 7d3523e

Update README.md

Files changed (1)

README.md +63 -3


@@ -1,3 +1,63 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+- openai/whisper-large-v3-turbo
+tags:
+- asr
+- optimizer
+- speech
+- audio
+- frequency
+---
+**Proof of concept.**
+FAM is an experimental optimizer designed specifically for speech recognition tasks. It adapts momentum based on the frequency characteristics of gradient updates.
+### Frequency-Adaptive Momentum (FAM)
+#### Core Concept
+- Speech signals have an inherent frequency structure, and different parts of the model respond to different frequency bands. This structure is preserved, albeit transformed, when the audio is converted to log-mel spectrograms, and the model parameters adapt to capture it.
+- The Chain of Frequency Information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
+- Empirical observations suggest that transformer-based speech models develop:
+  - Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
+  - Attention heads tracking particular acoustic patterns over time.
+  - A hierarchical representation from acoustic features to phonetic units to words.
+- FAM aims to integrate a momentum scheme that adapts based on the "frequency signature" of gradient updates.
+#### Why This Optimizer Makes Sense
+FAM acknowledges the frequency structure within the optimization process itself, recognizing that:
+- **Gradient Frequencies Matter:** The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
+- **Different Parameters Process Different Bands:** Similar to how our ears have frequency-specific receptors, different parts of the model specialize in various acoustic frequencies.
+- **Temporal Structure in Learning:** Speech learning progresses through stages, from basic acoustics to phonetic patterns to linguistic structures.
+By applying distinct momentum factors to different frequency bands in parameter space, FAM provides the optimizer with domain-specific audio information that it otherwise wouldn't have.
+Download and test it for free! :D
+https://github.com/sine2pi/FAMOptimizer
+#### Usage Example
+```python
+# Assumes get_parameter_groups, FAMOptimizer, and FAMScheduler2 have been imported
+# from the repository linked above, and that `model` is the model being trained.
+param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)
+optimizer = FAMOptimizer(
+    params=param_groups,
+    beta=0.99,
+    n_bands=10,
+    fam_start_step=100,
+    layer_boost=True,
+    min_size=128,
+    debug=True,
+    weight_decay=0.0025,
+    lr=0.001,
+)
+scheduler = FAMScheduler2(
+    optimizer=optimizer,
+    warmup_steps=100,
+    total_steps=10000,
+    decay_start_step=100
+)
+```
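
The README describes adapting momentum to the "frequency signature" of gradient updates but does not show the rule itself. The sketch below is one possible reading of that idea, not the FAMOptimizer implementation: it takes the power spectrum of a flattened gradient, sums it into `n_bands` equal-width bands, and lowers the momentum coefficient when high-frequency bands dominate. The function names, the `min_beta` parameter, and the "less momentum for noisier gradients" rule are all assumptions made for illustration. A naive DFT from the standard library keeps the example dependency-free; a practical version would more likely use an FFT routine such as `torch.fft.rfft`.

```python
import cmath
import math


def band_energies(grad, n_bands):
    """Sum the DFT power spectrum of a flattened gradient into n_bands bands."""
    n = len(grad)
    n_freqs = n // 2 + 1  # non-redundant frequencies of a real-valued signal
    power = []
    for k in range(n_freqs):
        coeff = sum(g * cmath.exp(-2j * math.pi * k * t / n)
                    for t, g in enumerate(grad))
        power.append(abs(coeff) ** 2)
    band_size = max(1, n_freqs // n_bands)
    energies = []
    for b in range(n_bands):
        start = b * band_size
        # The last band absorbs any leftover frequencies.
        end = n_freqs if b == n_bands - 1 else start + band_size
        energies.append(sum(power[start:end]))
    return energies


def adaptive_beta(energies, base_beta=0.99, min_beta=0.8):
    """Hypothetical rule: shrink beta toward min_beta as the share of
    energy in the upper half of the bands grows."""
    total = sum(energies)
    if total == 0.0:
        return base_beta
    half = len(energies) // 2
    high_ratio = sum(energies[half:]) / total
    return base_beta - (base_beta - min_beta) * high_ratio


def momentum_step(param, grad, buf, lr, beta):
    """Plain exponential-moving-average momentum update on Python lists."""
    new_buf = [beta * m + (1.0 - beta) * g for m, g in zip(buf, grad)]
    new_param = [p - lr * m for p, m in zip(param, new_buf)]
    return new_param, new_buf
```

In use, a training step would call `band_energies` on a gradient, pass the result to `adaptive_beta`, and feed the returned coefficient into `momentum_step`. Under this rule, a gradient that varies slowly across the parameter vector keeps `beta` near `base_beta`, while a rapidly oscillating one pulls it toward `min_beta`.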