Sin2pi committed on
Commit 6cae1c4 · verified · 1 Parent(s): 7d3523e

Update README.md

Files changed (1)
  1. README.md +63 -3
README.md CHANGED
@@ -1,3 +1,63 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model:
+ - openai/whisper-large-v3-turbo
+ tags:
+ - asr
+ - optimizer
+ - speech
+ - audio
+ - frequency
+ ---
+
+ --Proof of concept--
+
+ Frequency-Adaptive Momentum (FAM) is an experimental optimizer designed for speech recognition tasks. It adapts momentum based on the frequency characteristics of gradient updates.
+
+ ### Frequency-Adaptive Momentum (FAM)
+
+ #### Core Concept
+
+ - Speech signals have an inherent frequency structure, and different parts of the model respond to different frequency bands. Conversion to log-mel spectrograms transforms this structure but preserves it, and the model parameters adapt to capture it.
+ - The chain of frequency information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
+ - Empirical observations reveal that transformer-based speech models develop:
+   - Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
+   - Attention heads that track particular acoustic patterns over time.
+   - A hierarchical representation from acoustic features to phonetic units to words.
+ - FAM aims to provide a momentum scheme that adapts to the "frequency signature" of gradient updates.
+
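To make the "frequency signature" idea a bit more concrete, here is a minimal sketch, not the code from the FAMOptimizer repository. It assumes a parameter's gradient can be flattened into a 1-D sequence and treated as a signal. The helper names `dft_magnitudes` and `band_energies` are hypothetical, and the naive DFT is only there to keep the example dependency-free (a real implementation would more likely use an FFT routine).

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitudes of the non-redundant DFT bins (0 .. n//2) of a real sequence."""
    n = len(values)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        mags.append(abs(acc))
    return mags


def band_energies(flat_grad, n_bands):
    """Share of spectral energy in each of n_bands contiguous bands.

    Assumption: the "frequency signature" is summarized as normalized
    per-band power of the flattened gradient.
    """
    power = [m * m for m in dft_magnitudes(flat_grad)]
    total = sum(power) or 1.0  # avoid dividing by zero for an all-zero gradient
    edges = [i * len(power) // n_bands for i in range(n_bands + 1)]
    return [sum(power[edges[i]:edges[i + 1]]) / total for i in range(n_bands)]


# A slowly varying gradient concentrates energy in the lowest band,
# a sign-flipping gradient concentrates it in the highest band.
smooth_grad = [math.sin(2 * math.pi * t / 64) for t in range(64)]
jittery_grad = [(-1.0) ** t for t in range(64)]
smooth_signature = band_energies(smooth_grad, n_bands=4)
jittery_signature = band_energies(jittery_grad, n_bands=4)
```

Under these assumptions, the signature is just a short vector of band shares that an optimizer could inspect at each step.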
+ #### Why This Optimizer Makes Sense
+
+ FAM accounts for frequency structure within the optimization process itself, recognizing that:
+ - **Gradient Frequencies Matter:** The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
+ - **Different Parameters Process Different Bands:** Much as our ears have frequency-specific receptors, different parts of the model specialize in different acoustic frequencies.
+ - **Temporal Structure in Learning:** Speech learning progresses through stages, from basic acoustics to phonetic patterns to linguistic structures.
+
+ By applying distinct momentum factors to different frequency bands in parameter space, FAM gives the optimizer domain-specific audio information that it would not otherwise have.
+
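One way to picture "distinct momentum factors per frequency band" is sketched below. This is an illustration under stated assumptions, not the update rule used by FAMOptimizer: the gradient is flattened to 1-D, the momentum buffer is kept in the DFT domain, and each bin is smoothed with the beta of its band. The names `band_of` and `banded_momentum_step` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (kept dependency-free for the sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def band_of(k, n, n_bands):
    """Band index of DFT bin k; bins k and n-k share a band so the output stays real."""
    freq = min(k, n - k)  # distance from DC, in 0 .. n//2
    return freq * n_bands // (n // 2 + 1)


def banded_momentum_step(grad, state, betas):
    """One exponential-moving-average step with a separate beta per frequency band.

    Assumption: state is the momentum buffer in the DFT domain (None on the
    first step). Returns (update in parameter space, new state).
    """
    n = len(grad)
    spectrum = dft(grad)
    if state is None:
        state = [0j] * n
    new_state = []
    for k in range(n):
        beta = betas[band_of(k, n, len(betas))]
        new_state.append(beta * state[k] + (1.0 - beta) * spectrum[k])
    return idft(new_state), new_state


# With equal betas this reduces to an ordinary EMA of the gradient.
grad = [1.0, -2.0, 0.5, 3.0]
plain_update, _ = banded_momentum_step(grad, None, betas=[0.9, 0.9])
```

Because the DFT is linear, equal betas reproduce a standard momentum average, so any change in behavior comes only from the per-band differences.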
+ Download and test it for free! :D
+
+ https://github.com/sine2pi/FAMOptimizer
+
+ Usage example (assumes `model` is already built and that `get_parameter_groups`, `FAMOptimizer`, and `FAMScheduler2` are imported from the repository above):
+
+ param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)
+
+ optimizer = FAMOptimizer(
+     params=param_groups,
+     beta=0.99,
+     n_bands=10,
+     fam_start_step=100,
+     layer_boost=True,
+     min_size=128,
+     debug=True,
+     weight_decay=0.0025,
+     lr=0.001,
+ )
+
+ scheduler = FAMScheduler2(
+     optimizer=optimizer,
+     warmup_steps=100,
+     total_steps=10000,
+     decay_start_step=100,
+ )
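
The README does not say how `FAMScheduler2` shapes the learning rate. As a rough guide to what `warmup_steps`, `total_steps`, and `decay_start_step` could mean, here is a hypothetical schedule: linear warmup, an optional plateau, then cosine decay. The function `lr_factor` and the cosine shape are assumptions for illustration, not the repository's behavior.

```python
import math


def lr_factor(step, warmup_steps=100, total_steps=10000, decay_start_step=100):
    """Multiplier applied to the base learning rate at a given step.

    Hypothetical shape: linear warmup, plateau until decay_start_step,
    then cosine decay that reaches zero at total_steps.
    """
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    if step < decay_start_step:
        return 1.0
    span = max(1, total_steps - decay_start_step)
    progress = min(1.0, (step - decay_start_step) / span)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective learning rate at a few steps, for a base lr of 0.001.
base_lr = 0.001
sampled = {step: base_lr * lr_factor(step) for step in (0, 99, 100, 5050, 10000)}
```

With the values from the usage example, warmup ends exactly where decay begins, so there is no plateau in this sketch.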