OpenMOSE committed
Commit b77926f · 1 Parent(s): 77abc16

Update README.md

README.md CHANGED
@@ -44,7 +44,7 @@ The training process consisted of three distinct stages:
 - Only the attention mechanism was trained; all other components (MLP layers, embeddings, heads) were frozen

 ### Stage 3: Supervised Fine-Tuning (Using RWKV-LM-RLHF)
-- Utilized a distillation dataset of 900K samples
+- Utilized a distillation dataset of 900K samples (Chinese, Japanese, English)
 - Smoothed Loss for faster convergence
 - Implemented Variable Rank PEFT to enhance training efficiency
 - Bone (Block Affine Transformation) r=512+
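The attention-only training mentioned in the unchanged context line above can be sketched in PyTorch. This is a minimal illustration, not the actual RWKV-LM-RLHF code: the `TinyBlock` module and the `"att"` name-matching rule are assumptions standing in for the real model's attention and channel-mix components.

```python
import torch.nn as nn

# Hypothetical stand-in for one model block: "att" mimics the attention
# mechanism, "mlp" mimics the frozen channel-mix / MLP component.
class TinyBlock(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.att = nn.Linear(dim, dim)
        self.mlp = nn.Linear(dim, dim)

model = nn.Sequential(TinyBlock(), TinyBlock())

# Train only parameters whose name contains "att"; freeze everything else
# (MLP layers, embeddings, heads in the real model).
for name, param in model.named_parameters():
    param.requires_grad = "att" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

An optimizer would then be built from only the trainable subset, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`.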