CocoRoF/KoModernBERT-base-mlm-v04-retry-model-chp07

Generated from Trainer


CocoRoF committed on Feb 8

Commit ba0e57c · verified · 1 Parent(s): 6382050

cc-100-pro-16-18 Done

Files changed (1)

README.md +9 -6

README.md CHANGED

@@ -40,8 +40,8 @@ The following hyperparameters were used during training:
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 8
-- gradient_accumulation_steps: 32
-- total_train_batch_size: 4096
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 2048
 - total_eval_batch_size: 64
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
@@ -49,10 +49,13 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 50.521        | 0.3616 | 2500 | 1.5967          |
-| 0.0           | 0.7233 | 5000 | nan             |
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 25.4012       | 0.1808 | 2500  | 1.6019          |
+| 25.151        | 0.3616 | 5000  | 1.6015          |
+| 25.0652       | 0.5424 | 7500  | 1.6032          |
+| 24.9933       | 0.7233 | 10000 | 1.5984          |
+| 0.0           | 0.9041 | 12500 | nan             |
 ### Framework versions
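
The batch-size change in this diff can be checked with a little arithmetic. The sketch below assumes the usual Trainer relationship, total_train_batch_size = per-device batch size × num_devices × gradient_accumulation_steps. The per-device batch size is not shown in the diff, so the value computed here is inferred, not quoted. The helper names are hypothetical. The second helper tests one possible reading of the large training-loss values: that the logged loss is summed over accumulation steps rather than averaged. That is an assumption; the model card does not say how the loss is reported.

```python
def per_device_batch_size(total_train_batch_size: int,
                          num_devices: int,
                          gradient_accumulation_steps: int) -> int:
    """Infer the per-device batch size from the values listed in the card.

    Assumes: total = per_device * num_devices * gradient_accumulation_steps.
    """
    denom = num_devices * gradient_accumulation_steps
    if total_train_batch_size % denom != 0:
        raise ValueError("total batch size is not divisible by devices * accumulation steps")
    return total_train_batch_size // denom


def loss_per_accumulation_step(logged_loss: float,
                               gradient_accumulation_steps: int) -> float:
    """Divide a logged loss by the accumulation steps.

    Only meaningful under the ASSUMPTION that the logged training loss is a
    sum over accumulation steps; the source does not confirm this.
    """
    return logged_loss / gradient_accumulation_steps


# Values taken from the diff: before (32 steps, 4096) and after (16 steps, 2048).
before = per_device_batch_size(4096, num_devices=8, gradient_accumulation_steps=32)
after = per_device_batch_size(2048, num_devices=8, gradient_accumulation_steps=16)
print(before, after)  # 16 16

# Rescaled training losses for the first logged row of each run.
old_scaled = loss_per_accumulation_step(50.521, 32)
new_scaled = loss_per_accumulation_step(25.4012, 16)
print(round(old_scaled, 4), round(new_scaled, 4))
```

Under these assumptions the per-device batch size is the same before and after the commit, so halving gradient_accumulation_steps is what halves total_train_batch_size. The rescaled training losses come out close to the reported validation losses of about 1.6, which fits the summed-loss reading but does not prove it. Neither calculation explains the final row of each table, where the training loss is 0.0 and the validation loss is nan.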