---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter2_sftsd0
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.4538
- Num Input Tokens Seen: 4832464

## Model description

More information needed

## Intended uses & limitations

More information needed
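
Pending more detail, a minimal inference sketch with `transformers` is shown below. The repo id is an assumption inferred from the model name in this card, and gemma-based checkpoints are gated on the Hub, so access to the base license must be granted first.

```python
# Hypothetical usage sketch; the repo id below is assumed from the model
# name in this card, not confirmed by it.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter2_sftsd0"  # assumed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```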

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
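
As a quick consistency check, the listed `total_train_batch_size` follows from the per-device batch size and gradient accumulation; the sketch below assumes a single device, which the card does not state explicitly.

```python
# Effective batch size implied by the hyperparameters above.
train_batch_size = 8              # per-device train batch size
gradient_accumulation_steps = 16  # gradients accumulated before each update
num_devices = 1                   # assumption; device count is not given in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # → 128
```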

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6784        | 0.0591 | 5    | 1.2633          | 282096            |
| 1.3537        | 0.1183 | 10   | 1.1871          | 571576            |
| 1.0696        | 0.1774 | 15   | 1.2164          | 857160            |
| 0.9162        | 0.2365 | 20   | 1.2391          | 1142344           |
| 0.7598        | 0.2956 | 25   | 1.3479          | 1427536           |
| 0.5372        | 0.3548 | 30   | 1.4227          | 1715736           |
| 0.4796        | 0.4139 | 35   | 1.4737          | 2003760           |
| 0.3889        | 0.4730 | 40   | 1.5021          | 2286384           |
| 0.1994        | 0.5322 | 45   | 1.5032          | 2573248           |
| 0.3391        | 0.5913 | 50   | 1.4714          | 2862104           |
| 0.3297        | 0.6504 | 55   | 1.4358          | 3145472           |
| 0.2038        | 0.7095 | 60   | 1.4488          | 3432144           |
| 0.195         | 0.7687 | 65   | 1.4273          | 3724448           |
| 0.1749        | 0.8278 | 70   | 1.4248          | 4016736           |
| 0.1654        | 0.8869 | 75   | 1.4554          | 4305224           |
| 0.1846        | 0.9460 | 80   | 1.4274          | 4595952           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1