lemexp-task1-v2-template_small_old_defs-deepseek-coder-1.3b-base-ddp-8lr-v2

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1555
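
Since this repository ships a PEFT adapter rather than full model weights (see Framework versions below), it would presumably be loaded on top of the base model. A minimal loading sketch, assuming the adapter lives under the repository id shown in the title:

```python
# Minimal loading sketch (assumes this repo hosts a PEFT/LoRA adapter
# for deepseek-ai/deepseek-coder-1.3b-base; repo id taken from the title).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
model = PeftModel.from_pretrained(
    base,
    "yalhessi/lemexp-task1-v2-template_small_old_defs-deepseek-coder-1.3b-base-ddp-8lr-v2",
)
model.eval()
```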

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
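
A hedged sketch of how these settings map onto transformers.TrainingArguments; output_dir is a placeholder, and the run would be launched with torchrun across 8 GPUs, which yields the total train/eval batch size of 16 (= 2 per device x 8 devices) listed above:

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; dataset/model wiring is omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",            # placeholder
    learning_rate=8e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=12,
    fp16=True,                   # "Native AMP" mixed precision
)
```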

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.4029        | 0.2002  | 721   | 0.2970          |
| 0.2985        | 0.4003  | 1442  | 0.2698          |
| 0.263         | 0.6005  | 2163  | 0.2522          |
| 0.2565        | 0.8007  | 2884  | 0.2486          |
| 0.246         | 1.0008  | 3605  | 0.2322          |
| 0.2322        | 1.2010  | 4326  | 0.2360          |
| 0.2318        | 1.4012  | 5047  | 0.2286          |
| 0.223         | 1.6013  | 5768  | 0.2263          |
| 0.2213        | 1.8015  | 6489  | 0.2191          |
| 0.2189        | 2.0017  | 7210  | 0.2177          |
| 0.2096        | 2.2018  | 7931  | 0.2160          |
| 0.2084        | 2.4020  | 8652  | 0.2158          |
| 0.2045        | 2.6022  | 9373  | 0.2075          |
| 0.2026        | 2.8023  | 10094 | 0.2082          |
| 0.2012        | 3.0025  | 10815 | 0.2029          |
| 0.19          | 3.2027  | 11536 | 0.2002          |
| 0.1896        | 3.4028  | 12257 | 0.2012          |
| 0.1879        | 3.6030  | 12978 | 0.1948          |
| 0.1868        | 3.8032  | 13699 | 0.1899          |
| 0.1857        | 4.0033  | 14420 | 0.1936          |
| 0.175         | 4.2035  | 15141 | 0.1864          |
| 0.174         | 4.4037  | 15862 | 0.1912          |
| 0.1741        | 4.6038  | 16583 | 0.1900          |
| 0.175         | 4.8040  | 17304 | 0.1854          |
| 0.1738        | 5.0042  | 18025 | 0.1858          |
| 0.161         | 5.2043  | 18746 | 0.1868          |
| 0.1614        | 5.4045  | 19467 | 0.1789          |
| 0.1609        | 5.6047  | 20188 | 0.1815          |
| 0.1617        | 5.8048  | 20909 | 0.1748          |
| 0.16          | 6.0050  | 21630 | 0.1753          |
| 0.1508        | 6.2052  | 22351 | 0.1744          |
| 0.1486        | 6.4053  | 23072 | 0.1707          |
| 0.1488        | 6.6055  | 23793 | 0.1719          |
| 0.1488        | 6.8057  | 24514 | 0.1674          |
| 0.1473        | 7.0058  | 25235 | 0.1655          |
| 0.14          | 7.2060  | 25956 | 0.1636          |
| 0.1371        | 7.4062  | 26677 | 0.1638          |
| 0.137         | 7.6063  | 27398 | 0.1641          |
| 0.1352        | 7.8065  | 28119 | 0.1606          |
| 0.1356        | 8.0067  | 28840 | 0.1570          |
| 0.1223        | 8.2068  | 29561 | 0.1599          |
| 0.1243        | 8.4070  | 30282 | 0.1601          |
| 0.1239        | 8.6072  | 31003 | 0.1563          |
| 0.1242        | 8.8073  | 31724 | 0.1527          |
| 0.1226        | 9.0075  | 32445 | 0.1568          |
| 0.1101        | 9.2077  | 33166 | 0.1569          |
| 0.1099        | 9.4078  | 33887 | 0.1526          |
| 0.1105        | 9.6080  | 34608 | 0.1526          |
| 0.1106        | 9.8082  | 35329 | 0.1538          |
| 0.1106        | 10.0083 | 36050 | 0.1551          |
| 0.098         | 10.2085 | 36771 | 0.1594          |
| 0.0969        | 10.4087 | 37492 | 0.1530          |
| 0.0976        | 10.6088 | 38213 | 0.1547          |
| 0.0977        | 10.8090 | 38934 | 0.1526          |
| 0.0969        | 11.0092 | 39655 | 0.1564          |
| 0.0893        | 11.2093 | 40376 | 0.1561          |
| 0.087         | 11.4095 | 41097 | 0.1561          |
| 0.0883        | 11.6097 | 41818 | 0.1558          |
| 0.0877        | 11.8098 | 42539 | 0.1555          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
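
To approximate this environment, the listed versions could presumably be pinned as follows (the exact PyTorch CUDA build depends on the local setup; cu124 was used here):

```
pip install peft==0.14.0 transformers==4.47.0 datasets==3.2.0 tokenizers==0.21.0 torch==2.5.1
```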