lemexp-task1-v2-template_small_nodefs_old_defs-deepseek-coder-1.3b-base-ddp-8lr-v2

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1551
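
Since the framework versions below include PEFT, this checkpoint is an adapter rather than a full model. A minimal loading sketch, assuming the adapter lives under the repo id in the title and pairs with the stated base model (the dtype, device placement, prompt, and generation settings are illustrative assumptions, not the authors' evaluation setup):

```python
# Minimal sketch: attach this PEFT adapter to the deepseek-coder base model.
# Repo ids are taken from this card; everything else here is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-v2-template_small_nodefs_old_defs-deepseek-coder-1.3b-base-ddp-8lr-v2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # loads the adapter weights
model.eval()

prompt = "(* your lemma/template prompt here *)"  # placeholder input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```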

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
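
As a rough guide, these settings map onto transformers.TrainingArguments as in the sketch below. The output path is a placeholder, dataset and model wiring are omitted, and "Native AMP" is assumed to mean fp16 autocast (it could equally be bf16); launching with 8 processes (e.g. torchrun --nproc_per_node=8) reproduces the multi-GPU DDP setup.

```python
# Sketch of the hyperparameters above expressed as TrainingArguments.
# Per-device batch size 2 across 8 GPUs yields the total batch size of 16.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",            # placeholder path
    learning_rate=8e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                   # "Native AMP"; bf16=True is also plausible
)
```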

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|---------------|---------|-------|-----------------|
| 0.438         | 0.2002  | 721   | 0.3096          |
| 0.3098        | 0.4003  | 1442  | 0.2840          |
| 0.2714        | 0.6005  | 2163  | 0.2634          |
| 0.2619        | 0.8007  | 2884  | 0.2561          |
| 0.2529        | 1.0008  | 3605  | 0.2385          |
| 0.2363        | 1.2010  | 4326  | 0.2378          |
| 0.2334        | 1.4012  | 5047  | 0.2336          |
| 0.2275        | 1.6013  | 5768  | 0.2318          |
| 0.2263        | 1.8015  | 6489  | 0.2268          |
| 0.223         | 2.0017  | 7210  | 0.2194          |
| 0.2133        | 2.2018  | 7931  | 0.2129          |
| 0.2104        | 2.4020  | 8652  | 0.2150          |
| 0.2073        | 2.6022  | 9373  | 0.2089          |
| 0.206         | 2.8023  | 10094 | 0.2061          |
| 0.2045        | 3.0025  | 10815 | 0.2018          |
| 0.1949        | 3.2027  | 11536 | 0.1990          |
| 0.1919        | 3.4028  | 12257 | 0.2000          |
| 0.1917        | 3.6030  | 12978 | 0.1974          |
| 0.1893        | 3.8032  | 13699 | 0.1960          |
| 0.189         | 4.0033  | 14420 | 0.1947          |
| 0.1783        | 4.2035  | 15141 | 0.1881          |
| 0.1759        | 4.4037  | 15862 | 0.1905          |
| 0.1767        | 4.6038  | 16583 | 0.1871          |
| 0.1761        | 4.8040  | 17304 | 0.1867          |
| 0.1757        | 5.0042  | 18025 | 0.1866          |
| 0.1631        | 5.2043  | 18746 | 0.1840          |
| 0.1642        | 5.4045  | 19467 | 0.1840          |
| 0.1629        | 5.6047  | 20188 | 0.1791          |
| 0.1626        | 5.8048  | 20909 | 0.1781          |
| 0.1621        | 6.0050  | 21630 | 0.1761          |
| 0.1535        | 6.2052  | 22351 | 0.1774          |
| 0.1506        | 6.4053  | 23072 | 0.1769          |
| 0.1507        | 6.6055  | 23793 | 0.1700          |
| 0.1507        | 6.8057  | 24514 | 0.1722          |
| 0.1494        | 7.0058  | 25235 | 0.1688          |
| 0.141         | 7.2060  | 25956 | 0.1671          |
| 0.1404        | 7.4062  | 26677 | 0.1681          |
| 0.1388        | 7.6063  | 27398 | 0.1657          |
| 0.1368        | 7.8065  | 28119 | 0.1629          |
| 0.1365        | 8.0067  | 28840 | 0.1610          |
| 0.1238        | 8.2068  | 29561 | 0.1599          |
| 0.1253        | 8.4070  | 30282 | 0.1577          |
| 0.1253        | 8.6072  | 31003 | 0.1566          |
| 0.127         | 8.8073  | 31724 | 0.1567          |
| 0.124         | 9.0075  | 32445 | 0.1571          |
| 0.1119        | 9.2077  | 33166 | 0.1584          |
| 0.1113        | 9.4078  | 33887 | 0.1570          |
| 0.1125        | 9.6080  | 34608 | 0.1525          |
| 0.1121        | 9.8082  | 35329 | 0.1563          |
| 0.1121        | 10.0083 | 36050 | 0.1559          |
| 0.099         | 10.2085 | 36771 | 0.1581          |
| 0.0986        | 10.4087 | 37492 | 0.1541          |
| 0.0998        | 10.6088 | 38213 | 0.1531          |
| 0.0992        | 10.8090 | 38934 | 0.1530          |
| 0.0981        | 11.0092 | 39655 | 0.1546          |
| 0.0909        | 11.2093 | 40376 | 0.1566          |
| 0.0887        | 11.4095 | 41097 | 0.1568          |
| 0.0895        | 11.6097 | 41818 | 0.1546          |
| 0.0887        | 11.8098 | 42539 | 0.1551          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
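
To match the training environment, the pinned versions above can be checked against an installed environment first (a convenience sketch, not part of the original card):

```python
# Verify installed package versions against the pins listed in this card.
from importlib.metadata import version

expected = {
    "peft": "0.14.0",
    "transformers": "4.47.0",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}

for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else f"mismatch (found {have})"
    print(f"{pkg}: expected {want} -> {status}")
```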