lemexp-task1-v2-template_small_nodefs-deepseek-coder-1.3b-base-8lr-24epochs-eos-token

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1674

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 24
  • mixed_precision_training: Native AMP
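
As a rough guide, the list above corresponds to a Hugging Face TrainingArguments configuration like the sketch below. This is a minimal, assumption-laden sketch: the output directory is a placeholder, a per-device batch size of 2 across 8 GPUs is assumed to account for the total batch size of 16 (no gradient accumulation), and mixed precision is expressed as fp16 AMP.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
# Per-device batch size 2 on 8 GPUs yields the reported total batch size of 16.
training_args = TrainingArguments(
    output_dir="lemexp-task1-v2-deepseek-coder-1.3b-base",  # placeholder path
    learning_rate=8e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=24,
    fp16=True,  # Native AMP mixed-precision training (assumed fp16 rather than bf16)
)
```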

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.2826 | 0.4001 | 1440 | 0.2698 |
| 0.2407 | 0.8002 | 2880 | 0.2409 |
| 0.2197 | 1.2003 | 4320 | 0.2314 |
| 0.2152 | 1.6004 | 5760 | 0.2203 |
| 0.2129 | 2.0006 | 7200 | 0.2166 |
| 0.2027 | 2.4007 | 8640 | 0.2152 |
| 0.1986 | 2.8008 | 10080 | 0.2055 |
| 0.1905 | 3.2009 | 11520 | 0.2005 |
| 0.1882 | 3.6010 | 12960 | 0.1980 |
| 0.1888 | 4.0011 | 14400 | 0.1956 |
| 0.1824 | 4.4012 | 15840 | 0.2021 |
| 0.1827 | 4.8013 | 17280 | 0.1901 |
| 0.1722 | 5.2014 | 18720 | 0.1974 |
| 0.174 | 5.6016 | 20160 | 0.1949 |
| 0.1716 | 6.0017 | 21600 | 0.1850 |
| 0.1689 | 6.4018 | 23040 | 0.1929 |
| 0.1685 | 6.8019 | 24480 | 0.1866 |
| 0.1611 | 7.2020 | 25920 | 0.1817 |
| 0.1613 | 7.6021 | 27360 | 0.1794 |
| 0.1632 | 8.0022 | 28800 | 0.1795 |
| 0.1528 | 8.4023 | 30240 | 0.1793 |
| 0.1538 | 8.8024 | 31680 | 0.1764 |
| 0.1472 | 9.2026 | 33120 | 0.1739 |
| 0.149 | 9.6027 | 34560 | 0.1703 |
| 0.146 | 10.0028 | 36000 | 0.1699 |
| 0.1413 | 10.4029 | 37440 | 0.1689 |
| 0.141 | 10.8030 | 38880 | 0.1663 |
| 0.1331 | 11.2031 | 40320 | 0.1659 |
| 0.135 | 11.6032 | 41760 | 0.1628 |
| 0.137 | 12.0033 | 43200 | 0.1596 |
| 0.1289 | 12.4034 | 44640 | 0.1606 |
| 0.1322 | 12.8036 | 46080 | 0.1596 |
| 0.1202 | 13.2037 | 47520 | 0.1592 |
| 0.1224 | 13.6038 | 48960 | 0.1595 |
| 0.1244 | 14.0039 | 50400 | 0.1593 |
| 0.1168 | 14.4040 | 51840 | 0.1542 |
| 0.1181 | 14.8041 | 53280 | 0.1553 |
| 0.1084 | 15.2042 | 54720 | 0.1624 |
| 0.1105 | 15.6043 | 56160 | 0.1536 |
| 0.1128 | 16.0044 | 57600 | 0.1571 |
| 0.1055 | 16.4046 | 59040 | 0.1548 |
| 0.1071 | 16.8047 | 60480 | 0.1548 |
| 0.0992 | 17.2048 | 61920 | 0.1562 |
| 0.0991 | 17.6049 | 63360 | 0.1515 |
| 0.1 | 18.0050 | 64800 | 0.1526 |
| 0.0915 | 18.4051 | 66240 | 0.1568 |
| 0.0947 | 18.8052 | 67680 | 0.1533 |
| 0.0854 | 19.2053 | 69120 | 0.1563 |
| 0.0874 | 19.6054 | 70560 | 0.1508 |
| 0.0872 | 20.0056 | 72000 | 0.1539 |
| 0.0813 | 20.4057 | 73440 | 0.1537 |
| 0.083 | 20.8058 | 74880 | 0.1548 |
| 0.0771 | 21.2059 | 76320 | 0.1605 |
| 0.0766 | 21.6060 | 77760 | 0.1554 |
| 0.0776 | 22.0061 | 79200 | 0.1590 |
| 0.0707 | 22.4062 | 80640 | 0.1649 |
| 0.0719 | 22.8063 | 82080 | 0.1631 |
| 0.0673 | 23.2064 | 83520 | 0.1689 |
| 0.0677 | 23.6066 | 84960 | 0.1674 |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
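
Since this card describes a PEFT adapter trained on top of deepseek-ai/deepseek-coder-1.3b-base (rather than full model weights), it would typically be used by loading the base model and applying the adapter. The snippet below is a minimal sketch under assumptions: the adapter repository id is taken from the model name above (yalhessi/...), and the prompt string is a placeholder, since the task-specific input format is not documented here.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
# Assumed adapter repo id, taken from the model name above.
adapter_id = "yalhessi/lemexp-task1-v2-template_small_nodefs-deepseek-coder-1.3b-base-8lr-24epochs-eos-token"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Placeholder prompt; the expected task-specific input format is not documented in this card.
inputs = tokenizer("example input", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```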