lemexp-task1-v2-template_small_notypes-deepseek-coder-1.3b-base-8lr-24epochs-eos

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1882

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 24
  • mixed_precision_training: Native AMP
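As a rough illustration only, the hyperparameters above correspond approximately to a Hugging Face TrainingArguments configuration like the sketch below; the actual training script and dataset are not part of this card, and the output_dir is an assumption.

```python
# Hedged sketch: maps the listed hyperparameters onto TrainingArguments.
# The real training script is not included in this card; output_dir is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task1-v2-template_small_notypes-deepseek-coder-1.3b-base-8lr-24epochs-eos",
    learning_rate=8e-4,
    per_device_train_batch_size=2,   # 2 per device * 8 GPUs = 16 total train batch size
    per_device_eval_batch_size=2,    # 16 total eval batch size across 8 devices
    seed=42,
    num_train_epochs=24,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # Native AMP mixed-precision training
)
```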

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.3243        | 0.4001  | 1440  | 0.2947          |
| 0.2737        | 0.8002  | 2880  | 0.2755          |
| 0.2463        | 1.2003  | 4320  | 0.2620          |
| 0.2389        | 1.6004  | 5760  | 0.2570          |
| 0.2342        | 2.0006  | 7200  | 0.2518          |
| 0.2237        | 2.4007  | 8640  | 0.2492          |
| 0.2216        | 2.8008  | 10080 | 0.2380          |
| 0.2104        | 3.2009  | 11520 | 0.2318          |
| 0.2076        | 3.6010  | 12960 | 0.2281          |
| 0.2087        | 4.0011  | 14400 | 0.2299          |
| 0.2011        | 4.4012  | 15840 | 0.2211          |
| 0.199         | 4.8013  | 17280 | 0.2192          |
| 0.1893        | 5.2014  | 18720 | 0.2117          |
| 0.1935        | 5.6016  | 20160 | 0.2185          |
| 0.1895        | 6.0017  | 21600 | 0.2080          |
| 0.1841        | 6.4018  | 23040 | 0.2015          |
| 0.184         | 6.8019  | 24480 | 0.2029          |
| 0.1755        | 7.2020  | 25920 | 0.2010          |
| 0.1762        | 7.6021  | 27360 | 0.2034          |
| 0.1775        | 8.0022  | 28800 | 0.1958          |
| 0.1666        | 8.4023  | 30240 | 0.1979          |
| 0.1673        | 8.8024  | 31680 | 0.2012          |
| 0.1606        | 9.2026  | 33120 | 0.1933          |
| 0.1629        | 9.6027  | 34560 | 0.1895          |
| 0.1596        | 10.0028 | 36000 | 0.1885          |
| 0.1526        | 10.4029 | 37440 | 0.1883          |
| 0.1545        | 10.8030 | 38880 | 0.1863          |
| 0.1454        | 11.2031 | 40320 | 0.1857          |
| 0.146         | 11.6032 | 41760 | 0.1823          |
| 0.1491        | 12.0033 | 43200 | 0.1791          |
| 0.1395        | 12.4034 | 44640 | 0.1829          |
| 0.142         | 12.8036 | 46080 | 0.1781          |
| 0.131         | 13.2037 | 47520 | 0.1792          |
| 0.1323        | 13.6038 | 48960 | 0.1823          |
| 0.1339        | 14.0039 | 50400 | 0.1795          |
| 0.1261        | 14.4040 | 51840 | 0.1737          |
| 0.1279        | 14.8041 | 53280 | 0.1788          |
| 0.1168        | 15.2042 | 54720 | 0.1754          |
| 0.1199        | 15.6043 | 56160 | 0.1740          |
| 0.121         | 16.0044 | 57600 | 0.1753          |
| 0.1125        | 16.4046 | 59040 | 0.1723          |
| 0.1151        | 16.8047 | 60480 | 0.1719          |
| 0.1053        | 17.2048 | 61920 | 0.1718          |
| 0.106         | 17.6049 | 63360 | 0.1691          |
| 0.1073        | 18.0050 | 64800 | 0.1723          |
| 0.0973        | 18.4051 | 66240 | 0.1691          |
| 0.1009        | 18.8052 | 67680 | 0.1661          |
| 0.0908        | 19.2053 | 69120 | 0.1781          |
| 0.0926        | 19.6054 | 70560 | 0.1742          |
| 0.093         | 20.0056 | 72000 | 0.1732          |
| 0.086         | 20.4057 | 73440 | 0.1750          |
| 0.0881        | 20.8058 | 74880 | 0.1754          |
| 0.0818        | 21.2059 | 76320 | 0.1779          |
| 0.0809        | 21.6060 | 77760 | 0.1774          |
| 0.0815        | 22.0061 | 79200 | 0.1790          |
| 0.0746        | 22.4062 | 80640 | 0.1856          |
| 0.0752        | 22.8063 | 82080 | 0.1831          |
| 0.0706        | 23.2064 | 83520 | 0.1869          |
| 0.071         | 23.6066 | 84960 | 0.1882          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
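Because this repository provides a PEFT adapter for deepseek-ai/deepseek-coder-1.3b-base, a minimal loading sketch with the libraries listed above might look as follows; the prompt and generation settings are illustrative assumptions, not part of the original card.

```python
# Hedged sketch: load the PEFT adapter on top of the base model.
# Prompt text and generation settings below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-v2-template_small_notypes-deepseek-coder-1.3b-base-8lr-24epochs-eos"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("-- example prompt --", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```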