lemexp-task1-min_symbols_template_full-deepseek-coder-1.3b-base-ddp-12lr

This model is a PEFT adapter fine-tuned from deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following result on the evaluation set (a loading sketch follows):

  • Loss: 0.1895
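
Since this repo ships a PEFT adapter rather than full model weights, a minimal loading sketch might look like the following. The repo ids are taken from this card; the prompt is an illustrative guess, as the expected input format is not documented here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-min_symbols_template_full-deepseek-coder-1.3b-base-ddp-12lr"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the trained adapter
model.eval()

prompt = "theorem example :"  # illustrative only; the training prompt format is not documented
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```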

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0012
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
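
For reference, here is a hedged sketch of these settings as Hugging Face TrainingArguments. The training script, dataset, and adapter configuration are not given in this card, so only the listed values are reproduced; output_dir is hypothetical.

```python
from transformers import TrainingArguments

# Per-device batch size 2 across 8 GPUs under DDP matches the reported
# total train/eval batch size of 16.
args = TrainingArguments(
    output_dir="output",              # hypothetical path, not from the card
    learning_rate=1.2e-3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",              # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=12,
    fp16=True,                        # Native AMP mixed precision
)
```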

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| 0.387         | 0.2000  | 2907   | 0.3682          |
| 0.3641        | 0.4001  | 5814   | 0.3537          |
| 0.3563        | 0.6001  | 8721   | 0.3494          |
| 0.3467        | 0.8001  | 11628  | 0.3382          |
| 0.3373        | 1.0001  | 14535  | 0.3465          |
| 0.3347        | 1.2002  | 17442  | 0.3315          |
| 0.3396        | 1.4002  | 20349  | 0.3322          |
| 0.3317        | 1.6002  | 23256  | 0.3243          |
| 0.3245        | 1.8002  | 26163  | 0.3237          |
| 0.3219        | 2.0003  | 29070  | 0.3167          |
| 0.3166        | 2.2003  | 31977  | 0.3250          |
| 0.3179        | 2.4003  | 34884  | 0.3165          |
| 0.3142        | 2.6004  | 37791  | 0.3107          |
| 0.3102        | 2.8004  | 40698  | 0.3038          |
| 0.3094        | 3.0004  | 43605  | 0.3081          |
| 0.3039        | 3.2004  | 46512  | 0.2971          |
| 0.3023        | 3.4005  | 49419  | 0.2910          |
| 0.297         | 3.6005  | 52326  | 0.2912          |
| 0.2947        | 3.8005  | 55233  | 0.2958          |
| 0.2925        | 4.0006  | 58140  | 0.2862          |
| 0.2848        | 4.2006  | 61047  | 0.2842          |
| 0.2842        | 4.4006  | 63954  | 0.2797          |
| 0.2815        | 4.6006  | 66861  | 0.2815          |
| 0.2795        | 4.8007  | 69768  | 0.2784          |
| 0.2735        | 5.0007  | 72675  | 0.2715          |
| 0.2685        | 5.2007  | 75582  | 0.2766          |
| 0.2698        | 5.4007  | 78489  | 0.2687          |
| 0.268         | 5.6008  | 81396  | 0.2648          |
| 0.2623        | 5.8008  | 84303  | 0.2647          |
| 0.2613        | 6.0008  | 87210  | 0.2587          |
| 0.2568        | 6.2009  | 90117  | 0.2573          |
| 0.2544        | 6.4009  | 93024  | 0.2553          |
| 0.2519        | 6.6009  | 95931  | 0.2518          |
| 0.2514        | 6.8009  | 98838  | 0.2525          |
| 0.2519        | 7.0010  | 101745 | 0.2512          |
| 0.239         | 7.2010  | 104652 | 0.2441          |
| 0.2407        | 7.4010  | 107559 | 0.2456          |
| 0.2397        | 7.6010  | 110466 | 0.2433          |
| 0.2342        | 7.8011  | 113373 | 0.2364          |
| 0.2329        | 8.0011  | 116280 | 0.2316          |
| 0.2232        | 8.2011  | 119187 | 0.2307          |
| 0.2213        | 8.4012  | 122094 | 0.2294          |
| 0.2223        | 8.6012  | 125001 | 0.2230          |
| 0.2199        | 8.8012  | 127908 | 0.2216          |
| 0.2174        | 9.0012  | 130815 | 0.2211          |
| 0.208         | 9.2013  | 133722 | 0.2193          |
| 0.2089        | 9.4013  | 136629 | 0.2162          |
| 0.2047        | 9.6013  | 139536 | 0.2120          |
| 0.2028        | 9.8013  | 142443 | 0.2098          |
| 0.2005        | 10.0014 | 145350 | 0.2064          |
| 0.1923        | 10.2014 | 148257 | 0.2040          |
| 0.1926        | 10.4014 | 151164 | 0.2023          |
| 0.1887        | 10.6015 | 154071 | 0.2010          |
| 0.1883        | 10.8015 | 156978 | 0.2004          |
| 0.1865        | 11.0015 | 159885 | 0.1963          |
| 0.1769        | 11.2015 | 162792 | 0.1954          |
| 0.1744        | 11.4016 | 165699 | 0.1928          |
| 0.1721        | 11.6016 | 168606 | 0.1914          |
| 0.1711        | 11.8016 | 171513 | 0.1895          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
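
A small sanity check against these versions (a sketch; exact pins may not be strictly required for inference):

```python
import datasets, peft, tokenizers, torch, transformers

# Versions listed in this card; print a note on mismatch rather than failing.
expected = {
    peft: "0.14.0",
    transformers: "4.47.0",
    torch: "2.5.1+cu124",
    datasets: "3.2.0",
    tokenizers: "0.21.0",
}
for mod, want in expected.items():
    if mod.__version__ != want:
        print(f"{mod.__name__}: have {mod.__version__}, card lists {want}")
```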