lemexp-task1-template_small-deepseek-coder-1.3b-base-ddp-8lr-1bs

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1927

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0008
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
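
For reference, here is a minimal sketch of how these settings could be expressed as Hugging Face TrainingArguments. The actual training script is not published, so the output directory and launch command below are assumptions.

```python
from transformers import TrainingArguments

# Per-device settings from the list above; across 8 GPUs under DDP this gives
# a total train batch size of 1 x 8 = 8 and a total eval batch size of 2 x 8 = 16.
training_args = TrainingArguments(
    output_dir="lemexp-task1-checkpoints",  # assumption: not stated in the card
    learning_rate=8e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    optim="adamw_torch",  # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    fp16=True,            # "Native AMP" mixed-precision training
)
# Launched across 8 devices, e.g.: torchrun --nproc_per_node=8 train.py
```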

Training results

Training Loss | Epoch   | Step  | Validation Loss
------------- | ------- | ----- | ---------------
0.3957        | 0.2001  | 1258  | 0.3826
0.3477        | 0.4002  | 2516  | 0.3397
0.3363        | 0.6003  | 3774  | 0.3289
0.3209        | 0.8004  | 5032  | 0.3253
0.3156        | 1.0005  | 6290  | 0.3069
0.3004        | 1.2006  | 7548  | 0.3068
0.3008        | 1.4007  | 8806  | 0.2984
0.2906        | 1.6008  | 10064 | 0.2998
0.2945        | 1.8009  | 11322 | 0.2932
0.2894        | 2.0010  | 12580 | 0.2823
0.2752        | 2.2010  | 13838 | 0.2808
0.2718        | 2.4011  | 15096 | 0.2829
0.2731        | 2.6012  | 16354 | 0.2806
0.2692        | 2.8013  | 17612 | 0.2707
0.2644        | 3.0014  | 18870 | 0.2728
0.2543        | 3.2015  | 20128 | 0.2660
0.2575        | 3.4016  | 21386 | 0.2628
0.2563        | 3.6017  | 22644 | 0.2645
0.2522        | 3.8018  | 23902 | 0.2550
0.2519        | 4.0019  | 25160 | 0.2550
0.2397        | 4.2020  | 26418 | 0.2544
0.2419        | 4.4021  | 27676 | 0.2483
0.2358        | 4.6022  | 28934 | 0.2477
0.2349        | 4.8023  | 30192 | 0.2466
0.234         | 5.0024  | 31450 | 0.2442
0.2212        | 5.2025  | 32708 | 0.2443
0.2221        | 5.4026  | 33966 | 0.2420
0.222         | 5.6027  | 35224 | 0.2322
0.2198        | 5.8028  | 36482 | 0.2319
0.2193        | 6.0029  | 37740 | 0.2315
0.2051        | 6.2030  | 38998 | 0.2245
0.2071        | 6.4031  | 40256 | 0.2249
0.2039        | 6.6031  | 41514 | 0.2309
0.2059        | 6.8032  | 42772 | 0.2184
0.2044        | 7.0033  | 44030 | 0.2175
0.1878        | 7.2034  | 45288 | 0.2172
0.1903        | 7.4035  | 46546 | 0.2123
0.1924        | 7.6036  | 47804 | 0.2105
0.1886        | 7.8037  | 49062 | 0.2087
0.1876        | 8.0038  | 50320 | 0.2063
0.1726        | 8.2039  | 51578 | 0.2109
0.1756        | 8.4040  | 52836 | 0.2097
0.1764        | 8.6041  | 54094 | 0.2045
0.1737        | 8.8042  | 55352 | 0.1993
0.1702        | 9.0043  | 56610 | 0.2031
0.1561        | 9.2044  | 57868 | 0.1991
0.158         | 9.4045  | 59126 | 0.1977
0.1568        | 9.6046  | 60384 | 0.1983
0.1583        | 9.8047  | 61642 | 0.1965
0.1591        | 10.0048 | 62900 | 0.1940
0.1419        | 10.2049 | 64158 | 0.1956
0.1434        | 10.4050 | 65416 | 0.1924
0.1411        | 10.6051 | 66674 | 0.1940
0.1418        | 10.8052 | 67932 | 0.1929
0.1393        | 11.0052 | 69190 | 0.1922
0.1279        | 11.2053 | 70448 | 0.1946
0.1287        | 11.4054 | 71706 | 0.1953
0.1274        | 11.6055 | 72964 | 0.1948
0.1259        | 11.8056 | 74222 | 0.1927

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
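
Since this repository ships a PEFT adapter rather than full model weights, it is loaded on top of the base model. A minimal sketch, assuming a causal-LM LoRA adapter; the prompt below is a placeholder, since the intended input template is not documented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task1-template_small-deepseek-coder-1.3b-base-ddp-8lr-1bs"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Placeholder prompt: replace with the task-specific input format.
inputs = tokenizer("example input", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```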