d1

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1395
  • Rewards/chosen: -0.1261
  • Rewards/rejected: -19.6585
  • Rewards/accuracies: 0.9737
  • Rewards/margins: 19.5324
  • Logps/rejected: -369.5529
  • Logps/chosen: -169.7162
  • Logits/rejected: -9.2987
  • Logits/chosen: -8.0855

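Since the base model is a causal code LM, the checkpoint can be loaded with the standard transformers generation API. A minimal sketch; the prompt and generation settings below are illustrative assumptions, not part of this card:

```python
# Minimal usage sketch. The repo id comes from this card; the prompt
# and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stojchet/d1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "def fibonacci(n):"  # hypothetical code-completion prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
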
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 4
  • mixed_precision_training: Native AMP

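The reward and log-probability metrics above follow the naming used by TRL's preference-tuning trainers, but the actual training script is not included in this card. As a rough, hypothetical sketch, the hyperparameters listed above would map onto transformers.TrainingArguments as follows (output_dir and anything not listed above are placeholders):

```python
# Hypothetical mapping of the listed hyperparameters onto
# transformers.TrainingArguments; not the author's actual script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="d1",                 # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=64,  # 2 x 64 (x 1 device) = 128 effective
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=4,
    fp16=True,                       # "Native AMP" mixed precision
)
```
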
Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0841        | 1.7149 | 100  | 0.0542          | 0.8716         | -12.2566         | 0.9737             | 13.1282         | -295.5336      | -159.7391    | -16.8791        | -17.5293      |
| 0.0013        | 3.4298 | 200  | 0.1395          | -0.1261        | -19.6585         | 0.9737             | 19.5324         | -369.5529      | -169.7162    | -9.2987         | -8.0855       |

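Note that Rewards/margins is the gap between the chosen and rejected rewards: at step 200, -0.1261 - (-19.6585) = 19.5324.
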
Framework versions

  • Transformers 4.43.0.dev0
  • PyTorch 2.2.2+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1