---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
tags:
- trl
- sft
- generated_from_trainer
library_name: peft
model-index:
- name: sft-lora-next-v3
  results: []
---

# sft-lora-next-v3

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2299

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.000125
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 5
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6717        | 0.05  | 24   | 0.5240          |
| 0.4498        | 0.1   | 48   | 0.3601          |
| 0.3297        | 0.15  | 72   | 0.3050          |
| 0.2929        | 0.19  | 96   | 0.2762          |
| 0.275         | 0.24  | 120  | 0.2657          |
| 0.2194        | 0.29  | 144  | 0.2583          |
| 0.2564        | 0.34  | 168  | 0.2538          |
| 0.2645        | 0.39  | 192  | 0.2483          |
| 0.2489        | 0.44  | 216  | 0.2460          |
| 0.2486        | 0.49  | 240  | 0.2436          |
| 0.234         | 0.53  | 264  | 0.2410          |
| 0.2741        | 0.58  | 288  | 0.2388          |
| 0.2311        | 0.63  | 312  | 0.2371          |
| 0.2119        | 0.68  | 336  | 0.2359          |
| 0.2464        | 0.73  | 360  | 0.2344          |
| 0.2384        | 0.78  | 384  | 0.2332          |
| 0.2399        | 0.83  | 408  | 0.2319          |
| 0.2384        | 0.87  | 432  | 0.2310          |
| 0.2418        | 0.92  | 456  | 0.2303          |
| 0.2189        | 0.97  | 480  | 0.2299          |

### Framework versions

- PEFT 0.7.1
- Transformers 4.38.0
- Pytorch 2.1.1+cu121
- Datasets 2.19.1
- Tokenizers 0.15.2
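
## How to use

This repository contains a LoRA adapter rather than full model weights (`library_name: peft`), so inference requires loading the adapter on top of the base model. The snippet below is a minimal sketch of that flow; the adapter id `your-username/sft-lora-next-v3` and the prompt are placeholders, not values taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
adapter_id = "your-username/sft-lora-next-v3"  # placeholder: substitute the actual repo id

# Load the frozen base model, then attach the LoRA adapter weights on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain LoRA fine-tuning in one paragraph."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For deployment, the adapter can be folded into the base weights with `model.merge_and_unload()`, which removes the PEFT wrapper and returns a plain `transformers` model.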
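
## Reproducing the training configuration

The hyperparameters listed above map onto a TRL `SFTTrainer` run roughly as sketched below, assuming a TRL version contemporary with the framework versions listed (newer TRL releases move `dataset_text_field` and `max_seq_length` into `SFTConfig`). Since the training and evaluation data are undocumented, the dataset, text column, sequence length, and LoRA settings here are placeholder assumptions; only the numeric hyperparameters come from this card. The Adam betas `(0.9, 0.999)` and epsilon `1e-08` listed above are the `transformers` defaults, so they need no explicit arguments.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
dataset = load_dataset("json", data_files="train.jsonl")["train"]  # placeholder dataset

# Placeholder LoRA settings: the actual adapter config is not documented in this card.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = TrainingArguments(
    output_dir="sft-lora-next-v3",
    learning_rate=1.25e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 per device * 4 steps = total train batch size 32
    num_train_epochs=1,
    lr_scheduler_type="polynomial",
    warmup_steps=5,
    seed=42,
)

trainer = SFTTrainer(
    model=base_id,              # SFTTrainer loads the base model from the Hub
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,    # wraps the base model with LoRA before training
    dataset_text_field="text",  # placeholder column name
    max_seq_length=2048,        # placeholder sequence length
)
trainer.train()
```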