	---
	library_name: peft
	license: other
	base_model: deepseek-ai/deepseek-coder-1.3b-base
	tags:
	- generated_from_trainer
	model-index:
	- name: lemexp-task1-lemma_command_full-deepseek-coder-1.3b-base-ddp-8lr
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lemexp-task1-lemma_command_full-deepseek-coder-1.3b-base-ddp-8lr

	This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4320

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0008
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 12
	- mixed_precision_training: Native AMP
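The batch-size and schedule figures above are linked by simple arithmetic. The following is a minimal sketch of that arithmetic, not the training script itself. It assumes a gradient accumulation factor of 1 and no warmup, since the card lists neither, and it takes the total step count (127260) from the last row of the training results table. The function names are illustrative.

```python
# Values copied from the hyperparameter list and the results table of this card.
LEARNING_RATE = 8e-4
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 8
NUM_EPOCHS = 12
TOTAL_STEPS = 127_260  # last step reported in the training results table

# Assumption: gradient accumulation is not listed on the card, so 1 is used here.
GRAD_ACCUM_STEPS = 1


def total_train_batch_size() -> int:
    """Sequences consumed per optimizer step across all devices."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS


def steps_per_epoch() -> int:
    """Optimizer steps per epoch implied by the reported totals."""
    return TOTAL_STEPS // NUM_EPOCHS


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero.

    Assumption: warmup_steps defaults to 0 because the card reports no warmup.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    print("steps per epoch:", steps_per_epoch())
    print("learning rate at the midpoint:", linear_lr(TOTAL_STEPS // 2))
```

Under these assumptions the product 2 × 8 reproduces the reported total_train_batch_size of 16, and the evaluation interval of 2121 steps in the results table corresponds to one fifth of an epoch.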

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:------:|:---------------:|
	| 0.7149 | 0.2 | 2121 | 0.6754 |
	| 0.684 | 0.4 | 4242 | 0.6408 |
	| 0.6664 | 0.6 | 6363 | 0.6271 |
	| 0.6568 | 0.8 | 8484 | 0.6106 |
	| 0.6445 | 1.0 | 10605 | 0.6016 |
	| 0.6188 | 1.2 | 12726 | 0.5927 |
	| 0.618 | 1.4 | 14847 | 0.5864 |
	| 0.6161 | 1.6 | 16968 | 0.5864 |
	| 0.6239 | 1.8 | 19089 | 0.5779 |
	| 0.6126 | 2.0 | 21210 | 0.5764 |
	| 0.5882 | 2.2 | 23331 | 0.5680 |
	| 0.5955 | 2.4 | 25452 | 0.5647 |
	| 0.5858 | 2.6 | 27573 | 0.5615 |
	| 0.5882 | 2.8 | 29694 | 0.5574 |
	| 0.5819 | 3.0 | 31815 | 0.5504 |
	| 0.5759 | 3.2 | 33936 | 0.5544 |
	| 0.5647 | 3.4 | 36057 | 0.5479 |
	| 0.5687 | 3.6 | 38178 | 0.5458 |
	| 0.5692 | 3.8 | 40299 | 0.5415 |
	| 0.5633 | 4.0 | 42420 | 0.5398 |
	| 0.5489 | 4.2 | 44541 | 0.5299 |
	| 0.5482 | 4.4 | 46662 | 0.5246 |
	| 0.5443 | 4.6 | 48783 | 0.5246 |
	| 0.5466 | 4.8 | 50904 | 0.5225 |
	| 0.5464 | 5.0 | 53025 | 0.5157 |
	| 0.5249 | 5.2 | 55146 | 0.5203 |
	| 0.5323 | 5.4 | 57267 | 0.5115 |
	| 0.5227 | 5.6 | 59388 | 0.5075 |
	| 0.5277 | 5.8 | 61509 | 0.5074 |
	| 0.5214 | 6.0 | 63630 | 0.5040 |
	| 0.5115 | 6.2 | 65751 | 0.4969 |
	| 0.5088 | 6.4 | 67872 | 0.4950 |
	| 0.511 | 6.6 | 69993 | 0.4912 |
	| 0.5097 | 6.8 | 72114 | 0.4892 |
	| 0.5024 | 7.0 | 74235 | 0.4877 |
	| 0.4842 | 7.2 | 76356 | 0.4860 |
	| 0.484 | 7.4 | 78477 | 0.4832 |
	| 0.493 | 7.6 | 80598 | 0.4816 |
	| 0.4863 | 7.8 | 82719 | 0.4759 |
	| 0.4878 | 8.0 | 84840 | 0.4672 |
	| 0.4644 | 8.2 | 86961 | 0.4705 |
	| 0.4648 | 8.4 | 89082 | 0.4654 |
	| 0.4663 | 8.6 | 91203 | 0.4612 |
	| 0.4715 | 8.8 | 93324 | 0.4636 |
	| 0.4669 | 9.0 | 95445 | 0.4591 |
	| 0.4451 | 9.2 | 97566 | 0.4586 |
	| 0.4457 | 9.4 | 99687 | 0.4580 |
	| 0.4538 | 9.6 | 101808 | 0.4495 |
	| 0.4489 | 9.8 | 103929 | 0.4492 |
	| 0.4466 | 10.0 | 106050 | 0.4458 |
	| 0.4252 | 10.2 | 108171 | 0.4470 |
	| 0.4226 | 10.4 | 110292 | 0.4456 |
	| 0.4244 | 10.6 | 112413 | 0.4402 |
	| 0.4226 | 10.8 | 114534 | 0.4374 |
	| 0.4203 | 11.0 | 116655 | 0.4352 |
	| 0.4124 | 11.2 | 118776 | 0.4361 |
	| 0.4039 | 11.4 | 120897 | 0.4340 |
	| 0.405 | 11.6 | 123018 | 0.4321 |
	| 0.4083 | 11.8 | 125139 | 0.4314 |
	| 0.4025 | 12.0 | 127260 | 0.4320 |
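The table can be read programmatically to find the evaluation step with the lowest validation loss and to convert a loss into a perplexity. The sketch below uses only a handful of rows copied from the table. It assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated on the card. The helper names are illustrative.

```python
import math

# A few (step, validation_loss) rows copied from the table above.
ROWS = [
    (2121, 0.6754),
    (63630, 0.5040),
    (123018, 0.4321),
    (125139, 0.4314),
    (127260, 0.4320),
]


def best_checkpoint(rows):
    """Return the (step, validation_loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def perplexity(loss: float) -> float:
    """Convert a cross-entropy loss to perplexity.

    Assumption: the loss is a per-token average measured in nats.
    """
    return math.exp(loss)


if __name__ == "__main__":
    step, loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {loss:.4f} at step {step}")
    print(f"perplexity at the final loss: {perplexity(0.4320):.3f}")
```

The lowest validation loss in the table (0.4314 at step 125139) is marginally below the final value of 0.4320 reported at the top of the card, so the last checkpoint is close to, but not exactly, the best one by this metric.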


	### Framework versions

	- PEFT 0.14.0
	- Transformers 4.47.0
	- Pytorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0
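
Because the card declares `library_name: peft`, the repository presumably holds adapter weights that are applied on top of the base model rather than a full checkpoint. The following is a hedged sketch of one common way to load such an adapter. The adapter repository ID is a placeholder, since the card does not state the Hub namespace, and the helper name `load_adapter` is illustrative.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-1.3b-base"

# Placeholder: replace <namespace> with the Hub account that hosts the adapter.
ADAPTER_ID = "<namespace>/lemexp-task1-lemma_command_full-deepseek-coder-1.3b-base-ddp-8lr"


def load_adapter(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter to it.

    The heavy dependencies are imported lazily so this module can be imported
    without them installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode; disables dropout
    return tokenizer, model
```

Calling `load_adapter()` downloads both the base model and the adapter, so it needs network access and enough memory for a 1.3B-parameter model.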
