---
library_name: peft
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- generated_from_trainer
model-index:
- name: lemexp-processed-task1_min_symbols_lemma_command_small-deepseek-coder-1.3b-base
  results: []
---
# lemexp-processed-task1_min_symbols_lemma_command_small-deepseek-coder-1.3b-base

This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4329
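Because this checkpoint is a PEFT adapter rather than a full model, it is loaded on top of the base model. The snippet below is a minimal usage sketch, not a confirmed excerpt from the training setup: the adapter repo id and the prompt are placeholders.

```python
# Minimal inference sketch for this PEFT adapter on top of the base model.
# ASSUMPTION: the adapter repo id below is a placeholder -- replace it with
# the actual Hub path (or local directory) where the adapter is stored.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "your-org/lemexp-processed-task1_min_symbols_lemma_command_small-deepseek-coder-1.3b-base"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "(* placeholder prompt *)"  # the card does not document the expected input format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```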
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows this list):
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 6
- mixed_precision_training: Native AMP
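For orientation, here is how the settings above would map onto Transformers `TrainingArguments`. This is a reconstruction from the list, not the original training script; the output directory is a placeholder, and the PEFT/LoRA configuration is not documented in this card.

```python
# Sketch of the listed hyperparameters as TrainingArguments (a reconstruction,
# not the original script). The AdamW betas/epsilon are the optimizer defaults,
# matching the values listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=6,
    fp16=True,                       # "Native AMP" mixed precision
)
```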
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 0.6364 | 0.2000 | 3683 | 0.6357 |
| 0.5857 | 0.4001 | 7366 | 0.5827 |
| 0.5682 | 0.6001 | 11049 | 0.5516 |
| 0.5421 | 0.8001 | 14732 | 0.5293 |
| 0.5142 | 1.0002 | 18415 | 0.5177 |
| 0.4674 | 1.2002 | 22098 | 0.5015 |
| 0.4615 | 1.4002 | 25781 | 0.5000 |
| 0.453 | 1.6003 | 29464 | 0.4770 |
| 0.4506 | 1.8003 | 33147 | 0.4701 |
| 0.4309 | 2.0003 | 36830 | 0.4646 |
| 0.3829 | 2.2004 | 40513 | 0.4667 |
| 0.3925 | 2.4004 | 44196 | 0.4595 |
| 0.3858 | 2.6004 | 47879 | 0.4566 |
| 0.3879 | 2.8005 | 51562 | 0.4439 |
| 0.3764 | 3.0005 | 55245 | 0.4379 |
| 0.3267 | 3.2005 | 58928 | 0.4502 |
| 0.3346 | 3.4006 | 62611 | 0.4443 |
| 0.3363 | 3.6006 | 66294 | 0.4339 |
| 0.3321 | 3.8006 | 69977 | 0.4350 |
| 0.3423 | 4.0007 | 73660 | 0.4288 |
| 0.2789 | 4.2007 | 77343 | 0.4458 |
| 0.2928 | 4.4007 | 81026 | 0.4379 |
| 0.2963 | 4.6007 | 84709 | 0.4325 |
| 0.2887 | 4.8008 | 88392 | 0.4275 |
| 0.2949 | 5.0008 | 92075 | 0.4292 |
| 0.2437 | 5.2008 | 95758 | 0.4366 |
| 0.2424 | 5.4009 | 99441 | 0.4358 |
| 0.2528 | 5.6009 | 103124 | 0.4331 |
| 0.2477 | 5.8009 | 106807 | 0.4329 |
### Framework versions

- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
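To reproduce this environment, pin the versions above at install time, e.g. `pip install peft==0.14.0 transformers==4.47.0 datasets==3.2.0 tokenizers==0.21.0`, together with a matching PyTorch 2.5.1 (CUDA 12.4) build.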