	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- bleu
	model-index:
	- name: duo-predict-gpt2-medium-wikitext
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# duo-predict-gpt2-medium-wikitext

	This model is a fine-tuned version of a base model that was not recorded (the auto-generated link is empty), trained on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.2546
	- Accuracy: 0.0073
	- Perplexity: 9.5311
	- Bleu: 1.0
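
The reported perplexity appears to be the exponential of the evaluation loss. The sketch below assumes that relationship (the card itself does not state how perplexity was computed) and checks it against the final numbers above. The function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss reported in this card.
final_eval_loss = 2.2546

# Compare with the reported perplexity of 9.5311; a small gap is expected
# because the loss shown in the card is rounded to four decimals.
print(f"{perplexity(final_eval_loss):.2f}")
```

The same check holds for the first logged evaluation (loss 3.7315, perplexity 41.7396), again up to rounding.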

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
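
To illustrate what a linear scheduler with a warmup ratio of 0.1 does to the learning rate, here is a minimal pure-Python sketch. It is not the Trainer's own code. The total step count of 17815 is an assumption, estimated from the results table (about 3563 steps per epoch over 5 epochs), and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 1e-4          # learning_rate from the list above
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 17815     # ASSUMPTION: estimated from the results table, not stated in the card


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Ramp up linearly from 0 to BASE_LR over the warmup phase.
        return BASE_LR * step / max(1, warmup_steps)
    # Decay linearly from BASE_LR at the end of warmup to 0 at the last step.
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 500, 1782, 9000, 17500):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at about step 1782 and falls to zero at the final step, so most of the logged evaluations happen during the decay phase.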

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | Perplexity | Bleu |
	|:-------------:|:------:|:-----:|:---------------:|:--------:|:----------:|:----:|
	| 7.6654 | 0.1403 | 500 | 3.7315 | 0.0073 | 41.7396 | 1.0 |
	| 7.0276 | 0.2807 | 1000 | 3.4735 | 0.0073 | 32.2490 | 1.0 |
	| 6.4629 | 0.4210 | 1500 | 3.1863 | 0.0073 | 24.1987 | 1.0 |
	| 5.9671 | 0.5613 | 2000 | 2.9542 | 0.0073 | 19.1873 | 1.0 |
	| 5.6969 | 0.7017 | 2500 | 2.8233 | 0.0073 | 16.8331 | 1.0 |
	| 5.5077 | 0.8420 | 3000 | 2.7351 | 0.0073 | 15.4112 | 1.0 |
	| 5.3536 | 0.9823 | 3500 | 2.6607 | 0.0073 | 14.3059 | 1.0 |
	| 5.2099 | 1.1226 | 4000 | 2.6000 | 0.0073 | 13.4641 | 1.0 |
	| 5.1158 | 1.2630 | 4500 | 2.5493 | 0.0073 | 12.7980 | 1.0 |
	| 5.0453 | 1.4033 | 5000 | 2.5125 | 0.0073 | 12.3362 | 1.0 |
	| 4.955 | 1.5436 | 5500 | 2.4806 | 0.0073 | 11.9489 | 1.0 |
	| 4.9157 | 1.6840 | 6000 | 2.4537 | 0.0073 | 11.6310 | 1.0 |
	| 4.8756 | 1.8243 | 6500 | 2.4300 | 0.0073 | 11.3584 | 1.0 |
	| 4.844 | 1.9646 | 7000 | 2.4100 | 0.0073 | 11.1342 | 1.0 |
	| 4.7136 | 2.1050 | 7500 | 2.3948 | 0.0073 | 10.9657 | 1.0 |
	| 4.6911 | 2.2453 | 8000 | 2.3805 | 0.0073 | 10.8105 | 1.0 |
	| 4.6741 | 2.3856 | 8500 | 2.3668 | 0.0073 | 10.6637 | 1.0 |
	| 4.6485 | 2.5260 | 9000 | 2.3538 | 0.0073 | 10.5257 | 1.0 |
	| 4.623 | 2.6663 | 9500 | 2.3416 | 0.0073 | 10.3976 | 1.0 |
	| 4.6016 | 2.8066 | 10000 | 2.3303 | 0.0073 | 10.2806 | 1.0 |
	| 4.5823 | 2.9470 | 10500 | 2.3202 | 0.0073 | 10.1776 | 1.0 |
	| 4.4802 | 3.0873 | 11000 | 2.3143 | 0.0073 | 10.1182 | 1.0 |
	| 4.4671 | 3.2276 | 11500 | 2.3073 | 0.0073 | 10.0469 | 1.0 |
	| 4.4557 | 3.3679 | 12000 | 2.3006 | 0.0073 | 9.9800 | 1.0 |
	| 4.4437 | 3.5083 | 12500 | 2.2928 | 0.0073 | 9.9023 | 1.0 |
	| 4.4402 | 3.6486 | 13000 | 2.2862 | 0.0073 | 9.8375 | 1.0 |
	| 4.4482 | 3.7889 | 13500 | 2.2800 | 0.0073 | 9.7763 | 1.0 |
	| 4.4279 | 3.9293 | 14000 | 2.2752 | 0.0073 | 9.7303 | 1.0 |
	| 4.3188 | 4.0696 | 14500 | 2.2730 | 0.0073 | 9.7087 | 1.0 |
	| 4.3193 | 4.2099 | 15000 | 2.2691 | 0.0073 | 9.6704 | 1.0 |
	| 4.3158 | 4.3503 | 15500 | 2.2652 | 0.0073 | 9.6329 | 1.0 |
	| 4.3196 | 4.4906 | 16000 | 2.2619 | 0.0073 | 9.6012 | 1.0 |
	| 4.2946 | 4.6309 | 16500 | 2.2589 | 0.0073 | 9.5722 | 1.0 |
	| 4.3078 | 4.7713 | 17000 | 2.2564 | 0.0073 | 9.5487 | 1.0 |
	| 4.2974 | 4.9116 | 17500 | 2.2546 | 0.0073 | 9.5311 | 1.0 |


	### Framework versions

	- Transformers 4.49.0
	- Pytorch 2.6.0+cu124
	- Datasets 3.3.2
	- Tokenizers 0.21.0
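
When reproducing results, it can help to confirm that the local environment matches the versions listed above. The following is a small sketch using only the standard library. The PyPI package names in `EXPECTED` (for example `torch` for Pytorch) are assumptions about how these libraries were installed, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.49.0",
    "torch": "2.6.0+cu124",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed}")
```

An empty result means the environment matches the card. A different CUDA build of PyTorch will show up as a mismatch even when the base version is the same.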