Built with Axolotl

0e5c3536-473b-4960-b73b-901afa925c1e

This model is a fine-tuned version of fxmarty/tiny-llama-fast-tokenizer on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 10.3005

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged Trainer-API sketch follows the list):

  • learning_rate: 0.000201
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW from bitsandbytes (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
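
A minimal sketch of these settings expressed through the transformers Trainer API. The run itself was driven by an Axolotl config, which this card does not reproduce; output_dir is a placeholder, not a value from this card:

```python
# Hypothetical rendering of the hyperparameters above as
# transformers.TrainingArguments (transformers 4.46.0).
# The actual training used Axolotl's own config format.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",              # placeholder, not from this card
    learning_rate=0.000201,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,     # 4 * 2 = total train batch size of 8
    optim="adamw_bnb_8bit",            # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```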

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0004 | 1    | 10.3799         |
| 10.3491       | 0.0204 | 50   | 10.3486         |
| 10.3359       | 0.0408 | 100  | 10.3396         |
| 10.3185       | 0.0612 | 150  | 10.3202         |
| 10.3124       | 0.0816 | 200  | 10.3110         |
| 10.3068       | 0.1020 | 250  | 10.3062         |
| 10.3042       | 0.1224 | 300  | 10.3034         |
| 10.3032       | 0.1429 | 350  | 10.3018         |
| 10.2994       | 0.1633 | 400  | 10.3008         |
| 10.3075       | 0.1837 | 450  | 10.3006         |
| 10.3043       | 0.2041 | 500  | 10.3005         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model tree for lesso01/0e5c3536-473b-4960-b73b-901afa925c1e

This checkpoint is a PEFT adapter trained on top of the base model fxmarty/tiny-llama-fast-tokenizer.
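
Because it is an adapter rather than a full model, it would typically be loaded with peft on top of the base weights. A minimal sketch, assuming both repositories are reachable on the Hub; the prompt and generation settings are illustrative only:

```python
# Hedged example: loading the adapter over its base model.
# Environment from this card: peft==0.13.2, transformers==4.46.0,
# torch==2.5.0; repo ids are taken from the card itself.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "fxmarty/tiny-llama-fast-tokenizer"
adapter_id = "lesso01/0e5c3536-473b-4960-b73b-901afa925c1e"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello", return_tensors="pt")  # illustrative prompt
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```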