Built with Axolotl

d5049b8b-5144-4947-86e7-d224f36ac202

This model is a fine-tuned version of fxmarty/tiny-llama-fast-tokenizer on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 10.2987
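
As a rough sanity check, if this loss is the usual per-token cross-entropy, it corresponds to a perplexity of exp(10.2987) ≈ 3.0 × 10^4. A minimal sketch of loading the adapter for evaluation, assuming the repo id lesso07/d5049b8b-5144-4947-86e7-d224f36ac202 and the base model named above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the PEFT adapter from this repo.
base = AutoModelForCausalLM.from_pretrained("fxmarty/tiny-llama-fast-tokenizer")
model = PeftModel.from_pretrained(base, "lesso07/d5049b8b-5144-4947-86e7-d224f36ac202")
tokenizer = AutoTokenizer.from_pretrained("fxmarty/tiny-llama-fast-tokenizer")
model.eval()
```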

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.000207
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
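
A hedged reconstruction of this configuration as transformers TrainingArguments; output_dir is a placeholder, and the model/dataset wiring is omitted:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",           # placeholder, not from the card
    learning_rate=0.000207,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,  # 4 per device x 2 steps = total batch size 8
    optim="adamw_bnb_8bit",         # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```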

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0004 | 1    | 10.3799         |
| 10.3469       | 0.0204 | 50   | 10.3473         |
| 10.3332       | 0.0408 | 100  | 10.3371         |
| 10.3174       | 0.0612 | 150  | 10.3186         |
| 10.3102       | 0.0816 | 200  | 10.3093         |
| 10.3043       | 0.1020 | 250  | 10.3042         |
| 10.3027       | 0.1224 | 300  | 10.3016         |
| 10.3016       | 0.1429 | 350  | 10.3000         |
| 10.2983       | 0.1633 | 400  | 10.2991         |
| 10.3050       | 0.1837 | 450  | 10.2988         |
| 10.3031       | 0.2041 | 500  | 10.2987         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
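
A quick way to confirm a matching environment, assuming the packages above are installed (the expected output is taken from the version list):

```python
import datasets, peft, tokenizers, torch, transformers

# Expected per this card: 0.13.2 4.46.0 2.5.0+cu124 3.0.1 0.20.1
print(peft.__version__, transformers.__version__, torch.__version__,
      datasets.__version__, tokenizers.__version__)
```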