3ab90c26-af06-4d64-907a-48c62f527d25

This model is a fine-tuned version of fxmarty/tiny-llama-fast-tokenizer. The training dataset is not specified (the card records it as None). On the evaluation set it reaches a final validation loss of 10.2988 at step 500.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.000214
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
training_steps: 500
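
As a rough illustration of how the hyperparameters above fit together, the sketch below derives the total train batch size and the learning rate at a given step. It assumes a single training device (consistent with 4 x 2 = 8) and a half-cycle cosine decay after linear warmup, which is the usual shape of a cosine schedule with warmup. The function names are hypothetical and this is not the training code used for this model.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 0.000214
WARMUP_STEPS = 50
TOTAL_STEPS = 500

# Assumption: one device, which is consistent with total_train_batch_size = 8.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay.

    Assumes a half-cycle cosine that reaches zero at TOTAL_STEPS.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 25, 50, 275, 500):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at step 50 and decays to zero by step 500, so the last rows of the results table were logged at a very small learning rate.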

Training Loss	Epoch	Step	Validation Loss
No log	0.0004	1	10.3799
10.3484	0.0204	50	10.3479
10.3315	0.0408	100	10.3346
10.3144	0.0612	150	10.3161
10.3101	0.0816	200	10.3086
10.3036	0.1020	250	10.3041
10.3032	0.1224	300	10.3017
10.3024	0.1429	350	10.3001
10.2981	0.1633	400	10.2992
10.3051	0.1837	450	10.2989
10.303	0.2041	500	10.2988
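
To put the table in perspective, the following sketch computes the overall drop in validation loss and the corresponding perplexity. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; the helper name is hypothetical.

```python
import math

# Validation losses copied from the first and last rows of the table above.
INITIAL_VAL_LOSS = 10.3799  # step 1
FINAL_VAL_LOSS = 10.2988    # step 500


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumption)."""
    return math.exp(mean_cross_entropy_nats)


loss_drop = INITIAL_VAL_LOSS - FINAL_VAL_LOSS

if __name__ == "__main__":
    print(f"validation loss drop over 500 steps: {loss_drop:.4f}")
    print(f"perplexity at step 1:   {perplexity(INITIAL_VAL_LOSS):.0f}")
    print(f"perplexity at step 500: {perplexity(FINAL_VAL_LOSS):.0f}")
```

The drop is under 0.1 over 500 steps, so under this reading the model changed only slightly during this short run.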