Tr-Jp-LLM-1.5B-v2

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0040
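
Below is a minimal loading and generation sketch. It is not part of the original card: the repository id oriental-lab/Tr-Jp-LLM-1.5B-v2 comes from the model page, while the dtype, prompt, and generation settings are illustrative assumptions.

```python
# Minimal usage sketch (assumptions noted in comments; not from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oriental-lab/Tr-Jp-LLM-1.5B-v2"  # repository id from the model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is published in BF16; assumed suitable for inference
)

prompt = "こんにちは。自己紹介をしてください。"  # illustrative prompt; the base model is a Japanese instruct model
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```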

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • optimizer: adamw_torch (betas=(0.9, 0.95), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
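
For reference, here is a sketch of how these values map onto transformers.TrainingArguments. This is an assumption about the setup rather than the authors' actual training script; the output directory name is hypothetical, and bf16 is inferred from the BF16 checkpoint.

```python
# Hypothetical TrainingArguments mirroring the listed hyperparameters
# (Transformers 4.50 argument names; not the authors' actual script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Tr-Jp-LLM-1.5B-v2",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=16,  # 64 * 16 = 1024 (total_train_batch_size), assuming a single device
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # inferred from the BF16 checkpoint; an assumption
)
```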

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 3.9792        | 0.0492 | 500   | 3.6054          |
| 3.2678        | 0.0984 | 1000  | 3.0956          |
| 3.0036        | 0.1476 | 1500  | 3.0268          |
| 2.9727        | 0.1969 | 2000  | 3.0121          |
| 2.9638        | 0.2461 | 2500  | 3.0066          |
| 2.9605        | 0.2953 | 3000  | 3.0047          |
| 2.9584        | 0.3445 | 3500  | 3.0037          |
| 2.96          | 0.3937 | 4000  | 3.0041          |
| 2.9592        | 0.4429 | 4500  | 3.0040          |
| 2.9601        | 0.4921 | 5000  | 3.0040          |
| 2.9589        | 0.5414 | 5500  | 3.0039          |
| 2.96          | 0.5906 | 6000  | 3.0040          |
| 2.9584        | 0.6398 | 6500  | 3.0040          |
| 2.9609        | 0.6890 | 7000  | 3.0040          |
| 2.958         | 0.7382 | 7500  | 3.0039          |
| 2.9564        | 0.7874 | 8000  | 3.0040          |
| 2.9584        | 0.8366 | 8500  | 3.0039          |
| 2.9571        | 0.8859 | 9000  | 3.0039          |
| 2.9596        | 0.9351 | 9500  | 3.0039          |
| 2.9581        | 0.9843 | 10000 | 3.0040          |

Framework versions

  • Transformers 4.50.0
  • PyTorch 2.6.0+cu126
  • Datasets 3.4.1
  • Tokenizers 0.21.1