Tr-Jp-LLM-1.5B-v2

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0040
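
Below is a minimal loading and generation sketch. It is not part of the original card: the repository id oriental-lab/Tr-Jp-LLM-1.5B-v2 comes from the model page, while the dtype, prompt, and generation settings are illustrative assumptions.

```python
# Minimal usage sketch (assumptions noted in comments; not from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oriental-lab/Tr-Jp-LLM-1.5B-v2"  # repository id from the model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is published in BF16; assumed suitable for inference
)

prompt = "こんにちは。自己紹介をしてください。"  # illustrative prompt; the base model is a Japanese instruct model
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```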

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • optimizer: adamw_torch (betas=(0.9, 0.95), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
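
For reference, here is a sketch of how these values map onto transformers.TrainingArguments. This is an assumption about the setup rather than the authors' actual training script; the output directory name is hypothetical, and bf16 is inferred from the BF16 checkpoint.

```python
# Hypothetical TrainingArguments mirroring the listed hyperparameters
# (Transformers 4.50 argument names; not the authors' actual script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Tr-Jp-LLM-1.5B-v2",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=16,  # 64 * 16 = 1024 (total_train_batch_size), assuming a single device
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # inferred from the BF16 checkpoint; an assumption
)
```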

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 3.9792        | 0.0492 | 500   | 3.6054          |
| 3.2678        | 0.0984 | 1000  | 3.0956          |
| 3.0036        | 0.1476 | 1500  | 3.0268          |
| 2.9727        | 0.1969 | 2000  | 3.0121          |
| 2.9638        | 0.2461 | 2500  | 3.0066          |
| 2.9605        | 0.2953 | 3000  | 3.0047          |
| 2.9584        | 0.3445 | 3500  | 3.0037          |
| 2.96          | 0.3937 | 4000  | 3.0041          |
| 2.9592        | 0.4429 | 4500  | 3.0040          |
| 2.9601        | 0.4921 | 5000  | 3.0040          |
| 2.9589        | 0.5414 | 5500  | 3.0039          |
| 2.96          | 0.5906 | 6000  | 3.0040          |
| 2.9584        | 0.6398 | 6500  | 3.0040          |
| 2.9609        | 0.6890 | 7000  | 3.0040          |
| 2.958         | 0.7382 | 7500  | 3.0039          |
| 2.9564        | 0.7874 | 8000  | 3.0040          |
| 2.9584        | 0.8366 | 8500  | 3.0039          |
| 2.9571        | 0.8859 | 9000  | 3.0039          |
| 2.9596        | 0.9351 | 9500  | 3.0039          |
| 2.9581        | 0.9843 | 10000 | 3.0040          |

Framework versions

  • Transformers 4.50.0
  • PyTorch 2.6.0+cu126
  • Datasets 3.4.1
  • Tokenizers 0.21.1