llama-3.2-350M-fourier

This model is based on the architecture defined in llama_small_config.json and was trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5784
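
Assuming the reported loss is the mean per-token cross-entropy in nats, this corresponds to an evaluation perplexity of roughly exp(2.5784) ≈ 13.2.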

Model description

A Llama-architecture causal language model with approximately 346M parameters, published as BF16 safetensors. Further details are not yet documented.

Intended uses & limitations

More information needed
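
No usage example is provided in this card. As a rough starting point, the following sketch shows how a causal language model checkpoint like this one is typically loaded with the Transformers library. The repository id is assumed from the model name above and may need the publishing account's namespace prepended.

```python
# Minimal loading/sampling sketch (assumption: the repo id below matches the published
# checkpoint; adjust or prepend "<namespace>/" as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama-3.2-350M-fourier"  # assumed from the model card title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # checkpoint is stored in BF16

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```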

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1
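
The per-device batch sizes, device count, and gradient accumulation are consistent with the reported totals: 2 × 2 × 16 = 64 for training and 2 × 2 = 4 for evaluation. The snippet below is a rough sketch of how these settings would map onto Hugging Face TrainingArguments; the original training script is not part of this card, so anything outside the listed values (e.g. output_dir) is a placeholder.

```python
# Approximate reconstruction of the reported hyperparameters with transformers.TrainingArguments.
# Multi-GPU execution (2 devices) is handled by the launcher (e.g. torchrun or accelerate), not here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.2-350M-fourier",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,       # 2 per device x 2 GPUs x 16 steps = 64 effective train batch
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    bf16=True,                            # assumption, based on the BF16 checkpoint; not listed above
)
```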

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 4.6741        | 0.0552 | 1000  | 4.6001          |
| 3.6587        | 0.1105 | 2000  | 3.7151          |
| 3.2297        | 0.1657 | 3000  | 3.2263          |
| 3.0288        | 0.2210 | 4000  | 3.0349          |
| 2.9584        | 0.2762 | 5000  | 2.9393          |
| 2.9657        | 0.3315 | 6000  | 2.9893          |
| 2.8654        | 0.3867 | 7000  | 2.8074          |
| 2.6982        | 0.4420 | 8000  | 2.7580          |
| 2.7292        | 0.4972 | 9000  | 2.7214          |
| 2.7568        | 0.5525 | 10000 | 2.6956          |
| 2.6141        | 0.6077 | 11000 | 2.6669          |
| 2.6310        | 0.6630 | 12000 | 2.6421          |
| 2.6837        | 0.7182 | 13000 | 2.6185          |
| 2.6257        | 0.7734 | 14000 | 2.6032          |
| 2.5669        | 0.8287 | 15000 | 2.5918          |
| 2.6383        | 0.8839 | 16000 | 2.5836          |
| 2.5749        | 0.9392 | 17000 | 2.5796          |
| 2.6130        | 0.9944 | 18000 | 2.5784          |

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.3.1+cu118
  • Datasets 3.2.0
  • Tokenizers 0.21.0