|
---
library_name: transformers
tags:
- axolotl
- generated_from_trainer
datasets:
- ChaoticNeutrals/Luminous_Opus
- ChaoticNeutrals/Synthetic-Dark-RP
- ChaoticNeutrals/Synthetic-RP
model-index:
- name: Tiny-Darkllama3.2-1B-Instruct
  results: []
base_model:
- unsloth/Llama-3.2-1B
---
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.6.0` |
|
```yaml
base_model: unsloth/Llama-3.2-1B
bf16: false
dataset_prepared_path: last_run_prepared
datasets:
- chat_template: alpaca
  field_messages: conversations
  message_field_content: value
  message_field_role: from
  path: ChaoticNeutrals/Luminous_Opus
  split: train
  type: chat_template
debug: null
deepspeed: null
early_stopping_patience: null
evals_per_epoch: null
flash_attention: false
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 1
gradient_checkpointing: true
group_by_length: false
hub_model_id: mrcuddle/Tiny-Darkllama3.2-1B-Instruct
is_llama_derived_model: true
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lr_scheduler: linear
max_steps: 20
micro_batch_size: 1
mlflow_experiment_name: colab-example
model_type: LlamaForCausalLM
num_epochs: 4
optimizer: adamw_torch
output_dir: ./llama2
pad_to_sequence_len: true
resume_from_checkpoint: null
sample_packing: true
saves_per_epoch: null
sequence_len: 1096
special_tokens: null
strict: false
tf32: false
tokenizer_type: LlamaTokenizer
train_on_inputs: false
wandb_entity: null
wandb_log_model: null
wandb_name: null
wandb_project: null
wandb_watch: null
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```
|
|
|
</details><br> |
|
|
|
# Tiny-Darkllama3.2-1B-Instruct |
|
|
|
This model was fine-tuned from [unsloth/Llama-3.2-1B](https://huggingface.co/unsloth/Llama-3.2-1B) on the ChaoticNeutrals/Luminous_Opus, ChaoticNeutrals/Synthetic-Dark-RP, and ChaoticNeutrals/Synthetic-RP datasets.
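
A minimal inference sketch with `transformers` is shown below. It assumes the published checkpoint at `mrcuddle/Tiny-Darkllama3.2-1B-Instruct` (the `hub_model_id` in the config) and a CUDA-capable GPU; since the config trains with the `alpaca` chat template, an Alpaca-style prompt is used. Adjust dtype and device settings for your hardware.

```python
# Minimal inference sketch (not the training code). Requires transformers and,
# for device_map="auto", the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mrcuddle/Tiny-Darkllama3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The Axolotl config above trains with the alpaca chat template,
# so an Alpaca-style prompt is a reasonable default here.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short, ominous greeting.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```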
|
|
|
|
|
|
|
|
|
## Training and evaluation data

Per the card metadata, training drew on the ChaoticNeutrals/Luminous_Opus, ChaoticNeutrals/Synthetic-Dark-RP, and ChaoticNeutrals/Synthetic-RP datasets; the Axolotl config above loads the `train` split of ChaoticNeutrals/Luminous_Opus through the `alpaca` chat template, mapping each record's `conversations` turns via their `from` and `value` fields. No evaluation set was configured (`evals_per_epoch: null`), so no validation metrics are reported.
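
For reference, this is the record shape that the dataset mapping above implies (an illustrative sketch, not an actual row from the datasets):

```python
# Hypothetical record shape implied by the Axolotl dataset mapping
# (field_messages=conversations, message_field_role=from, message_field_content=value).
# The content here is invented for illustration only.
record = {
    "conversations": [
        {"from": "human", "value": "Describe a moonless night in two sentences."},
        {"from": "gpt", "value": "The dark pressed in like a held breath. Even the stars kept their distance."},
    ]
}
```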
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 10 |
|
- training_steps: 20 |
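
As a sanity check, the learning-rate trajectory these settings imply can be reproduced with the standard `transformers` linear-warmup schedule; this is an illustrative sketch, not the training code itself:

```python
# Sketch of the LR schedule implied by the settings above: linear warmup
# over 10 steps to the peak of 2e-4, then linear decay to 0 at step 20.
import torch
from transformers import get_linear_schedule_with_warmup

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter; we only inspect the schedule
optimizer = torch.optim.AdamW([param], lr=2e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=20
)

for step in range(1, 21):
    optimizer.step()
    scheduler.step()
    # Prints 2e-05, 4e-05, ... up to 2e-04 at step 10, then back down to 0 at
    # step 20, matching the learning_rate column in the training log below.
    print(step, scheduler.get_last_lr()[0])
```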
|
|
|
### Training results |
|
Training ran for 20 steps (about 0.57 epochs) in 172.0 seconds (8.65 s/it), finishing with a mean training loss of 2.7076. Peak CUDA memory observed while training was 12.320 GB (+8.604 GB cache, +0.565 GB misc). Per-step metrics from the trainer log (gradient norms rounded to two decimals):

| Step | Epoch | Training Loss | Grad Norm | Learning Rate |
|-----:|------:|--------------:|----------:|--------------:|
| 1 | 0.03 | 3.4922 | 9.88 | 2e-05 |
| 2 | 0.06 | 3.3057 | 11.66 | 4e-05 |
| 3 | 0.09 | 2.4733 | 8.75 | 6e-05 |
| 4 | 0.11 | 2.9842 | 10.50 | 8e-05 |
| 5 | 0.14 | 2.6624 | 12.65 | 0.0001 |
| 6 | 0.17 | 2.7616 | 10.69 | 0.00012 |
| 7 | 0.20 | 2.9891 | 10.08 | 0.00014 |
| 8 | 0.23 | 2.3745 | 10.03 | 0.00016 |
| 9 | 0.26 | 2.4965 | 9.78 | 0.00018 |
| 10 | 0.29 | 2.3811 | 19.15 | 0.0002 |
| 11 | 0.31 | 3.3611 | 14.56 | 0.00018 |
| 12 | 0.34 | 2.9619 | 16.88 | 0.00016 |
| 13 | 0.37 | 2.121 | 9.95 | 0.00014 |
| 14 | 0.40 | 2.1042 | 23.18 | 0.00012 |
| 15 | 0.43 | 2.4722 | 10.40 | 0.0001 |
| 16 | 0.46 | 2.7434 | 11.34 | 8e-05 |
| 17 | 0.49 | 2.2349 | 202.99 | 6e-05 |
| 18 | 0.51 | 2.3479 | 10.25 | 4e-05 |
| 19 | 0.54 | 2.4169 | 14.02 | 2e-05 |
| 20 | 0.57 | 3.4686 | 10.99 | 0.0 |

No validation loss is reported because no evaluation set was configured.
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.48.3 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |