See axolotl config
axolotl version: 0.5.2
adapter: lora
base_model: echarlaix/tiny-random-mistral
bf16: auto
chat_template: llama3
datasets:
- data_files:
  - 42fa4b965cededb3_train_data.json
  ds_type: json
  format: custom
  path: /runs/taopanda-4_d0616b9d-99ec-4e9e-86fa-82059ce33170/42fa4b965cededb3_train_data.json
  preprocessing:
  - shuffle: true
  type:
    field: null
    field_input: doc
    field_instruction: original_text
    field_output: edited_summary
    field_system: null
    format: null
    no_input_format: null
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
device_map: auto
early_stopping_patience: 4
eval_batch_size: 4
eval_max_new_tokens: 128
eval_steps: 25
eval_strategy: steps
fp16: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: true
hub_model_id: taopanda-4/957a944b-e075-4b04-8c43-d11fbfdd15aa
hub_strategy: every_save
learning_rate: 0.00010312140429884754
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 64
lora_dropout: 0.05
lora_fan_in_fan_out: true
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 750
micro_batch_size: 16
model_type: AutoModelForCausalLM
num_epochs: 32
optimizer: paged_adamw_32bit
output_dir: ./outputs/lora-out/taopanda-4_d0616b9d-99ec-4e9e-86fa-82059ce33170
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
save_steps: 25
save_total_limit: 5
seed: 40883
sequence_len: 512
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: fatcat87-taopanda
wandb_mode: online
wandb_name: taopanda-4_d0616b9d-99ec-4e9e-86fa-82059ce33170
wandb_project: subnet56
wandb_runid: taopanda-4_d0616b9d-99ec-4e9e-86fa-82059ce33170
warmup_ratio: 0.1
weight_decay: 0.1
xformers_attention: null
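The `type` block in the config maps raw JSON fields to prompt roles: `original_text` is used as the instruction, `doc` as the input, and `edited_summary` as the target output. A minimal sketch of how one record from the training file might be assembled into a prompt/completion pair is shown below; the concrete template is an illustrative assumption, not the exact formatter axolotl applies for `format: custom`.

```python
import json

# Field names taken from the config's `type:` section.
FIELD_INSTRUCTION = "original_text"
FIELD_INPUT = "doc"
FIELD_OUTPUT = "edited_summary"


def build_example(record: dict) -> dict:
    """Turn one raw JSON record into a prompt/completion pair.

    The prompt layout below is an assumption for illustration only;
    axolotl's own formatter for `format: custom` may differ.
    """
    prompt = f"{record[FIELD_INSTRUCTION]}\n\n{record[FIELD_INPUT]}\n\n"
    return {"prompt": prompt, "completion": record[FIELD_OUTPUT]}


# Hypothetical record with the same keys as 42fa4b965cededb3_train_data.json.
record = {
    "original_text": "Shorten the summary below without losing key facts.",
    "doc": "<document text>",
    "edited_summary": "<reference summary>",
}
print(json.dumps(build_example(record), indent=2))
```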
957a944b-e075-4b04-8c43-d11fbfdd15aa
This model is a LoRA fine-tuned version of echarlaix/tiny-random-mistral on the custom JSON dataset referenced in the config above (42fa4b965cededb3_train_data.json). It achieves the following results on the evaluation set:
- Loss: 10.3104
Model description
A LoRA adapter (r=32, alpha=64, dropout 0.05, targeting all linear layers) for echarlaix/tiny-random-mistral, trained with axolotl 0.5.2 using the configuration shown above.
Intended uses & limitations
More information needed
Training and evaluation data
The model was trained on the custom JSON dataset listed in the config (42fa4b965cededb3_train_data.json), with 5% of the examples held out for evaluation (`val_set_size: 0.05`).
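For reference, that held-out split corresponds roughly to the following sketch using the Hugging Face `datasets` library; the file name comes from the config, while the exact seed handling inside axolotl is an assumption.

```python
from datasets import load_dataset

# Load the raw JSON training file referenced in the axolotl config.
ds = load_dataset("json", data_files="42fa4b965cededb3_train_data.json")["train"]

# Hold out 5% of the examples for evaluation, mirroring `val_set_size: 0.05`.
# Whether axolotl applies this exact seed to the split is an assumption.
split = ds.train_test_split(test_size=0.05, seed=40883)
train_ds, eval_ds = split["train"], split["test"]

print(f"train: {len(train_ds)} examples, eval: {len(eval_ds)} examples")
```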
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00010312140429884754
- train_batch_size: 16
- eval_batch_size: 4
- seed: 40883
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: paged_adamw_32bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 48
- training_steps: 480
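The derived values in this list follow directly from the config: the effective train batch size is `micro_batch_size × gradient_accumulation_steps × num_devices`, the effective eval batch size is `eval_batch_size × num_devices`, and the warmup length is `warmup_ratio × training_steps`. A small sanity-check sketch, with the device count taken from the card above:

```python
# Sanity-check the derived hyperparameters reported above.
micro_batch_size = 16             # per-device train batch size
eval_batch_size = 4               # per-device eval batch size
gradient_accumulation_steps = 4
num_devices = 4                   # multi-GPU setup reported in this card
training_steps = 480
warmup_ratio = 0.1

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices
warmup_steps = int(training_steps * warmup_ratio)

assert total_train_batch_size == 256  # matches total_train_batch_size above
assert total_eval_batch_size == 16    # matches total_eval_batch_size above
assert warmup_steps == 48             # matches lr_scheduler_warmup_steps above
```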
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
10.3814 | 0.0667 | 1 | 10.3775 |
10.3723 | 1.6667 | 25 | 10.3749 |
10.3609 | 3.3333 | 50 | 10.3615 |
10.3307 | 5.0 | 75 | 10.3299 |
10.3229 | 6.6667 | 100 | 10.3222 |
10.3171 | 8.3333 | 125 | 10.3189 |
10.3157 | 10.0 | 150 | 10.3158 |
10.3147 | 11.6667 | 175 | 10.3147 |
10.3109 | 13.3333 | 200 | 10.3143 |
10.3121 | 15.0 | 225 | 10.3137 |
10.3133 | 16.6667 | 250 | 10.3133 |
10.3113 | 18.3333 | 275 | 10.3129 |
10.3142 | 20.0 | 300 | 10.3124 |
10.3117 | 21.6667 | 325 | 10.3119 |
10.3107 | 23.3333 | 350 | 10.3118 |
10.309 | 25.0 | 375 | 10.3113 |
10.3134 | 26.6667 | 400 | 10.3110 |
10.3083 | 28.3333 | 425 | 10.3108 |
10.3103 | 30.0 | 450 | 10.3105 |
10.312 | 31.6667 | 475 | 10.3104 |
Framework versions
- PEFT 0.13.2
- Transformers 4.46.3
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
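The card has no usage section, so here is a minimal inference sketch with `transformers` and `peft`, assuming the LoRA adapter published as taopanda-4/957a944b-e075-4b04-8c43-d11fbfdd15aa is loaded on top of the echarlaix/tiny-random-mistral base. The base is a tiny, randomly initialized test model, so the generated text is not meaningful; this only demonstrates the loading path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "echarlaix/tiny-random-mistral"
adapter_id = "taopanda-4/957a944b-e075-4b04-8c43-d11fbfdd15aa"

# Load the base model and attach the LoRA adapter trained with the config above.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# The base checkpoint has random weights, so the output is noise by design.
inputs = tokenizer("Shorten this summary: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```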