---
datasets:
- NewEden/Orion-Asstr-Stories-16K
- Mielikki/Erebus-87k
base_model:
- Unsloth/phi-4
tags:
- phi
- roleplay
- finetune
- storywriting
---

# Hamanasu 15B R1 PT

## 🌌 Overview
This is the first pretrain of Phi-4, trained on the following dataset:

- NewEden/Orion-LIT

This model has not been instruct-tuned, so its ability to converse may be reduced compared to the original model. If you would like to roleplay, please use the Instruct version.
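Because this is a completion-style base model, prompt it with raw prose rather than a chat template. Below is a minimal usage sketch with 🤗 Transformers; the repo id is a placeholder, so substitute this model's actual Hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hub path.
repo_id = "NewEden/Hamanasu-15B-R1-PT"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)

# Base (completion) model: feed it raw story text, not a chat template.
prompt = "The airlock hissed open, and the first thing she noticed was the smell of rain."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```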
## Axolotl Config ꒰(˶• ᴗ •˶)꒱
```yaml
base_model: unsloth_phi-4
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

#hub_model_id: NewEden/Phi4-pretrain
#hub_strategy: "all_checkpoints"
#push_dataset_to_hub:
#hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

#plugins:
#  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
#cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Mielikki/Erebus-87k
    type: completion
    field: body
  - path: NewEden/Orion-Asstr-Stories-16K
    type: completion
    field: content
shuffle_merged_datasets: true
dataset_prepared_path: prepared_data
val_set_size: 0.0
output_dir: ./phi4-pt-out-r2

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: mag-phi
wandb_entity:
wandb_watch:
wandb_name: attempt-02
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 4
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.01
fsdp:
fsdp_config:
```
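The config above trains a LoRA adapter (r=128) and additionally saves `embed_tokens` and `lm_head` as full modules; for a continued-pretraining run like this, the adapter is typically merged back into the base weights before release. Below is a sketch of that merge with `peft`, assuming the adapter files live in the config's `output_dir` (the exact path depends on how the run was saved):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model id taken from the card's metadata; adapter path assumed to be
# the config's output_dir -- adjust both to match your actual checkpoints.
base = AutoModelForCausalLM.from_pretrained("unsloth/phi-4", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "./phi4-pt-out-r2")

# Fold the LoRA deltas (and the fully saved embed_tokens/lm_head modules)
# into the base weights, then write out a standalone merged checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("./phi4-pt-merged")

tokenizer = AutoTokenizer.from_pretrained("unsloth/phi-4")
tokenizer.save_pretrained("./phi4-pt-merged")
```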