---
library_name: transformers
tags: []
---
# Qwen2.5-7B-Instruct-preference

## Model Description
Qwen2.5-7B-Instruct-preference is a fine-tuned model based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). It was fine-tuned on an original dataset, with fine-tuning carried out at a context length of 1024 tokens.
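A minimal inference sketch using 🤗 Transformers is shown below. The repository id is a placeholder (the card does not state the model's Hub path), and the Japanese prompt is only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- substitute the actual Hub path of this model.
model_id = "Qwen2.5-7B-Instruct-preference"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "日本の首都はどこですか？"},  # "What is the capital of Japan?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt plus generation within the model's 1024-token context length.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```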
## Benchmarking
The benchmark score was obtained using arena-hard-auto-multilingual.
| Qwen2.5-7B-Instruct | Ours |
|---|---|
| 50.0 | 56.6 |
## Model Details
- Model size: 7B
- Context length: 1024
- Language: Japanese
## Training Procedure
- learning_rate: 5e-6
- train_batch_size: 4
- eval_batch_size: 2
- gradient_accumulation_steps: 4
- lr_scheduler_type: cosine
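With gradient accumulation, the hyperparameters above imply an effective batch size per optimizer step. The sketch below assumes single-device training, which the card does not state; multiply further by the number of devices if training was distributed:

```python
# Hyperparameters from the training procedure above.
train_batch_size = 4
gradient_accumulation_steps = 4

# Effective batch size per optimizer step (single-device assumption).
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```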
## Training Results
| Step | Training Loss | Validation Loss |
|---|---|---|
| 10 | 0.678400 | 0.665870 |
| 20 | 0.608500 | 0.638361 |
| 30 | 0.577300 | 0.607468 |
| 40 | 0.526700 | 0.559432 |
| 50 | 0.489200 | 0.523419 |
| 60 | 0.502800 | 0.511645 |
| 70 | 0.462300 | 0.506989 |
| 80 | 0.419600 | 0.509142 |
| 90 | 0.445200 | 0.510396 |
| 100 | 0.424400 | 0.511653 |