---
library_name: transformers
tags: []
---
# Qwen2.5-7B-Instruct-preference

## Model Description
Qwen2.5-7B-Instruct-preference is a fine-tuned model based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). It was fine-tuned on an original dataset, with fine-tuning carried out at a context length of 1024 tokens.
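A minimal inference sketch using 🤗 Transformers is shown below. The repository id is a placeholder (the card does not state the model's Hub path), and the Japanese prompt is only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- substitute the actual Hub path of this model.
model_id = "Qwen2.5-7B-Instruct-preference"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "日本の首都はどこですか？"},  # "What is the capital of Japan?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt plus generation within the model's 1024-token context length.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```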
## Benchmarking
The benchmark score was obtained using arena-hard-auto-multilingual.
| Qwen2.5-7B-Instruct | Ours |
|---|---|
| 50.0 | 56.6 |
## Model Details
- Model size: 7B
- Context length: 1024
- Language: Japanese
## Training Procedure
- learning_rate: 5e-6
- train_batch_size: 4
- eval_batch_size: 2
- gradient_accumulation_steps: 4
- lr_scheduler_type: cosine
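With gradient accumulation, the hyperparameters above imply an effective batch size per optimizer step. The sketch below assumes single-device training, which the card does not state; multiply further by the number of devices if training was distributed:

```python
# Hyperparameters from the training procedure above.
train_batch_size = 4
gradient_accumulation_steps = 4

# Effective batch size per optimizer step (single-device assumption).
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```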
## Training Results
| Step | Training Loss | Validation Loss |
|---|---|---|
| 10 | 0.678400 | 0.665870 |
| 20 | 0.608500 | 0.638361 |
| 30 | 0.577300 | 0.607468 |
| 40 | 0.526700 | 0.559432 |
| 50 | 0.489200 | 0.523419 |
| 60 | 0.502800 | 0.511645 |
| 70 | 0.462300 | 0.506989 |
| 80 | 0.419600 | 0.509142 |
| 90 | 0.445200 | 0.510396 |
| 100 | 0.424400 | 0.511653 |