trollek committed · verified
Commit 85352ef · 1 Parent(s): d7464ec

Update README.md

Files changed (1):
  1. README.md +111 -3

README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - trollek/SimpleInstructionJudge-v01
+ language:
+ - en
+ base_model: h2oai/h2o-danube3-4b-base
+ ---
+ # LittleInstructionJudge-4B-v0.1
+
+ A BAdam fine-tune of danube3-4b-base that does one thing, and one thing only: act as a lightweight LLM-as-a-Judge for instruction prompts.
+
+ The purpose of training this model is to have a small language model that can filter out the worst offenders when creating datasets with the Magpie method in hardware-constrained environments.
+
+ **Important note:** For reasons I don't know, I have issues running models like danube3 in LM Studio. Ollama runs them fine, though.
+
+ ### Prompt template
+
+ ```jinja2
+ Judge the instruction below using the following json format:
+ {
+     "intent": <the intent of the users instruction>,
+     "knowledge": <the knowledge required to respond to the instruction>,
+     "task_category": <the primary category that the instruction can be put in>,
+     "other_task_category": [<a list of other task categories that the instruction belongs to>],
+     "difficulty": <a rating of easy, medium or hard>,
+     "quality_explanation": <an explanation of the quality of the users instruction>,
+     "instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
+ }
+
+ This is the instruction I need you to judge:
+
+ {{instruction}}
+ ```
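+
+ To make the template concrete, below is a minimal usage sketch (mine, not part of the card) that fills in the template, queries the model through Ollama's HTTP API, and keeps only instructions above a reward threshold, i.e. the Magpie-filtering workflow the model was trained for. The model name `littleinstructionjudge` and the threshold of 0 are assumptions; adjust both to your setup.
+
+ ```python
+ import json
+
+ import requests  # pip install requests
+
+ # The prompt template from this card; {{instruction}} is the placeholder.
+ TEMPLATE = """Judge the instruction below using the following json format:
+ {
+     "intent": <the intent of the users instruction>,
+     "knowledge": <the knowledge required to respond to the instruction>,
+     "task_category": <the primary category that the instruction can be put in>,
+     "other_task_category": [<a list of other task categories that the instruction belongs to>],
+     "difficulty": <a rating of easy, medium or hard>,
+     "quality_explanation": <an explanation of the quality of the users instruction>,
+     "instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
+ }
+
+ This is the instruction I need you to judge:
+
+ {{instruction}}"""
+
+
+ def judge(instruction: str, model: str = "littleinstructionjudge") -> dict | None:
+     """Ask the judge to rate one instruction; None if the reply is not valid JSON."""
+     prompt = TEMPLATE.replace("{{instruction}}", instruction)
+     response = requests.post(
+         "http://localhost:11434/api/generate",
+         json={"model": model, "prompt": prompt, "stream": False},
+         timeout=300,
+     )
+     response.raise_for_status()
+     try:
+         return json.loads(response.json()["response"])
+     except json.JSONDecodeError:
+         return None
+
+
+ # Keep only instructions the judge does not flag as junk.
+ candidates = ["Write a haiku about rust.", "asdf do thing???"]
+ kept = [c for c in candidates
+         if (v := judge(c)) is not None and v.get("instruct_reward", -10) >= 0]
+ print(kept)
+ ```
+
+ The `>= 0` cutoff is arbitrary; the reward scale runs from -10 to 10, so tune the threshold to how aggressively you want to filter.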
+
+ ### LLaMA-Factory training config
+
+ ```yaml
+ ### model
+ model_name_or_path: danube3/chatml-base
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ use_badam: true
+ badam_switch_mode: ascending
+ badam_switch_interval: 50
+ badam_start_block: 6
+ badam_verbose: 1
+ seed: 8
+
+ ### dataset
+ dataset: balanced_instruction_judge
+ template: chatml
+ cutoff_len: 4096
+ overwrite_cache: false
+ preprocessing_num_workers: 12
+
+ ### output
+ output_dir: danube3/trained/LittleInstructionJudge-4B-v0.1
+ logging_steps: 5
+ save_steps: 1
+ save_strategy: epoch
+ plot_loss: true
+ overwrite_output_dir: false
+
+ ### train
+ per_device_train_batch_size: 1
+ gradient_accumulation_steps: 4
+ learning_rate: 0.0000015
+ num_train_epochs: 1
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.01
+ pure_bf16: true
+ flash_attn: fa2
+
+ ### eval
+ val_size: 0.02
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 1000
+ ```
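+
+ Assuming the config above is saved as, say, `judge_sft.yaml` (the filename is mine, not from the card), the run can be launched with LLaMA-Factory's CLI: `llamafactory-cli train judge_sft.yaml`.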
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 0.4062 | 0.0441 | 1000 | 0.3899 |
+ | 0.3346 | 0.0882 | 2000 | 0.3520 |
+ | 0.3192 | 0.1323 | 3000 | 0.3342 |
+ | 0.3007 | 0.1763 | 4000 | 0.3239 |
+ | 0.2792 | 0.2204 | 5000 | 0.3165 |
+ | 0.2957 | 0.2645 | 6000 | 0.3111 |
+ | 0.3254 | 0.3086 | 7000 | 0.3064 |
+ | 0.3058 | 0.3527 | 8000 | 0.3033 |
+ | 0.298 | 0.3968 | 9000 | 0.3011 |
+ | 0.3157 | 0.4409 | 10000 | 0.2995 |
+ | 0.3314 | 0.4849 | 11000 | 0.2979 |
+ | 0.301 | 0.5290 | 12000 | 0.2965 |
+ | 0.2927 | 0.5731 | 13000 | 0.2957 |
+ | 0.3199 | 0.6172 | 14000 | 0.2950 |
+ | 0.2924 | 0.6613 | 15000 | 0.2948 |
+ | 0.2784 | 0.7054 | 16000 | 0.2945 |
+ | 0.3069 | 0.7495 | 17000 | 0.2943 |
+ | 0.2813 | 0.7935 | 18000 | 0.2943 |
+ | 0.2934 | 0.8376 | 19000 | 0.2942 |
+ | 0.2762 | 0.8817 | 20000 | 0.2942 |
+ | 0.2792 | 0.9258 | 21000 | 0.2942 |
+ | 0.3057 | 0.9699 | 22000 | 0.2942 |