trollek committed · verified
Commit 85352ef · 1 Parent(s): d7464ec

Update README.md

Files changed (1):
  1. README.md +111 -3

README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - trollek/SimpleInstructionJudge-v01
+ language:
+ - en
+ base_model: h2oai/h2o-danube3-4b-base
+ ---
+ # LittleInstructionJudge-4B-v0.1
+
+ A BAdam fine-tune of danube3-4b-base that does one thing, and one thing only: act as a lightweight LLM-as-a-Judge for instruction prompts.
+
+ The purpose of training this model is to have a small language model that can filter out the worst offenders when creating datasets with the Magpie method in hardware-constrained environments.
+
+ **Important note:** For reasons I don't know, I have issues running models like danube3 in LM Studio. Ollama runs them fine, though.
+
+ ### Prompt template
+
+ ```jinja2
+ Judge the instruction below using the following json format:
+ {
+     "intent": <the intent of the users instruction>,
+     "knowledge": <the knowledge required to respond to the instruction>,
+     "task_category": <the primary category that the instruction can be put in>,
+     "other_task_category": [<a list of other task categories that the instruction belongs to>],
+     "difficulty": <a rating of easy, medium or hard>,
+     "quality_explanation": <an explanation of the quality of the users instruction>,
+     "instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
+ }
+
+ This is the instruction I need you to judge:
+
+ {{instruction}}
+ ```
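+
+ To make the template concrete, below is a minimal usage sketch (mine, not part of the card) that fills in the template, queries the model through Ollama's HTTP API, and keeps only instructions above a reward threshold, i.e. the Magpie-filtering workflow the model was trained for. The model name `littleinstructionjudge` and the threshold of 0 are assumptions; adjust both to your setup.
+
+ ```python
+ import json
+
+ import requests  # pip install requests
+
+ # The prompt template from this card; {{instruction}} is the placeholder.
+ TEMPLATE = """Judge the instruction below using the following json format:
+ {
+     "intent": <the intent of the users instruction>,
+     "knowledge": <the knowledge required to respond to the instruction>,
+     "task_category": <the primary category that the instruction can be put in>,
+     "other_task_category": [<a list of other task categories that the instruction belongs to>],
+     "difficulty": <a rating of easy, medium or hard>,
+     "quality_explanation": <an explanation of the quality of the users instruction>,
+     "instruct_reward": <an integer between -10 and 10 reflecting the quality of the instruction>
+ }
+
+ This is the instruction I need you to judge:
+
+ {{instruction}}"""
+
+
+ def judge(instruction: str, model: str = "littleinstructionjudge") -> dict | None:
+     """Ask the judge to rate one instruction; None if the reply is not valid JSON."""
+     prompt = TEMPLATE.replace("{{instruction}}", instruction)
+     response = requests.post(
+         "http://localhost:11434/api/generate",
+         json={"model": model, "prompt": prompt, "stream": False},
+         timeout=300,
+     )
+     response.raise_for_status()
+     try:
+         return json.loads(response.json()["response"])
+     except json.JSONDecodeError:
+         return None
+
+
+ # Keep only instructions the judge does not flag as junk.
+ candidates = ["Write a haiku about rust.", "asdf do thing???"]
+ kept = [c for c in candidates
+         if (v := judge(c)) is not None and v.get("instruct_reward", -10) >= 0]
+ print(kept)
+ ```
+
+ The `>= 0` cutoff is arbitrary; the reward scale runs from -10 to 10, so tune the threshold to how aggressively you want to filter.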
+
+ ### LLaMA-Factory training config
+
+ ```yaml
+ ### model
+ model_name_or_path: danube3/chatml-base
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ use_badam: true
+ badam_switch_mode: ascending
+ badam_switch_interval: 50
+ badam_start_block: 6
+ badam_verbose: 1
+ seed: 8
+
+ ### dataset
+ dataset: balanced_instruction_judge
+ template: chatml
+ cutoff_len: 4096
+ overwrite_cache: false
+ preprocessing_num_workers: 12
+
+ ### output
+ output_dir: danube3/trained/LittleInstructionJudge-4B-v0.1
+ logging_steps: 5
+ save_steps: 1
+ save_strategy: epoch
+ plot_loss: true
+ overwrite_output_dir: false
+
+ ### train
+ per_device_train_batch_size: 1
+ gradient_accumulation_steps: 4
+ learning_rate: 0.0000015
+ num_train_epochs: 1
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.01
+ pure_bf16: true
+ flash_attn: fa2
+
+ ### eval
+ val_size: 0.02
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 1000
+ ```
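+
+ Assuming the config above is saved as, say, `judge_sft.yaml` (the filename is mine, not from the card), the run can be launched with LLaMA-Factory's CLI: `llamafactory-cli train judge_sft.yaml`.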
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 0.4062 | 0.0441 | 1000 | 0.3899 |
+ | 0.3346 | 0.0882 | 2000 | 0.3520 |
+ | 0.3192 | 0.1323 | 3000 | 0.3342 |
+ | 0.3007 | 0.1763 | 4000 | 0.3239 |
+ | 0.2792 | 0.2204 | 5000 | 0.3165 |
+ | 0.2957 | 0.2645 | 6000 | 0.3111 |
+ | 0.3254 | 0.3086 | 7000 | 0.3064 |
+ | 0.3058 | 0.3527 | 8000 | 0.3033 |
+ | 0.298 | 0.3968 | 9000 | 0.3011 |
+ | 0.3157 | 0.4409 | 10000 | 0.2995 |
+ | 0.3314 | 0.4849 | 11000 | 0.2979 |
+ | 0.301 | 0.5290 | 12000 | 0.2965 |
+ | 0.2927 | 0.5731 | 13000 | 0.2957 |
+ | 0.3199 | 0.6172 | 14000 | 0.2950 |
+ | 0.2924 | 0.6613 | 15000 | 0.2948 |
+ | 0.2784 | 0.7054 | 16000 | 0.2945 |
+ | 0.3069 | 0.7495 | 17000 | 0.2943 |
+ | 0.2813 | 0.7935 | 18000 | 0.2943 |
+ | 0.2934 | 0.8376 | 19000 | 0.2942 |
+ | 0.2762 | 0.8817 | 20000 | 0.2942 |
+ | 0.2792 | 0.9258 | 21000 | 0.2942 |
+ | 0.3057 | 0.9699 | 22000 | 0.2942 |