End of training
    	
README.md CHANGED
````diff
@@ -1,6 +1,6 @@
 ---
 library_name: transformers
-base_model: minpeter/
+base_model: minpeter/tiny-ko-base
 tags:
 - axolotl
 - generated_from_trainer
@@ -13,6 +13,8 @@ datasets:
 - FreedomIntelligence/sharegpt-korean
 - coastral/korean-writing-style-instruct
 - devngho/korean-instruction-mix
+- youjunhyeok/Magpie-Pro-300K-Filtered-ko
+- youjunhyeok/smoltalk-ko-translate
 model-index:
 - name: tiny-ko-sft
   results: []
@@ -26,7 +28,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.10.0.dev0`
 ```yaml
-base_model: minpeter/
+base_model: minpeter/tiny-ko-base
 
 hub_model_id: minpeter/tiny-ko-sft
 output_dir: ./outputs/tiny-ko-sft
@@ -104,6 +106,22 @@ datasets:
       role: from
       content: value
 
+  - path: youjunhyeok/Magpie-Pro-300K-Filtered-ko
+    type: chat_template
+    split: train[:10%]
+    field_messages: conversations
+    message_property_mappings:
+      role: from
+      content: value
+
+  - path: youjunhyeok/smoltalk-ko-translate
+    type: chat_template
+    name: merge_filtered
+    field_messages: conversations
+    message_property_mappings:
+      role: role
+      content: content
+
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
 
@@ -111,17 +129,17 @@ save_steps: 200
 warmup_steps: 20
 eval_steps: 200
 
-sequence_len:
+sequence_len: 4096
 
 # <<<< experimental settings <<<<
-sample_packing:
-train_on_inputs:
+sample_packing: true
+train_on_inputs: false
 # >>>> experimental settings >>>
 
 pad_to_sequence_len: true
 
 gradient_accumulation_steps: 4
-micro_batch_size:
+micro_batch_size: 32
 
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
@@ -130,15 +148,6 @@ learning_rate: 1e-3
 bf16: auto
 tf32: false
 
-added_tokens_overrides:
-  128001: "<|im_end|>"
-  128002: "<|im_start|>"
-
-special_tokens:
-  bos_token: <|begin_of_text|>
-  eos_token: <|im_end|>
-  pad_token: <|im_end|>
-
 gradient_checkpointing: true
 gradient_checkpointing_kwargs:
   use_reentrant: false
@@ -146,7 +155,7 @@ resume_from_checkpoint:
 logging_steps: 1
 flash_attention: true
 
-num_epochs:
+num_epochs: 1
 weight_decay: 0.0
 
 ```
@@ -155,9 +164,9 @@ weight_decay: 0.0
 
 # tiny-ko-sft
 
-This model is a fine-tuned version of [minpeter/
+This model is a fine-tuned version of [minpeter/tiny-ko-base](https://huggingface.co/minpeter/tiny-ko-base) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk, the heegyu/open-korean-instructions-v20231020, the FreedomIntelligence/evol-instruct-korean, the FreedomIntelligence/alpaca-gpt4-korean, the FreedomIntelligence/sharegpt-korean, the coastral/korean-writing-style-instruct, the devngho/korean-instruction-mix, the youjunhyeok/Magpie-Pro-300K-Filtered-ko and the youjunhyeok/smoltalk-ko-translate datasets.
 It achieves the following results on the evaluation set:
-- Loss: 1.
+- Loss: 1.5297
 
 ## Model description
 
@@ -177,38 +186,28 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.001
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 32
+- eval_batch_size: 32
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
 - gradient_accumulation_steps: 4
-- total_train_batch_size:
-- total_eval_batch_size:
+- total_train_batch_size: 256
+- total_eval_batch_size: 64
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 20
-- training_steps:
+- training_steps: 817
 
 ### Training results
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-|
-| 1.
-| 1.
-| 1.
-| 1.
-| 1.3066        | 1.0091 | 1000 | 1.5208          |
-| 1.395         | 1.2110 | 1200 | 1.5007          |
-| 1.3474        | 1.4128 | 1400 | 1.4699          |
-| 1.3025        | 1.6147 | 1600 | 1.4383          |
-| 1.2566        | 1.8166 | 1800 | 1.4117          |
-| 1.1672        | 2.0182 | 2000 | 1.4227          |
-| 1.1267        | 2.2200 | 2200 | 1.4141          |
-| 1.0195        | 2.4219 | 2400 | 1.4098          |
-| 1.084         | 2.6238 | 2600 | 1.4063          |
-| 1.1254        | 2.8256 | 2800 | 1.4059          |
+| 2.3518        | 0.0012 | 1    | 2.3640          |
+| 1.6322        | 0.2446 | 200  | 1.6913          |
+| 1.5903        | 0.4891 | 400  | 1.6003          |
+| 1.5146        | 0.7337 | 600  | 1.5392          |
+| 1.5277        | 0.9783 | 800  | 1.5297          |
 
 
 ### Framework versions
````
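The two added datasets ship their conversations in different shapes, which is why their `message_property_mappings` differ: Magpie-Pro-300K-Filtered-ko stores ShareGPT-style `from`/`value` pairs, while smoltalk-ko-translate already uses `role`/`content`. A minimal Python sketch of the remapping those mappings describe (illustrative only, not axolotl's internals):

```python
# Normalize differently-shaped message records onto one role/content schema,
# mirroring the message_property_mappings in the config above.
def normalize(messages, role_key, content_key):
    return [{"role": m[role_key], "content": m[content_key]} for m in messages]

# ShareGPT-style record (from/value keys), as in Magpie-Pro-300K-Filtered-ko:
sharegpt = [{"from": "human", "value": "안녕하세요"},
            {"from": "gpt", "value": "안녕하세요, 무엇을 도와드릴까요?"}]
print(normalize(sharegpt, role_key="from", content_key="value"))

# smoltalk-ko-translate already matches the target schema (identity mapping):
smoltalk = [{"role": "user", "content": "..."}]
print(normalize(smoltalk, role_key="role", content_key="content"))
```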
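`sequence_len` is now pinned to 4096 and `sample_packing: true` concatenates short chat examples into fixed-length sequences instead of padding each one individually. A rough sketch of the idea, assuming simple greedy packing (real implementations, axolotl's included, also keep per-example attention boundaries so packed samples don't attend to each other):

```python
# Greedy sequence packing: fill each 4096-token buffer with as many
# tokenized examples as fit, flushing when the next example would overflow.
SEQUENCE_LEN = 4096  # matches sequence_len in the config

def pack(tokenized_examples, seq_len=SEQUENCE_LEN):
    packs, current = [], []
    for tokens in tokenized_examples:
        tokens = tokens[:seq_len]  # oversize examples get truncated
        if current and len(current) + len(tokens) > seq_len:
            packs.append(current)
            current = []
        current.extend(tokens)
    if current:
        packs.append(current)
    return packs
```

With `pad_to_sequence_len: true`, each resulting pack is then padded out to exactly 4096 tokens.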
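`train_on_inputs: false` restricts the loss to response tokens; prompt and user-turn tokens are excluded. The usual convention is to set their labels to `-100`, the ignore index of PyTorch's cross-entropy loss. A sketch of that masking (assumed semantics, not axolotl's code):

```python
IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss

def build_labels(input_ids, assistant_mask):
    """assistant_mask[i] is True where token i belongs to an assistant turn."""
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(input_ids, assistant_mask)]

assert build_labels([101, 202, 303, 404],
                    [False, False, True, True]) == [-100, -100, 303, 404]
```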
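The commit also drops the `added_tokens_overrides` and `special_tokens` blocks, so the tokenizer bundled with `minpeter/tiny-ko-base` is now used as shipped. For reference, the removed `special_tokens` block corresponds roughly to this plain-`transformers` call (a sketch of the equivalent, not axolotl's internal wiring):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("minpeter/tiny-ko-base")
# What the removed special_tokens block used to request explicitly:
tokenizer.add_special_tokens({
    "bos_token": "<|begin_of_text|>",
    "eos_token": "<|im_end|>",
    "pad_token": "<|im_end|>",
})
```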
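`optimizer: paged_adamw_8bit` together with `lr_scheduler: cosine` and `warmup_steps: 20` maps onto standard bitsandbytes and transformers components. A sketch of an equivalent manual setup, assuming the values reported above (lr 1e-3, betas (0.9, 0.999), epsilon 1e-08, weight_decay 0.0, 817 steps); this is how one would wire it outside axolotl, not axolotl's own code:

```python
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

def build_optimizer(model, training_steps=817):
    # Paged 8-bit AdamW keeps optimizer state in 8-bit, paged out on demand.
    optimizer = bnb.optim.PagedAdamW8bit(
        model.parameters(), lr=1e-3, betas=(0.9, 0.999),
        eps=1e-8, weight_decay=0.0,
    )
    # Linear warmup for 20 steps, then cosine decay over the remaining steps.
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=20, num_training_steps=training_steps,
    )
    return optimizer, scheduler
```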
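The derived batch sizes in the hyperparameter list follow directly from the config values:

```python
micro_batch_size = 32             # per-device batch from the config
gradient_accumulation_steps = 4
num_devices = 2

# Effective optimizer-step batch: 32 * 4 * 2
assert micro_batch_size * gradient_accumulation_steps * num_devices == 256

# Evaluation does no gradient accumulation, hence the smaller total: 32 * 2
assert micro_batch_size * num_devices == 64

# One epoch at 817 steps of 256 sequences each implies roughly
# 817 * 256 ≈ 209k packed training sequences (exact count depends on packing).
```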