---
base_model: elyza/ELYZA-japanese-Llama-2-7b-instruct
tags:
- generated_from_trainer
model-index:
- name: medusa-ELYZA-japanese-Llama-2-7b-instruct
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

# medusa-ELYZA-japanese-Llama-2-7b-instruct

This model is a fine-tuned version of [elyza/ELYZA-japanese-Llama-2-7b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b-instruct) on the [shi3z/ja_conv_wikipedia_orion14B_100K](https://huggingface.co/datasets/shi3z/ja_conv_wikipedia_orion14B_100K) dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3564

## Model description

This is a Medusa-2 model created with [Medusa](https://github.com/FasterDecoding/Medusa): the base model is augmented with additional decoding heads that predict several future tokens at once, enabling faster speculative decoding. See the inference sketch at the end of this card.

## Intended uses & limitations

The training data was generated with an Orion-14B model, so use of this model is subject to the corresponding license:

- [【Orion-14B Series】 Models Community License Agreement](https://huggingface.co/OrionStarAI/Orion-14B-Chat/blob/main/ModelsCommunityLicenseAgreement)

## Training and evaluation data

- [shi3z/ja_conv_wikipedia_orion14B_100K](https://huggingface.co/datasets/shi3z/ja_conv_wikipedia_orion14B_100K), a Japanese conversation dataset (a loading sketch appears at the end of this card)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 2

An illustrative Axolotl configuration reconstructed from these values is sketched at the end of this card.

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.684         | 0.06  | 40   | 2.7430          |
| 2.5302        | 0.11  | 80   | 2.6693          |
| 2.486         | 0.17  | 120  | 2.6273          |
| 2.557         | 0.23  | 160  | 2.6020          |
| 2.4913        | 0.28  | 200  | 2.5868          |
| 2.5317        | 0.34  | 240  | 2.5646          |
| 2.4795        | 0.4   | 280  | 2.5521          |
| 2.4221        | 0.45  | 320  | 2.5359          |
| 2.4464        | 0.51  | 360  | 2.5231          |
| 2.4534        | 0.57  | 400  | 2.5095          |
| 2.4685        | 0.62  | 440  | 2.4967          |
| 2.4575        | 0.68  | 480  | 2.4849          |
| 2.4299        | 0.74  | 520  | 2.4771          |
| 2.459         | 0.79  | 560  | 2.4604          |
| 2.4585        | 0.85  | 600  | 2.4527          |
| 2.4832        | 0.91  | 640  | 2.4425          |
| 2.4255        | 0.96  | 680  | 2.4285          |
| 2.2209        | 1.02  | 720  | 2.4312          |
| 2.3142        | 1.07  | 760  | 2.4288          |
| 2.1961        | 1.13  | 800  | 2.4252          |
| 2.1394        | 1.19  | 840  | 2.4194          |
| 2.2005        | 1.24  | 880  | 2.4093          |
| 2.0748        | 1.3   | 920  | 2.4003          |
| 2.109         | 1.36  | 960  | 2.3935          |
| 2.2209        | 1.41  | 1000 | 2.3856          |
| 2.1938        | 1.47  | 1040 | 2.3786          |
| 2.1056        | 1.53  | 1080 | 2.3716          |
| 2.0948        | 1.58  | 1120 | 2.3674          |
| 2.218         | 1.64  | 1160 | 2.3629          |
| 2.17          | 1.7   | 1200 | 2.3601          |
| 2.1084        | 1.75  | 1240 | 2.3590          |
| 2.0446        | 1.81  | 1280 | 2.3567          |
| 2.1517        | 1.87  | 1320 | 2.3572          |
| 2.2342        | 1.92  | 1360 | 2.3565          |
| 2.1552        | 1.98  | 1400 | 2.3564          |

### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.2
- Datasets 2.16.1
- Tokenizers 0.14.1
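## Training configuration sketch

The exact Axolotl config for this run was not published with this card. As a rough illustration only, the hyperparameters above map onto an [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)-style YAML like the following; field names follow common Axolotl conventions, the dataset `type` is an assumption, and the Medusa-specific head options are omitted because they are not recorded here.

```yaml
# Illustrative reconstruction -- NOT the actual config used for this run.
base_model: elyza/ELYZA-japanese-Llama-2-7b-instruct

datasets:
  - path: shi3z/ja_conv_wikipedia_orion14B_100K
    type: sharegpt              # assumption: conversation-format data

seed: 42
micro_batch_size: 4             # per-device train/eval batch size of 4
gradient_accumulation_steps: 4  # 4 * 4 = effective batch size of 16
num_epochs: 2

learning_rate: 0.0005
lr_scheduler: cosine
warmup_steps: 40
optimizer: adamw_torch          # Adam with betas=(0.9, 0.999), eps=1e-8
```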
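## Loading the training data

A minimal sketch for inspecting the training data with Hugging Face Datasets (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Download the Japanese conversation dataset used for training.
ds = load_dataset("shi3z/ja_conv_wikipedia_orion14B_100K", split="train")

print(ds)     # number of rows and column names
print(ds[0])  # one conversation record
```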
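## Inference sketch

Medusa heads are used through the [Medusa](https://github.com/FasterDecoding/Medusa) inference code rather than plain `transformers` generation. The snippet below is a minimal sketch assuming the `MedusaModel` API shown in the upstream repository (`from_pretrained`, `get_tokenizer`, streaming `medusa_generate`); the model id, the CUDA device, and the generation arguments are assumptions, and the prompt follows the Llama-2 chat format used by ELYZA's instruct models.

```python
import torch
from medusa.model.medusa_model import MedusaModel  # from the FasterDecoding/Medusa repo

# Assumed model id: point this at wherever these Medusa weights are hosted.
model = MedusaModel.from_pretrained(
    "medusa-ELYZA-japanese-Llama-2-7b-instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = model.get_tokenizer()

# Llama-2 chat format with ELYZA's default Japanese system prompt.
prompt = (
    "[INST] <<SYS>>\nあなたは誠実で優秀な日本人のアシスタントです。\n<</SYS>>\n\n"
    "日本の首都はどこですか? [/INST]"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()  # assumes a GPU

# medusa_generate streams partial outputs; keep the last one.
text = ""
for step in model.medusa_generate(input_ids, temperature=0.7, max_steps=512):
    text = step["text"]
print(text)
```

The upstream repository also ships a simple chat CLI (`python -m medusa.inference.cli --model <model path>`) that can be pointed at these weights.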