|
--- |
|
base_model: |
|
- maywell/kiqu-70b |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
license: cc-by-sa-4.0 |
|
language: |
|
- ko |
|
--- |
|
# Megakiqu-120b |
|
<img src="./megakiqu.jpg" alt="megakiqu-120B" width="390"/> |
|
Like MegaDolphin-120B and Venus-120B, this is a model expanded using the passthrough method.
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged using the passthrough merge method. |
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [maywell/kiqu-70b](https://huggingface.co/maywell/kiqu-70b) |
|
|
|
## Original Model Card |
|
# **kiqu-70b** [(Arena Leaderboard)](https://huggingface.co/spaces/instructkr/ko-chatbot-arena-leaderboard) |
|
|
|
|
|
**kiqu-70b** is an SFT+DPO-trained model based on Miqu-70B-Alpaca-DPO, fine-tuned on **Korean** datasets.



Since this model is a fine-tune of miqu-1-70b, a leaked early version of Mistral-Medium, using it for commercial purposes is at your own risk.
|
|
|
Beyond that, the model itself is released under **cc-by-sa-4.0**.
|
|
|
# **Model Details** |
|
|
|
**Base Model** |
|
miqu-1-70b (Early Mistral-Medium) |
|
|
|
**Instruction format** |
|
|
|
It follows the **Mistral** format.

Providing few-shot examples to the model is highly recommended.
|
``` |
|
[INST] {instruction} |
|
[/INST] {output} |
|
``` |
|
|
|
Multi-shot |
|
``` |
|
[INST] {instruction} |
|
[/INST] {output} |
|
[INST] {instruction} |
|
[/INST] {output} |
|
[INST] {instruction} |
|
[/INST] {output} |
|
. |
|
. |
|
. |
|
``` |
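To make the multi-shot layout above concrete, here is a minimal sketch, not part of the original card, using a hypothetical `build_prompt` helper that assembles such a prompt from a list of (instruction, output) pairs:

```python
# Minimal sketch (hypothetical helper, not part of the kiqu-70b release):
# assemble a few-shot prompt in the [INST] ... [/INST] layout shown above.
def build_prompt(shots, instruction):
    """shots: list of (instruction, output) pairs used as few-shot examples."""
    parts = []
    for shot_instruction, shot_output in shots:
        parts.append(f"[INST] {shot_instruction}\n[/INST] {shot_output}")
    # The final turn is left open for the model to complete; it ends in
    # "[/INST]" with no trailing space (see the note further below).
    parts.append(f"[INST] {instruction}\n[/INST]")
    return "\n".join(parts)


print(build_prompt([("1 + 1 = ?", "2.")], "Why is the sky blue?"))
```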
|
|
|
**Recommended Template** - 1-shot with system prompt |
|
``` |
|
너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!

[INST] 안녕?

[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.
|
[INST] {instruction} |
|
[/INST] |
|
``` |
|
|
|
A trailing space after [/INST] can affect model performance by a significant margin. When running inference, it is therefore recommended not to include a trailing space after [/INST] in the chat template.
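As a usage sketch, not something from the original card, the recommended 1-shot template above (its Korean system prompt roughly reads "You are a Korean-specialized language model called kiqu-70B. Answer cleanly and naturally!") can be run with the `transformers` library as follows. The final instruction string is a made-up example, and the prompt deliberately ends in `[/INST]` with no trailing space:

```python
# Usage sketch (not from the original card): running kiqu-70b with the
# recommended 1-shot template. Assumes transformers, torch, and enough
# GPU memory for a 70B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maywell/kiqu-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Korean system prompt + 1-shot greeting, then the actual instruction.
# The final instruction is a hypothetical example; the prompt ends in
# "[/INST]" with no trailing space, as recommended above.
prompt = (
    "너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!\n"
    "[INST] 안녕?\n"
    "[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.\n"
    "[INST] 한국의 수도는 어디야?\n"
    "[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```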
|
|
|
### Configuration |
|
|
|
The following mergekit YAML configuration was used to produce this model:
|
|
|
```yaml |
|
dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [10, 30]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [20, 40]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [30, 50]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [40, 60]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [50, 70]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [60, 80]
    model: maywell/kiqu-70b
|
``` |
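As a quick sanity check on this slice layout (my own arithmetic, not part of the original card), the seven overlapping 20-layer slices of the 80-layer kiqu-70b stack into a 140-layer model, which is roughly what turns a 70B base into a ~120B merge:

```python
# Sanity-check sketch: count the decoder layers produced by the passthrough
# slice configuration above.
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

base_layers = max(end for _, end in slices)           # kiqu-70b has 80 layers
merged_layers = sum(end - start for start, end in slices)

print(base_layers)    # 80
print(merged_layers)  # 7 slices x 20 layers = 140
```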