---
license: cc-by-sa-4.0
datasets:
- izumi-lab/llm-japanese-dataset
language:
- ja
tags:
- llama
- causal-lm
---
This repo contains a low-rank adapter for LLaMA-7B, fine-tuned on the [llm-japanese-dataset](https://github.com/masanorihirano/llm-japanese-dataset) dataset.

This version of the weights was trained with the following hyperparameters (a configuration sketch follows the list):

- Epochs: 5
- Batch size: 128
- Cutoff length: 256
- Learning rate: 3e-4
- LoRA _r_: 4
- LoRA target modules: q_proj, v_proj
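
The hyperparameters above correspond to a standard PEFT LoRA setup. The snippet below is a minimal sketch of how such a configuration could be written, not the exact training script for this adapter; values the list does not specify (for example `lora_alpha` and `lora_dropout`) are assumptions.

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Sketch only: reconstructs a LoRA configuration from the hyperparameters listed above.
base_model = "decapoda-research/llama-7b-hf"

lora_config = LoraConfig(
    r=4,                                  # LoRA r (from the list above)
    target_modules=["q_proj", "v_proj"],  # LoRA target modules (from the list above)
    lora_alpha=16,                        # assumption: not stated in this card
    lora_dropout=0.05,                    # assumption: not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the LoRA parameters are trainable
model = LlamaForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The remaining hyperparameters (5 epochs, batch size 128, cutoff length 256,
# learning rate 3e-4) belong to the training loop / Trainer arguments.
```

The published adapter can be loaded for inference as follows: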
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = "decapoda-research/llama-7b-hf"
# Please note that the special license of decapoda-research/llama-7b-hf applies.

# Load the base LLaMA-7B weights and tokenizer in float16
model = LlamaForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Attach the Japanese LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "izumi-lab/llama-7b-japanese-lora-v0",
    torch_dtype=torch.float16,
)
```
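
Once loaded, the model generates text like any causal LM in Transformers. The snippet below is a minimal sketch continuing from the loading code above; the plain prompt and the sampling settings are illustrative assumptions and may not match the instruction format used during training.

```python
# Minimal generation sketch (continues from the loading code above).
# Note: float16 inference is intended for GPU; on CPU, torch.float32 may be needed.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

prompt = "日本で一番高い山は何ですか？"  # "What is the highest mountain in Japan?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```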
For the latest information, please visit [llm.msuzuki.me](https://llm.msuzuki.me).

## Details

- Japanese Paper: [https://jxiv.jst.go.jp/index.php/jxiv/preprint/view/422](https://jxiv.jst.go.jp/index.php/jxiv/preprint/view/422)
- English Paper:
- GitHub: [https://github.com/retarfi/jallm](https://github.com/retarfi/jallm)
- Website: [llm.msuzuki.me](https://llm.msuzuki.me)

Citation:

If you have any inquiries, such as joint research, data provision, or various types of support, please email [email protected].