|
Quantization made by Richard Erkhov. |
|
|
|
[Github](https://github.com/RichardErkhov) |
|
|
|
[Discord](https://discord.gg/pvy7H8DZMG) |
|
|
|
[Request more models](https://github.com/RichardErkhov/quant_request) |
|
|
|
|
|
Llama-3-Swallow-8B-v0.1 - GGUF |
|
- Model creator: https://huggingface.co/tokyotech-llm/ |
|
- Original model: https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1/ |
|
|
|
|
|
| Name | Quant method | Size | |
|
| ---- | ---- | ---- | |
|
| [Llama-3-Swallow-8B-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q2_K.gguf) | Q2_K | 2.96GB | |
|
| [Llama-3-Swallow-8B-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.IQ3_XS.gguf) | IQ3_XS | 3.28GB | |
|
| [Llama-3-Swallow-8B-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.IQ3_S.gguf) | IQ3_S | 3.43GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q3_K_S.gguf) | Q3_K_S | 3.41GB | |
|
| [Llama-3-Swallow-8B-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.IQ3_M.gguf) | IQ3_M | 3.52GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q3_K.gguf) | Q3_K | 3.74GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q3_K_M.gguf) | Q3_K_M | 3.74GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q3_K_L.gguf) | Q3_K_L | 4.03GB | |
|
| [Llama-3-Swallow-8B-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.IQ4_XS.gguf) | IQ4_XS | 4.18GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q4_0.gguf) | Q4_0 | 4.34GB | |
|
| [Llama-3-Swallow-8B-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.IQ4_NL.gguf) | IQ4_NL | 4.38GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q4_K_S.gguf) | Q4_K_S | 4.37GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q4_K.gguf) | Q4_K | 4.58GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q4_K_M.gguf) | Q4_K_M | 4.58GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q4_1.gguf) | Q4_1 | 4.78GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q5_0.gguf) | Q5_0 | 5.21GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q5_K_S.gguf) | Q5_K_S | 5.21GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q5_K.gguf) | Q5_K | 5.34GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q5_K_M.gguf) | Q5_K_M | 5.34GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q5_1.gguf) | Q5_1 | 5.65GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q6_K.gguf) | Q6_K | 6.14GB | |
|
| [Llama-3-Swallow-8B-v0.1.Q8_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf/blob/main/Llama-3-Swallow-8B-v0.1.Q8_0.gguf) | Q8_0 | 7.95GB | |
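
The files above can be pulled straight from this repository and run with any GGUF-compatible runtime. Below is a minimal, illustrative sketch assuming `huggingface_hub` and `llama-cpp-python` are installed; the Q4_K_M file is chosen only as a common size/quality middle ground, and the prompt and sampling settings are placeholders.

```python
# Minimal sketch: download one quant from this repo and run it locally.
# Assumes `pip install huggingface_hub llama-cpp-python`; any other
# GGUF runtime (llama.cpp CLI, LM Studio, Ollama, ...) works as well.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-8B-v0.1-gguf",
    filename="Llama-3-Swallow-8B-v0.1.Q4_K_M.gguf",  # 4.58GB, see table above
)

llm = Llama(model_path=model_path, n_ctx=4096)
# This is a base (non-instruct) model, so give it text to continue
# rather than a chat-formatted prompt.
out = llm("東京工業大学の主なキャンパスは、", max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"])
```

As a rule of thumb, the smaller quants (Q2_K–Q3_K, IQ3_*) fit in less RAM at some quality cost, while Q6_K and Q8_0 stay closest to the original weights.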
|
|
|
|
|
|
|
|
|
Original model description: |
|
--- |
|
language: |
|
- en |
|
- ja |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
license: llama3 |
|
model_type: llama |
|
--- |
|
|
|
# Llama3 Swallow |
|
|
|
Our Swallow model was continually pre-trained from the [Llama 3 family](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6), primarily by adding Japanese language data. The Instruct versions were built with supervised fine-tuning (SFT) and Chat Vector. Links to the other models can be found in the Swallow Model Index below.
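
As background on the Chat Vector step mentioned above: the approach adds the parameter difference between an instruction-tuned model and its base model onto another (here, continually pre-trained) base model. The snippet below is only an illustrative sketch of that idea, not the exact recipe used for the Swallow Instruct releases; the plain state-dict arithmetic and the choice of checkpoints are assumptions.

```python
# Illustrative sketch of the Chat Vector idea:
#   chat_vector = instruct_en - base_en;  merged = cpt_ja + chat_vector
# NOT the exact procedure behind Llama-3-Swallow-*-Instruct-v0.1.
# Loads three 8B checkpoints on CPU (~48 GB RAM in bfloat16).
import torch
from transformers import AutoModelForCausalLM

base_en = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
inst_en = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16)
cpt_ja = AutoModelForCausalLM.from_pretrained("tokyotech-llm/Llama-3-Swallow-8B-v0.1", torch_dtype=torch.bfloat16)

base_sd, inst_sd = base_en.state_dict(), inst_en.state_dict()
with torch.no_grad():
    for name, param in cpt_ja.state_dict().items():
        if name in base_sd and name in inst_sd and base_sd[name].shape == param.shape:
            param += inst_sd[name] - base_sd[name]  # add the "chat vector" in place

cpt_ja.save_pretrained("llama-3-swallow-8b-chat-vector-sketch")
```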
|
|
|
|
|
# Model Release Updates |
|
|
|
We are excited to share the release schedule for our latest models: |
|
- **July 1, 2024**: Released the [Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1), [Llama-3-Swallow-8B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1), [Llama-3-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1), and [Llama-3-Swallow-70B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1). |
|
|
|
## Swallow Model Index |
|
|
|
|Model|Llama-3-Swallow|Llama-3-Swallow-Instruct|
|
|---|---|---| |
|
|8B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1) | |
|
|70B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1) | |
|
|
|
 |
|
|
|
This repository provides large language models developed by [Swallow-LLM](https://swallow-llm.github.io/). |
|
Read our [blog post](https://zenn.dev/tokyotech_lm/articles/f65989d76baf2c). |
|
|
|
## Model Details |
|
|
|
* **Model type**: Please refer to [Llama 3 MODEL_CARD](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the model architecture. |
|
* **Language(s)**: Japanese, English
|
* **Library**: [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) |
|
* **Tokenizer**: Please refer to [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/) for details on the tokenizer. |
|
* **Contact**: swallow[at]nlp.c.titech.ac.jp |
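
For reference, a minimal text-generation sketch for the original (unquantized) checkpoint with Hugging Face Transformers is shown below; the prompt and sampling settings are illustrative only, and since this is a base model it expects plain continuation rather than a chat template.

```python
# Minimal sketch: run the original checkpoint with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3-Swallow-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "東京工業大学の主なキャンパスは、"  # base model: plain text continuation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```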
|
|
|
## Model Performance |
|
|
|
### Japanese tasks |
|
|
|
|Model|Size|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg| |
|
|---|---|---|---|---|---|---|---|---|---|---|---|---| |
|
| | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| | |
|
| | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| | |
|
|Llama-2-7b|7B|0.2618|0.4914|0.3301|0.8001|0.1742|0.0560|0.1764|0.1742|0.2824|0.1250|0.2872| |
|
|Swallow-7b-hf|7B|0.4888|0.5044|**0.5925**|0.8424|0.1823|0.1240|0.2505|0.1482|0.3219|0.0183|0.3473| |
|
|Mistral-7B-v0.1|7B|0.7471|0.4482|0.2691|0.8588|0.2026|0.1880|0.1430|0.1738|0.4213|0.2598|0.3712| |
|
|Swallow-MS-7b-v0.1|7B|0.8758|**0.5153**|0.5647|0.8762|0.1993|0.2400|0.2507|0.1667|0.4527|0.2335|0.4375| |
|
|Qwen2-7B|7B|0.8776|0.4627|0.3766|**0.8984**|0.1716|**0.5480**|0.2080|0.1949|**0.5871**|**0.4183**|**0.4805**| |
|
|Meta-Llama-3-8B|8B|0.8356|0.4454|0.4002|0.8881|0.1757|0.3320|0.2199|0.2087|0.4558|0.3311|0.4292| |
|
|llama-3-youko-8b|8B|0.8660|0.4902|0.5155|0.8947|**0.2127**|0.2840|0.2740|0.2180|0.4493|0.2183|0.4423| |
|
|Llama-3-Swallow-8B-v0.1|8B|**0.8945**|0.4848|0.5640|0.8947|0.1981|0.4240|**0.2758**|**0.2223**|0.4699|0.2890|0.4717| |
|
|
|
### English tasks |
|
|
|
|Model|Size|OpenBookQA|TriviaQA|HellaSWAG|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg| |
|
|---|---|---|---|---|---|---|---|---|---|---|---| |
|
| | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| | |
|
| | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1| | |
|
|Llama-2-7b|7B|0.3720|0.6385|0.5826|0.2911|0.9045|0.4590|0.1266|0.3993|0.1354|0.4343| |
|
|Swallow-7b-hf|7B|0.3080|0.4921|0.5269|0.2608|0.8847|0.3918|0.0963|0.3531|0.0402|0.3727| |
|
|Mistral-7B-v0.1|7B|0.3740|0.7030|**0.6260**|0.3381|**0.9067**|0.6236|0.3851|0.5597|0.2841|0.5334| |
|
|Swallow-MS-7b-v0.1|7B|0.3480|0.5995|0.5798|0.3011|0.9015|0.5486|0.2669|0.4916|0.2732|0.4789| |
|
|Qwen2-7B|7B|0.3740|0.6105|0.6006|**0.3623**|0.8916|**0.7045**|**0.7748**|0.5325|**0.4622**|**0.5903**| |
|
|Meta-Llama-3-8B|8B|**0.3760**|**0.7109**|0.6124|0.3356|0.9032|0.6509|0.4936|**0.6211**|0.3793|0.5648| |
|
|llama-3-youko-8b|8B|0.3500|0.6252|0.5885|0.3247|0.8959|0.5993|0.3571|0.5704|0.2793|0.5100| |
|
|Llama-3-Swallow-8B-v0.1|8B|0.3520|0.6563|0.5901|0.3507|0.9006|0.6152|0.4875|0.5936|0.3323|0.5420| |
|
|
|
## Evaluation Benchmarks |
|
|
|
### Japanese evaluation benchmarks |
|
|
|
We used llm-jp-eval (v1.3.0), the JP Language Model Evaluation Harness (commit #9b42d41), and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
|
|
|
- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022]) |
|
- Open-ended question answering (JEMHopQA [Ishii et al., 2024]) |
|
- Open-ended question answering (NIILC [Sekine, 2003])
|
- Machine reading comprehension (JSQuAD [Kurihara et al., 2022]) |
|
- Automatic summarization (XL-Sum [Hasan et al., 2021]) |
|
- Machine translation (WMT2020 ja-en [Barrault et al., 2020]) |
|
- Machine translation (WMT2020 en-ja [Barrault et al., 2020]) |
|
- Mathematical reasoning (MGSM [Shi et al., 2023]) |
|
- Academic exams (JMMLU [Yin et al., 2024])

- Code generation (JHumanEval [Sato et al., 2024])
|
|
|
### English evaluation benchmarks |
|
|
|
We used the Language Model Evaluation Harness (v0.4.2) and the Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
|
|
|
- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018]) |
|
- Open-ended question answering (TriviaQA [Joshi et al., 2017]) |
|
- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018]) |
|
- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021]) |
|
- Natural language inference (HellaSwag [Zellers et al., 2019]) |
|
- Mathematical reasoning (GSM8K [Cobbe et al., 2021]) |
|
- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023]) |
|
- Academic exams (MMLU [Hendrycks et al., 2021]) |
|
- Code generation (HumanEval [Chen et al., 2021]) |
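
As a rough illustration of how such scores can be reproduced, the sketch below drives the Language Model Evaluation Harness from Python. Only the harness version and few-shot counts come from the description above; the `simple_evaluate` call shape, batch size, and dtype argument are assumptions about a typical v0.4.x setup.

```python
# Rough sketch: reproduce one English score with EleutherAI's
# lm-evaluation-harness (v0.4.x). Exact task configs may differ from
# the setup used for the tables above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tokyotech-llm/Llama-3-Swallow-8B-v0.1,dtype=bfloat16",
    tasks=["gsm8k"],   # GSM8K is reported 4-shot in the table above
    num_fewshot=4,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```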
|
|
|
## Training Datasets |
|
|
|
### Continual Pre-Training |
|
The following datasets were used for continual pre-training. |
|
|
|
- [Algebraic Stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2) |
|
- [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) |
|
- [English Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) |
|
- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) |
|
- [Laboro ParaCorpus](https://github.com/laboroai/Laboro-ParaCorpus) |
|
- [OpenWebMath](https://huggingface.co/datasets/EleutherAI/proof-pile-2) |
|
- [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) |
|
- [Swallow Corpus](https://arxiv.org/abs/2404.17733) |
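
Most of the Hugging Face-hosted corpora above can be inspected without downloading them in full by streaming them with the `datasets` library; a small sketch follows, using RefinedWeb purely as an example (the `content` column name is that dataset's own schema).

```python
# Sketch: stream a few documents from one of the public corpora listed above.
from datasets import load_dataset

refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for i, example in enumerate(refinedweb):
    print(example["content"][:200])
    if i >= 2:  # peek at the first few documents only
        break
```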
|
|
|
## Risks and Limitations |
|
|
|
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations. |
|
|
|
## Acknowledgements |
|
|
|
We thank Meta Research for releasing Llama 3 under an open license for others to build on. |
|
|
|
Our project is supported by the [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html) of the National Institute of Advanced Industrial Science and Technology. |
|
|
|
## License |
|
|
|
[META LLAMA 3 COMMUNITY LICENSE](https://llama.meta.com/llama3/license/) |
|
|
|
## Authors |
|
|
|
Here are the team members: |
|
- From the [Tokyo Institute of Technology Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html):
|
- [Naoaki Okazaki](https://www.chokkan.org/index.ja.html) |
|
- [Sakae Mizuki](https://s-mizuki-nlp.github.io/) |
|
- [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html) |
|
- [Koki Maeda](https://sites.google.com/view/silviase) |
|
- [Kakeru Hattori](https://aya-se.vercel.app/) |
|
- [Masanari Ohi](https://sites.google.com/view/masanariohi) |
|
- [Taihei Shiotani](https://github.com/inatoihs) |
|
- [Koshiro Saito](https://sites.google.com/view/koshiro-saito) |
|
- From the [Tokyo Institute of Technology YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html):
|
- [Rio Yokota](https://twitter.com/rioyokota) |
|
- [Kazuki Fujii](https://twitter.com/okoge_kaz) |
|
- [Taishi Nakamura](https://twitter.com/Setuna7777_2) |
|
- [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto) |
|
- [Ishida Shigeki](https://www.wantedly.com/id/reborn27) |
|
- From the [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/):
|
- [Hiroya Takamura](https://sites.google.com/view/hjtakamura) |
|
|
|
## How to cite |
|
|
|
If you find our work helpful, please feel free to cite us. |
|
|
|
```tex
@inproceedings{Fujii:COLM2024,
    title={Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities},
    author={Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Hiroki Iida and Masanari Ohi and Kakeru Hattori and Hirai Shota and Sakae Mizuki and Rio Yokota and Naoaki Okazaki},
    booktitle={Proceedings of the First Conference on Language Modeling},
    series={COLM},
    pages={(to appear)},
    year={2024},
    month=oct,
    address={University of Pennsylvania, USA},
}

@inproceedings{Okazaki:COLM2024,
    title={Building a Large Japanese Web Corpus for Large Language Models},
    author={Naoaki Okazaki and Kakeru Hattori and Hirai Shota and Hiroki Iida and Masanari Ohi and Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Rio Yokota and Sakae Mizuki},
    booktitle={Proceedings of the First Conference on Language Modeling},
    series={COLM},
    pages={(to appear)},
    year={2024},
    month=oct,
    address={University of Pennsylvania, USA},
}
```
|
|
|
### Citations |
|
|
|
```tex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
|
|
|
|