---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: apache-2.0
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
tags:
- qwen2
inference: false
base_model: Qwen/Qwen2.5-32B
pipeline_tag: text-generation
library_name: transformers
---

# `Qwen2.5 Bakeneko 32B (rinna/qwen2.5-bakeneko-32b)`

![rinna-icon](./rinna.png)

# Overview

We conduct continual pre-training of [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) on **18B** tokens from a mixture of Japanese and English datasets. The continual pre-training improves the model's performance on Japanese tasks.

The name `bakeneko` comes from the Japanese word [`εŒ–γ‘ηŒ«/ばけねこ/Bakeneko`](https://ja.wikipedia.org/wiki/εŒ–γ‘ηŒ«), which is a kind of Japanese mythical creature ([`ε¦–ζ€ͺ/γ‚ˆγ†γ‹γ„/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).

| Size | Continual Pre-Training | Instruction-Tuning | DeepSeek-R1-Distilled
| :-   | :-                     | :-                 | :-
| 32B   | Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b) | Qwen2.5 Bakeneko 32B Instruct [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct)[[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-awq)[[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gguf)[[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int4)| DeepSeek R1 Distill Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b)[[AWQ]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-awq)[[GGUF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gguf)[[GPTQ int8]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int4)

* **Library**

    The model was trained using code based on [Lightning-AI/litgpt](https://github.com/Lightning-AI/litgpt).

* **Model architecture**

    A 64-layer, 5120-hidden-size transformer-based language model. Please refer to the [Qwen2.5 Technical Report](https://arxiv.org/abs/2412.15115) for detailed information on the model's architecture. A quick configuration check is sketched after this list.

* **Training**

    The model was initialized with the [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) model and continually trained on around **18B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset
  
* **Contributors**
    - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
    - [Xinqi Chen](https://huggingface.co/Keely0419)
    - [Kei Sawada](https://huggingface.co/keisawada)
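
As a lightweight sanity check of the architecture described above, the layer count and hidden size can be read from the model configuration with 🤗 Transformers. This is a minimal sketch; it assumes only the standard `AutoConfig` interface and the usual `Qwen2Config` field names (`num_hidden_layers`, `hidden_size`).

```python
from transformers import AutoConfig

# Download only the configuration (no model weights) and inspect it.
config = AutoConfig.from_pretrained("rinna/qwen2.5-bakeneko-32b")

print(config.num_hidden_layers)  # expected: 64
print(config.hidden_size)        # expected: 5120
```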

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# Tokenization

The model uses the original [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) tokenizer.
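
The tokenizer and model can be loaded through the standard 🤗 Transformers interfaces. The snippet below is a minimal sketch rather than an official usage recipe: the dtype, device placement, and generation settings are illustrative assumptions for a 32B base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/qwen2.5-bakeneko-32b"

# The tokenizer is the original Qwen/Qwen2.5-32B tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 and device_map="auto" are illustrative choices, not official defaults.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base (non-instruct) model, so plain text completion is shown here.
text = "θ₯Ώη”°εΉΎε€šιƒŽγ―、"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```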

---

# How to cite
```bibtex
@misc{rinna-qwen2.5-bakeneko-32b,
    title = {rinna/qwen2.5-bakeneko-32b},
    author = {Wakatsuki, Toshiaki and Chen, Xinqi and Sawada, Kei},
    url = {https://huggingface.co/rinna/qwen2.5-bakeneko-32b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    url = {https://qwenlm.github.io/blog/qwen2.5/},
    author = {Qwen Team},
    month = {September},
    year = {2024}
}

@article{qwen2,
    title = {Qwen2 Technical Report}, 
    author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
    journal = {arXiv preprint arXiv:2407.10671},
    year = {2024}
}

@misc{litgpt-2023,
    author = {Lightning AI},
    title = {LitGPT},
    howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
    year = {2023}
}
```
---

# License
[The Apache License, Version 2.0](https://opensource.org/license/apache-2-0)