Update README.md
README.md

---
license: apache-2.0
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese
datasets:
- HuggingFaceH4/ultrachat_200k
---

# TinyDeepSeek-JP-1.5B

This model was created by applying TAID, a new distillation method proposed by Sakana AI, to [cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese](https://huggingface.co/cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese), a small distilled DeepSeek-R1 model additionally trained on Japanese, in order to compress it into a smaller model.

- Teacher model: [cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese](https://huggingface.co/cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese)
- Student model: [SakanaAI/TinySwallow-1.5B-Instruct](https://huggingface.co/SakanaAI/TinySwallow-1.5B-Instruct)
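
TAID (Temporally Adaptive Interpolated Distillation) trains the student against an intermediate target that interpolates between the student's own distribution and the teacher's, shifting toward the teacher as training progresses. The snippet below is a minimal conceptual sketch of such an interpolated distillation loss, not the code used to train this model: the function name `interpolated_distillation_loss`, the logit-space interpolation, and the externally supplied `t` are illustrative assumptions, and the exact TAID recipe, including its adaptive schedule for `t`, is defined in the Sakana AI paper.

```python
# Conceptual sketch of an interpolated distillation loss in the spirit of
# TAID (illustrative assumptions, not the authors' implementation).
# Assumes teacher and student share a vocabulary, as the teacher and
# student used for this model do.
import torch
import torch.nn.functional as F

def interpolated_distillation_loss(student_logits: torch.Tensor,
                                   teacher_logits: torch.Tensor,
                                   t: float) -> torch.Tensor:
    """KL(target || student) for [N, vocab] logits, where the target
    interpolates from the student toward the teacher.

    t = 0 makes the target the (detached) student itself; t = 1 makes it
    the teacher. TAID raises t adaptively over training; here it is just
    a parameter.
    """
    # Interpolate in logit space; detach the student copy so gradients
    # flow only through the student term of the KL, not the target.
    target_logits = t * teacher_logits + (1.0 - t) * student_logits.detach()
    target_logprobs = F.log_softmax(target_logits, dim=-1)
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_logprobs, target_logprobs,
                    log_target=True, reduction="batchmean")
```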

### Uses
Usage follows that of the original models.
This model is provided for research and development purposes only and should be considered an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user's own risk, and its performance and outcomes are not guaranteed. EQUES Inc. shall not be liable for any direct, indirect, special, incidental, or consequential damages, or any loss arising from the use of this model, regardless of the results obtained. Users must fully understand the risks associated with the use of this model and use it at their own discretion.


### Output Examples

<details><summary>Give me a short introduction to large language model.</summary>

```

```

</details>

<details><summary>大規模言語モデルについて教えて。</summary>

```

```

</details>

<details><summary>A regular hexagon can be divided into six equilateral triangles. If the perimeter of one of the triangles is 21 inches, what is the perimeter, in inches, of the regular hexagon?</summary>

```

```

</details>


### Sample Usage
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # pin inference to the first GPU

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EQUES/TinyDeepSeek-JP-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "大規模言語モデルについて教えて。"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and append the
# assistant turn marker so generation starts a fresh reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
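
To print tokens as they are generated instead of waiting for the full completion, the `TextStreamer` utility from `transformers` can be passed to `generate`. A minimal variant, reusing `model`, `tokenizer`, and `model_inputs` from the example above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced; skip_prompt avoids
# echoing the input prompt, and extra kwargs are forwarded to decode().
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=512,
    streamer=streamer,
)
```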

### License
Apache-2.0

### Acknowledgement
- SakanaAI & Swallow team: development and release of TinySwallow-1.5B
- SakanaAI: development of TAID
- CyberAgent: development of DeepSeek-R1-Distill-Qwen-14B-Japanese