---
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
- karakuri-ai/karakuri-lm-32b-thinking-2501-exp
- Saxo/Linkbricks-Horizon-AI-Japanese-Base-32B
- FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
- TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- NovaSky-AI/Sky-T1-32B-Flash
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
- ja
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/65f01b5235c5424c262c8be8/CxkLHJy9597WodmOOlWwc.jpeg)

## Overview
This model was inspired by [nitky/RoguePlanet-DeepSeek-R1-Qwen-32B](https://huggingface.co/nitky/RoguePlanet-DeepSeek-R1-Qwen-32B).
We have confirmed that it emits `<think></think>` tags and that it also performs well as a Japanese-language model.

## How To Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DataPilot/SKYCAVE-R1-32B-v0.1"

# Point this at a different repo to use another tokenizer; if left empty,
# the model's own tokenizer is used.
tokenizer_name = ""
if tokenizer_name == "":
    tokenizer_name = model_name

# Load the model in its native precision and shard it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

prompt = "メタデータを解析し、自己進化をするAIであるnurture intelligenceが実現した未来の日常生活の姿を教えてください。"
messages = [
    {"role": "system", "content": "あなたは優秀な日本語アシスタントであり長考モデルです。問題解決をするための思考をした上で回答を行ってください。"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response; a generous max_new_tokens budget leaves room for the
# model's <think> reasoning block before the final answer.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Strip the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```
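
Since the model emits its reasoning inside `<think></think>` tags (see the overview above), you may want to separate the reasoning from the final answer. The snippet below is a minimal sketch that continues from the `response` variable in the example above; the helper name `split_reasoning` is illustrative, and it assumes the closing `</think>` tag appears at most once in the decoded output.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes the model wraps its chain of thought in <think>...</think>;
    if no closing tag is found, the whole response is treated as the answer.
    """
    marker = "</think>"
    if marker in response:
        reasoning, answer = response.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", response.strip()

reasoning, answer = split_reasoning(response)
print("--- reasoning ---")
print(reasoning)
print("--- answer ---")
print(answer)
```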

## Acknowledgements
We thank the creators of all the models used in this merge, as well as VOLTMIND for lending us compute resources.
We are also grateful to nitky for advice on building this model.

## mergekit config
```yaml
merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: Saxo/Linkbricks-Horizon-AI-Japanese-Base-32B
parameters:
  t: 0.35
dtype: bfloat16
name: SKYCAVE_element_QwQ_jp

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: SKYCAVE_element_QwQ_jp
parameters:
  t: 0.4
dtype: bfloat16
name: SKYCAVE_element_QR_jp

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
parameters:
  t: 0.5
dtype: bfloat16
name: SKYCAVE_element_R1_jp_01

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
parameters:
  t: 0.5
dtype: bfloat16
name: SKYCAVE_element_R1_jp_02

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
parameters:
  t: 0.6
dtype: bfloat16
name: SKYCAVE_element_R1_jp_03

---

merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: NovaSky-AI/Sky-T1-32B-Flash
parameters:
  t: 0.4
dtype: bfloat16
name: SKYCAVE_element_Sky_jp

---

merge_method: model_stock
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: SKYCAVE_element_QR_jp
  - model: SKYCAVE_element_R1_jp_01
  - model: SKYCAVE_element_R1_jp_02
  - model: SKYCAVE_element_R1_jp_03
  - model: SKYCAVE_element_Sky_jp
dtype: bfloat16
name: SKYCAVE-R1-32B-v0.1
```
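
For reference, `merge_method: slerp` interpolates each pair of weight tensors along the great circle between them, with `t` controlling how far the result moves from the base model (`t = 0`) toward the other model (`t = 1`). The sketch below illustrates that interpolation for a single tensor pair; it is not mergekit's actual implementation, which additionally handles per-layer `t` schedules and other edge cases.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0 returns a (the base model's tensor), t = 1 returns b. Falls back to
    plain linear interpolation when the tensors are nearly colinear, where the
    slerp formula becomes numerically unstable.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(a_unit @ b_unit, -1.0, 1.0)
    omega = torch.arccos(dot)        # angle between the two weight directions
    if omega.abs() < 1e-4:           # nearly colinear: use lerp instead
        out = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
            + (torch.sin(t * omega) / sin_omega) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# e.g. t = 0.35 as in the first merge step above:
# merged_weight = slerp(0.35, base_layer_weight, other_layer_weight)
```

The multi-document layout with intermediate `name:` entries follows the staged-merge style of the RoguePlanet recipe this model is based on: each named intermediate is built first and then referenced by later stages, and the final `model_stock` step combines the five intermediates into SKYCAVE-R1-32B-v0.1.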