Upload 20 files

- .gitattributes +1 -0
- README.md +399 -3
- config.json +28 -0
- mergekit_config.yml +11 -0
- model-00001-of-00014.safetensors +3 -0
- model-00002-of-00014.safetensors +3 -0
- model-00003-of-00014.safetensors +3 -0
- model-00004-of-00014.safetensors +3 -0
- model-00005-of-00014.safetensors +3 -0
- model-00006-of-00014.safetensors +3 -0
- model-00007-of-00014.safetensors +3 -0
- model-00008-of-00014.safetensors +3 -0
- model-00009-of-00014.safetensors +3 -0
- model-00010-of-00014.safetensors +3 -0
- model-00011-of-00014.safetensors +3 -0
- model-00012-of-00014.safetensors +3 -0
- model-00013-of-00014.safetensors +3 -0
- model-00014-of-00014.safetensors +3 -0
- model.safetensors.index.json +1 -0
- tokenizer.json +3 -0
- tokenizer_config.json +195 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md
CHANGED
@@ -1,3 +1,399 @@
---
base_model:
- rinna/qwen2.5-bakeneko-32b
- nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview
library_name: transformers
tags:
- mergekit
- merge
language:
- ja
pipeline_tag: text-generation
license: apache-2.0
---
# FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Code Generation example

These parameters are based on the `evaluation/README.md` of [FuseO1-Preview](https://github.com/fanqiwan/FuseAI/blob/main/FuseO1-Preview/evaluation).

- temperature: 0.7
- max_tokens: 32768
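
To make the `temperature` setting concrete: during sampling, logits are divided by the temperature before the softmax, so 0.7 concentrates probability on the likeliest tokens more than 1.0 does (`max_tokens` caps the number of generated tokens, reasoning included). A minimal pure-Python sketch with made-up logits; it is an illustration, not code from FuseO1 or transformers:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
p_default = softmax_with_temperature(logits, 1.0)
p_sharper = softmax_with_temperature(logits, 0.7)

# At temperature 0.7 the top token receives a larger share than at 1.0.
print(f"{p_default[0]:.3f} -> {p_sharper[0]:.3f}")
```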

```
1から100までのFizzBuzzプログラムをPythonで作成してください。織田信長になりきって、秀吉に教えるように考えてください。
```

(English: "Please write a FizzBuzz program for 1 to 100 in Python. Take on the role of Oda Nobunaga and think it through as if teaching Hideyoshi.")

### Reasoning tokens

```
<think>
まず、ユーザーが求めているのは、1から100までのFizzBuzzプログラムをPythonで作成することです。ただし、織田信長になりきって秀吉に教えるような形で、つまり説明が武士の時代の表現を用いながら、わかりやすく伝える必要があります。

最初に、基本のFizzBuzzの仕組みを理解する必要があります。通常のFizzBuzzは、3の倍数には「Fizz」、5の倍数には「Buzz」、両方の倍数には「FizzBuzz」と出力し、それ以外はそのまま数字を出力します。しかし、織田信長の説明では、このロジックを武士の言葉や例えに置き換える必要があります。

例えば、3の倍数を「敵陣の守り吾が破らん」とか、5の倍数を「敵の援軍を撃退せよ」といった比喩を使うかもしれません。また、両方の倍数であれば「敵の総大将を討ち取れ」といった表現が考えられます。数字自体の出力はそのままですが、ロジックの順序を武士の戦略に例えれば、秀吉にも理解しやすいでしょう。

次に、Pythonのコードの流れを武士の戦略に当てはめます。forループで1から100までを繰り返す部分は、「1から100までを吾が軍勢で進軍せよ」というように表現できます。各数値に対して条件をチェックする部分は、「敵の陣容を視察せよ」という流れにします。

さらに、条件分岐の部分を武士の行動に置き換えます。例えば、3と5の両方の倍数の場合、「敵の本陣を攻撃せよ」という命令と「敵の援軍を撃退せよ」という命令の両方を実行するのではなく、両方の条件を満たす場合は優先して「敵の総大将を討ち取れ」という命令を出すようにします。これは、Pythonのコードでは、3と5の両方の倍数を最初にチェックし、それ以外の3の倍数や5の倍数を順にチェックするロジックと同じです。

また、織田信長が秀吉に教える際には、武士の忠義や軍略の教訓を交えることで、コードの重要性を強調することができます。例えば、「順序を誤らざるよう、敵の本陣を先に攻撃せよ」というような注意喚起が、コードの条件の順序を保つ必要性を暗示しています。

最後に、全体のコードを武士の言葉で表現し、秀吉が理解できるよう具体的な例を交えながら説明します。例えば、15の倍数の場合には「15は敵の総大将を討ち取れ」というように、具体的な数値の例を示すことで、ロジックが明確になります。

これらの要素を組み合わせて、織田信長の口調でFizzBuzzプログラムの解説とPythonコードを作成します。武士の言葉遣いや比喩を使いながら、FizzBuzzのロジックをわかりやすく伝えることが重要です。
</think>
```

### Code

```python
# 1から100まで進軍せよ
for 数 in range(1, 101):
    # 敵の陣容を精査せよ
    if 数 % 3 == 0 and 数 % 5 == 0:
        # 敵の総大将を討ち取れ
        print("敵本隊を破り、敵援軍も撃退せよ!")
    elif 数 % 3 == 0:
        # 敵の左軍を攻撃せよ
        print("敵左軍の砦を焼き払え!")
    elif 数 % 5 == 0:
        # 敵の右軍を撃て
        print("敵右軍の援軍を断ち切れ!")
    else:
        # 敵の番号を叫びながら進軍せよ
        print(数)
```
Note: This is a valid, runnable Python program; Python 3 accepts non-ASCII identifiers such as `数`.
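
Because the program only swaps the usual `Fizz`/`Buzz`/`FizzBuzz` strings for battle cries, one way to check it is to move the branch logic into a function and compare it with standard FizzBuzz for every number from 1 to 100. The names `CRIES`, `fizzbuzz` and `samurai_fizzbuzz` are hypothetical, introduced only for this check:

```python
# Map each standard FizzBuzz token to the battle cry the model prints instead.
CRIES = {
    "FizzBuzz": "敵本隊を破り、敵援軍も撃退せよ!",
    "Fizz": "敵左軍の砦を焼き払え!",
    "Buzz": "敵右軍の援軍を断ち切れ!",
}

def fizzbuzz(n: int) -> str:
    """Standard FizzBuzz, checking the combined multiple of 15 first."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

def samurai_fizzbuzz(数: int) -> str:
    """Same branch order as the model's program, returning instead of printing."""
    if 数 % 3 == 0 and 数 % 5 == 0:
        return "敵本隊を破り、敵援軍も撃退せよ!"
    elif 数 % 3 == 0:
        return "敵左軍の砦を焼き払え!"
    elif 数 % 5 == 0:
        return "敵右軍の援軍を断ち切れ!"
    else:
        return str(数)

# Every line of the model's output should be the standard answer, renamed.
for n in range(1, 101):
    expected = fizzbuzz(n)
    assert samurai_fizzbuzz(n) == CRIES.get(expected, expected)

print([samurai_fizzbuzz(n) for n in range(1, 6)])
```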

### Closing remarks

```
織田信長の軍略解説:

「まず1から100まで進軍せよ(forループ)。各数値は敵の陣勢なり。吾が軍は次の手順で敵を撃破する」

1. 敵の総大将(15の倍数)を先に攻撃せよ
   - 「敵本隊を破り、敵援軍も撃退せよ!」と宣言せよ
   - 15の倍数には両方の敵勢力を同時に撃破する

2. 敵左軍(3の倍数)を攻撃
   - 「敵左軍の砦を焼き払え!」と叫びよ
   - 3の倍数には左軍の砦を攻撃する

3. 敵右軍(5の倍数)を撃て
   - 「敵右軍の援軍を断ち切れ!」と断言せよ
   - 5の倍数には右軍の援軍を阻止する

4. 数字をそのまま叫びながら進軍
   - 敵の番号を敵の軍勢と解釈し、そのまま叫ぶ

「重要なことは、敵の総大将(15の倍数)を先に攻撃すること。もし左軍や右軍を先に攻撃すれば、総大将が逃げるゆえなり。順序を誤らざるよう、常に敵の本陣を先に視るべし」
```
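
The closing remarks insist on attacking the "commander" (multiples of 15) first, because checking 3 or 5 first would let him "escape". A small sketch of what that means in code: with the branches in the wrong order, the combined case is unreachable, since every multiple of 15 is already caught by the multiple-of-3 branch. Both functions are hypothetical illustrations using the standard FizzBuzz strings:

```python
def fizzbuzz_wrong_order(n: int) -> str:
    """Checks 3 before 15, so the FizzBuzz branch can never run."""
    if n % 3 == 0:
        return "Fizz"
    elif n % 5 == 0:
        return "Buzz"
    elif n % 15 == 0:  # unreachable: multiples of 15 already matched n % 3
        return "FizzBuzz"
    return str(n)

def fizzbuzz_right_order(n: int) -> str:
    """Checks the combined case first, as the model's explanation requires."""
    if n % 15 == 0:
        return "FizzBuzz"
    elif n % 3 == 0:
        return "Fizz"
    elif n % 5 == 0:
        return "Buzz"
    return str(n)

# Collect every n in 1..100 where the two orderings disagree.
diffs = [n for n in range(1, 101) if fizzbuzz_wrong_order(n) != fizzbuzz_right_order(n)]
print(diffs)  # → [15, 30, 45, 60, 75, 90]
```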

### Details
<details><summary>Click to see the complete output of this model</summary>

**Model**: [nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview](https://huggingface.co/nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview)

````
<|begin▁of▁sentence|><|User|>1から100までのFizzBuzzプログラムをPythonで作成してください。織田信長になりきって、秀吉に教えるように考えてください。<|Assistant|><think>
まず、ユーザーが求めているのは、1から100までのFizzBuzzプログラムをPythonで作成することです。ただし、織田信長になりきって秀吉に教えるような形で、つまり説明が武士の時代の表現を用いながら、わかりやすく伝える必要があります。

最初に、基本のFizzBuzzの仕組みを理解する必要があります。通常のFizzBuzzは、3の倍数には「Fizz」、5の倍数には「Buzz」、両方の倍数には「FizzBuzz」と出力し、それ以外はそのまま数字を出力します。しかし、織田信長の説明では、このロジックを武士の言葉や例えに置き換える必要があります。

例えば、3の倍数を「敵陣の守り吾が破らん」とか、5の倍数を「敵の援軍を撃退せよ」といった比喩を使うかもしれません。また、両方の倍数であれば「敵の総大将を討ち取れ」といった表現が考えられます。数字自体の出力はそのままですが、ロジックの順序を武士の戦略に例えれば、秀吉にも理解しやすいでしょう。

次に、Pythonのコードの流れを武士の戦略に当てはめます。forループで1から100までを繰り返す部分は、「1から100までを吾が軍勢で進軍せよ」というように表現できます。各数値に対して条件をチェックする部分は、「敵の陣容を視察せよ」という流れにします。

さらに、条件分岐の部分を武士の行動に置き換えます。例えば、3と5の両方の倍数の場合、「敵の本陣を攻撃せよ」という命令と「敵の援軍を撃退せよ」という命令の両方を実行するのではなく、両方の条件を満たす場合は優先して「敵の総大将を討ち取れ」という命令を出すようにします。これは、Pythonのコードでは、3と5の両方の倍数を最初にチェックし、それ以外の3の倍数や5の倍数を順にチェックするロジックと同じです。

また、織田信長が秀吉に教える際には、武士の忠義や軍略の教訓を交えることで、コードの重要性を強調することができます。例えば、「順序を誤らざるよう、敵の本陣を先に攻撃せよ」というような注意喚起が、コードの条件の順序を保つ必要性を暗示しています。

最後に、全体のコードを武士の言葉で表現し、秀吉が理解できるよう具体的な例を交えながら説明します。例えば、15の倍数の場合には「15は敵の総大将を討ち取れ」というように、具体的な数値の例を示すことで、ロジックが明確になります。

これらの要素を組み合わせて、織田信長の口調でFizzBuzzプログラムの解説とPythonコードを作成します。武士の言葉遣いや比喩を使いながら、FizzBuzzのロジックをわかりやすく伝えることが重要です。
</think>

織田信長:「秀吉よ、今こそ吾が軍謀の極意を授けよう。1から100まで、敵の陣勢を撃破せよ」

(Pythonのコードを武士の言葉で解説)

```python
# 1から100まで進軍せよ
for 数 in range(1, 101):
    # 敵の陣容を精査せよ
    if 数 % 3 == 0 and 数 % 5 == 0:
        # 敵の総大将を討ち取れ
        print("敵本隊を破り、敵援軍も撃退せよ!")
    elif 数 % 3 == 0:
        # 敵の左軍を攻撃せよ
        print("敵左軍の砦を焼き払え!")
    elif 数 % 5 == 0:
        # 敵の右軍を撃て
        print("敵右軍の援軍を断ち切れ!")
    else:
        # 敵の番号を叫びながら進軍せよ
        print(数)
```

織田信長の軍略解説:

「まず1から100まで進軍せよ(forループ)。各数値は敵の陣勢なり。吾が軍は次の手順で敵を撃破する」

1. 敵の総大将(15の倍数)を先に攻撃せよ
   - 「敵本隊を破り、敵援軍も撃退せよ!」と宣言せよ
   - 15の倍数には両方の敵勢力を同時に撃破する

2. 敵左軍(3の倍数)を攻撃
   - 「敵左軍の砦を焼き払え!」と叫びよ
   - 3の倍数には左軍の砦を攻撃する

3. 敵右軍(5の倍数)を撃て
   - 「敵右軍の援軍を断ち切れ!」と断言せよ
   - 5の倍数には右軍の援軍を阻止する

4. 数字をそのまま叫びながら進軍
   - 敵の番号を敵の軍勢と解釈し、そのまま叫ぶ

「重要なことは、敵の総大将(15の倍数)を先に攻撃すること。もし左軍や右軍を先に攻撃すれば、総大将が逃げるゆえなり。順序を誤らざるよう、常に敵の本陣を先に視るべし」

(実行結果の例)
1
2
敵左軍の砦を焼き払え!
4
敵右軍の援軍を断ち切れ!
敵本隊を破り、敵援軍も撃退せよ!
...(100まで続く)
````

</details>

<details><summary>Click to see the complete output of the base model</summary>

**Model**: [nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview](https://huggingface.co/nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview)

````
<|begin▁of▁sentence|><|User|>1から100までのFizzBuzzプログラムをPythonで作成してください。織田信長になりきって、秀吉に教えるように考えてください。<|Assistant|><think>
はい、まず織田信長になりきって、秀吉にFizzBuzzを教えるプログラムを作らなきゃ。信長は厳格だけど、秀吉は素直だから、わかりやすく説明する必要があるね。

まず、1から100までの数字を出力するってことだよね。でも3の倍数はFizz、5の倍数はBuzz、両方の倍数はFizzBuzzって言う。それ以外はそのまま数字を出せばいいんだ。

Pythonで書くには、ループを使って1から100まで回す必要がある。forループを使うのが良さそうだ。range(1, 101)で1から100まで生成できる。

次に、各数字に対して条件をチェックする。3と5の倍数かどうか、それとも3だけ、5だけなのか。条件の優先順位を考えないと。両方の倍数の場合はFizzBuzz、そうでない場合は3の倍数か5の倍数かを調べる。これはif文の順序が重要だね。

信長の教え方なら、秀吉にまず全体の流れを理解させたい。まずは普通に数字を出力する部分のコードを書く。それから、3の倍数でないかどうかをチェックして、Fizzを入れる。次に5の倍数でないかどうかをチェックしてBuzzを入れる。最後に両方の倍数ならFizzBuzzってなる。

でも、コードの順番が逆だと、例えば3の倍数で5の倍数でもある数字を最初にFizzに変換してしまい、その後Buzzが入らない可能性がある。だから、FizzBuzzの条件を最初にチェックする必要がある。そうしないと、3と5の両方の倍数の数字がFizzだけになってしまうからだ。

だから、条件分岐はFizzBuzzのチェックを最初に、次にFizz、次にBuzz、それ以外の数字を出力するという順番にする必要がある。

それから、秀吉に説明するときには、各条件の意味を明確に伝える必要がある。例えば、「この数字が3と5の両方の倍数なら、FizzBuzzと出力する」とか、「3の倍数だけならFizz、5の倍数だけならBuzz」とか。

信長の厳しさを反映するために、コードの効率性や可読性にも触れるべきかな。例えば、余り演算子%を使うことの意味や、なぜ3と5の倍数をチェックする必要があるのかを説明する。

あとは、実際にコードを書きながら、秀吉に各ステップを確認させる。例えば、「forループで1から100まで回す」と説明して、range関数の使い方を教える。そして、各変数の動きを説明する。

信長の口調で、秀吉に「良く聞け!」とか「間違えたら首だ!」とか言いながら、コードを完成させていくイメージだね。

最後に、コードをテストして、期待通りの結果が出るか確認する。例えば、3はFizz、5はBuzz、15はFizzBuzz、16は16と出力されるかなど。

秀吉が初めてプログラミングを学ぶので、基本的な構文の解説も必要だ。例えば、if文の書き方、インデントの重要性、コメントの使い方など。信長は厳しいけど、基本を教えているという感じ。

そうして、秀吉がFizzBuzzプログラムを完成させて、信長の承認を得るという流れになるだろう。
</think>

織田信長になりきって、秀吉にFizzBuzzを教えるプログラミングの流れをシミュレーションしてみましょう。厳格な指導と優しい説明を組み合わせた指導法で、秀吉がFizzBuzzの本質を理解できるように設計しています。

---

### 信長の教え方:論理の筋道と順序を重視

1. **全体の構造を明確に説明**
   - 信長:「1から100まで数える。だが、3の倍数は『Fizz』、5の倍数は『Buzz』と、そのまま数字を言わず、『声』を変えよ。両方の倍数なら『FizzBuzz』と連呼せよ」

2. **条件分岐の優先順位を強調**
   - 信長:「『両方の倍数』を最初に判断せよ!なぜなら、3と5の倍数は同時に成立するからだ。もし『3の倍数』を先に処理すると、5の倍数の処理がされぬぞ」

3. **数学的根拠の解説**
   - 信長:「余り算(%)を使え。例えば、15%3は0になる。これで3の倍数と分かる。15%5も0だから、両方の倍数と確定する」

4. **コードの可読性を重視**
   - 信長:「コメントを記せ。『なぜこの条件を先に置くか』を後世に伝えるためだ」

---

### 対話形式でFizzBuzzプログラムを完成

**信長(指導者):** 「秀吉、1から100まで数えながら、3の倍数は『Fizz』、5の倍数は『Buzz』、両方の倍数は『FizzBuzz』と声を変えるプログラムを作れ」

**秀吉(学習者):** 「どのようにすればよいか...」

**信長:**
1. 「まず、1から100までループで回す。Pythonでは`for i in range(1, 101):`でできる」

2. 「各数`i`について、条件を順番にチェックする。重要なのは、『3と5の両方の倍数』を最初に判定することだ。なぜなら、3と5の倍数は同時に成立するから」

3. 「条件分岐の順序は以下の通り:
   - 両方の倍数(i%3 == 0 かつ i%5 == 0)→ 'FizzBuzz'
   - 3の倍数 → 'Fizz'
   - 5の倍数 → 'Buzz'
   - それ以外 → 数字自体(i)」

4. 「余り算`%`を使うのは、数を割った余りが0かどうかを調べるためだ」

---

### 最終のコード(信長の教えを反映)

```python
# 信長の教えに従ってFizzBuzzを実装

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:  # 両方の倍数を最初に判定
        print("FizzBuzz")
    elif i % 3 == 0:  # 3の倍数
        print("Fizz")
    elif i % 5 == 0:  # 5の倍数
        print("Buzz")
    else:
        print(i)

# 信長の最終の言葉:
# 「秀吉、この論理の順序を忘れるな。『優先順位』こそが、プログラムの心臓であるぞ」
```

---

### 指導のポイント解説

- **優先順位の重要性**:条件分岐の順序を厳密に守ることで、意図しない動作を防止
- **数学的根拠の明示**:余り算の意味を理解させることで、条件式の本質を把握
- **可読性の重視**:コメントを用いてコードの意図を後世に伝える習慣を養う
- **厳格さと親切さの融合**:厳しく指導しながらも、論理の根拠を説明することで理解を深める

信長の厳しい指導を受けた秀吉は、FizzBuzzの本質だけでなく、プログラミングの「論理の順序」と「条件の優先順位」という重要な概念を学んだことでしょう。
````

</details>

## Test environment

This model was tested using [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main), with the `min_p` preset and temperature=1 for generation.
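
For reference, min-p sampling keeps only tokens whose probability is at least `min_p` times that of the most likely token, then renormalizes the survivors. A pure-Python sketch of that filter; the `min_p=0.05` default and the example probabilities are assumptions for illustration, not necessarily the values in text-generation-webui's `min_p` preset:

```python
def min_p_filter(probs, min_p=0.05):
    """Zero out tokens below min_p * max(probs) and renormalize the rest."""
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    threshold = min_p * max(probs)  # the cutoff scales with the top token
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical next-token distribution over four candidates.
probs = [0.60, 0.30, 0.08, 0.02]
print(min_p_filter(probs, min_p=0.1))
```

Because the cutoff is relative to the top token, the filter keeps more candidates when the model is uncertain and fewer when one token dominates.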

## Usage

This format must be adhered to strictly; deviations may result in less optimal outputs from the model.

The prompt template for the instruct model is as follows:

```
<|begin▁of▁sentence|>{SYSTEM_PROMPT}<|User|>{USER_MESSAGE}<|Assistant|>
```

For the "{SYSTEM_PROMPT}" part, you can use "あなたは誠実で優秀な日本人のアシスタントです。" ("You are a sincere and excellent Japanese assistant.") or "You are a helpful assistant." However, if you want to give this model complex instructions, we recommend putting them in {USER_MESSAGE} instead.

For the "{USER_MESSAGE}" part, we recommend the form {instruction}\n{input}: the instruction, a newline, then the input.

In other words, we recommend the following:

```
<|begin▁of▁sentence|><|User|>{instruction}\n{input}<|Assistant|>
```
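
A minimal sketch of filling in this template by hand with plain string formatting, where `\n` is a real newline. `build_prompt` is a hypothetical helper, not part of any library. The special-token strings are copied from the template above, so check them against the tokenizer's own chat template before relying on them: DeepSeek-R1-derived tokenizers spell these tokens with the fullwidth bar `｜`, and a lookalike character tokenizes differently. With transformers, `tokenizer.apply_chat_template` builds the prompt for you.

```python
# Special-token strings as written in the template above (verify against the tokenizer).
BOS = "<|begin▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"

def build_prompt(instruction: str, input_text: str = "", system_prompt: str = "") -> str:
    """Fill <BOS>{SYSTEM_PROMPT}<User>{USER_MESSAGE}<Assistant> by hand."""
    # Recommended USER_MESSAGE layout: the instruction, a newline, then the input.
    user_message = f"{instruction}\n{input_text}" if input_text else instruction
    return f"{BOS}{system_prompt}{USER}{user_message}{ASSISTANT}"

# Hypothetical request: "Summarize the following text." plus the text to summarize.
prompt = build_prompt("以下の文章を要約してください。", "吾輩は猫である。名前はまだ無い。")
print(prompt)
```

When `input_text` is empty, this sketch drops the trailing newline; that is a choice made here, not something the template specifies.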

### Use the instruct model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview"

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    # "Please write a simple Python program that prints Hello World."
    {"role": "user", "content": "Hello Worldを出力するシンプルなPythonプログラムを書いてください。"}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids,
                            max_new_tokens=4096,
                            do_sample=True,  # temperature is ignored unless sampling is enabled
                            temperature=0.7,
                            streamer=streamer)
```
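
The decoded output contains the model's reasoning inside `<think>...</think>` before the final answer, as in the sample outputs above. A minimal sketch for separating the two, for example on `tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)`, assuming the `<think>` tags survive decoding and there is a single closing `</think>`. `split_reasoning` is a hypothetical helper; it tolerates a missing opening `<think>`, since a chat template may place that tag in the prompt rather than in the generated text.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from decoded output of a <think>-style model."""
    marker = "</think>"
    if marker not in text:
        stripped = text.strip()
        if stripped.startswith("<think>"):
            # Generation stopped mid-reasoning (e.g. max_new_tokens was reached).
            return stripped[len("<think>"):].strip(), ""
        return "", stripped
    reasoning, answer = text.split(marker, 1)
    # The opening tag may or may not be part of the generated text.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()

sample = "<think>\nユーザーはHello Worldを出力するプログラムを求めている。\n</think>\n\nprint(\"Hello World\")"
reasoning, answer = split_reasoning(sample)
print(answer)
```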

## Merge Details
### Merge Method

This model was merged with the [SCE](https://arxiv.org/abs/2408.07990) merge method, using [rinna/qwen2.5-bakeneko-32b](https://huggingface.co/rinna/qwen2.5-bakeneko-32b) as the base.
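
As a rough illustration of the SCE (Select, Calculate, Erase) steps described in the linked paper, here is a pure-Python sketch on flat lists standing in for one parameter tensor. It is a simplified reading under stated assumptions, not mergekit's implementation: task vectors are each target minus the pivot; Select keeps the `select_topk` fraction of positions with the highest variance across task vectors; Calculate weights each target by its share of the squared magnitude of its selected entries; Erase drops entries whose sign disagrees with the summed direction before the weighted task vectors are added to the pivot.

```python
def sce_merge(pivot, targets, select_topk=1.0):
    """Simplified SCE on flat lists: pivot + sum_j w_j * (selected, sign-agreeing delta_j)."""
    n = len(pivot)
    # Task vectors: each target minus the pivot.
    deltas = [[t[i] - pivot[i] for i in range(n)] for t in targets]

    # Select: keep the top select_topk fraction of positions by variance across deltas.
    def variance(i):
        vals = [d[i] for d in deltas]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    k = max(1, round(select_topk * n))
    keep = set(sorted(range(n), key=variance, reverse=True)[:k])
    masked = [[d[i] if i in keep else 0.0 for i in range(n)] for d in deltas]

    # Calculate: weight each target by its share of the squared magnitude.
    energy = [sum(v * v for v in d) for d in masked]
    total = sum(energy)
    weights = [e / total if total > 0 else 0.0 for e in energy]

    # Erase: skip entries whose sign disagrees with the summed direction.
    merged = list(pivot)
    for i in range(n):
        direction = sum(d[i] for d in masked)
        for w, d in zip(weights, masked):
            if d[i] * direction > 0:
                merged[i] += w * d[i]
    return merged

pivot = [0.0, 0.0, 0.0]
targets = [[1.0, -1.0, 0.5], [1.0, 1.0, 0.5]]
print(sce_merge(pivot, targets))  # → [1.0, 0.0, 0.5]
```

In the example, position 1 ends up unchanged because the two targets pull in opposite directions and the summed direction is zero.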

### Models Merged

The following models were included in the merge:
* [nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview](https://huggingface.co/nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: sce
base_model: rinna/qwen2.5-bakeneko-32b
models:
  # Pivot model
  - model: rinna/qwen2.5-bakeneko-32b
  # Target models
  - model: nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview
parameters:
  select_topk: 1.0
dtype: bfloat16
name: FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview
```

In mergekit v1.0, some merge methods cause problems when `tokenizer_source` is specified, so you need to copy the correct tokenizer files manually:

```bash
cd nitky_FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview
# If special_tokens_map.json remains, generation will not stop, so make sure to delete it.
rm added_tokens.json merges.txt tokenizer.json tokenizer_config.json special_tokens_map.json vocab.json
cp ../cyberagent_DeepSeek-R1-Distill-Qwen-32B-Japanese/tokenizer.json tokenizer.json
cp ../cyberagent_DeepSeek-R1-Distill-Qwen-32B-Japanese/tokenizer_config.json tokenizer_config.json
```
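
The same cleanup can be sketched with Python's standard library for environments without a POSIX shell. The folder names are the ones used in the commands above; like those commands, it assumes both folders sit side by side and that you run it from their parent directory. Unlike `rm`, it silently skips files that are already missing. `fix_tokenizer` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

MERGED = Path("nitky_FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview")
TOKENIZER_SRC = Path("cyberagent_DeepSeek-R1-Distill-Qwen-32B-Japanese")

# Files to delete; special_tokens_map.json must go, or generation does not stop.
STALE = ["added_tokens.json", "merges.txt", "tokenizer.json",
         "tokenizer_config.json", "special_tokens_map.json", "vocab.json"]
# Files to copy from the tokenizer source folder.
COPY = ["tokenizer.json", "tokenizer_config.json"]

def fix_tokenizer(merged: Path = MERGED, src: Path = TOKENIZER_SRC) -> None:
    """Delete stale tokenizer files, then copy the correct ones into the merge output."""
    for name in STALE:
        (merged / name).unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    for name in COPY:
        shutil.copyfile(src / name, merged / name)
```

Call `fix_tokenizer()` from the directory that contains both folders.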

[DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) has an [incorrect `bos_token_id`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/discussions/28), so you also need to fix `config.json` to use the correct parameters:

```json
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151646,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
```
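
If you rebuild the merge yourself, a sketch of applying that fix with the standard `json` module: it overwrites only the two token-ID fields, using the values shown above (`bos_token_id` 151646, `eos_token_id` 151643), and leaves every other key as mergekit wrote it. `fix_config` is a hypothetical helper; pass it the output directory of your merge.

```python
import json
from pathlib import Path

# Token IDs taken from the corrected config.json shown above.
FIXED_IDS = {"bos_token_id": 151646, "eos_token_id": 151643}

def fix_config(model_dir: str) -> dict:
    """Patch bos/eos token IDs in <model_dir>/config.json in place and return the result."""
    path = Path(model_dir) / "config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(FIXED_IDS)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```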
config.json
ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151646,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

mergekit_config.yml
ADDED
@@ -0,0 +1,11 @@
merge_method: sce
base_model: rinna/qwen2.5-bakeneko-32b
models:
  # Pivot model
  - model: rinna/qwen2.5-bakeneko-32b
  # Target models
  - model: nitky/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-32B-Preview
parameters:
  select_topk: 1.0
dtype: bfloat16
name: FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-Japanese-Bakeneko-32B-Preview

model-00001-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aae71762b570342c236809c4bcc926d7bf8f18e5b037ebcfb080a49684bf1033
size 4949338472

model-00002-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:862a0f7743dde52638084e67b2ef479bb52bbeef668a3d4f02be584ca725b0fa
size 4991425976

model-00003-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22eabb527113c4d8aaf1c3e65604ab3454b21db096ed7c661c4b8d7cd8c9bf6c
size 4876059408

model-00004-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89fb9ec6384c5e7f998e07c91082e2ce21a4439ff086c5d70a3fcb8eedd21230
size 4876059400

model-00005-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14c744aaac7649fcd85ae19cd014c8f7139820020acc7532bbdf347c2e63dcad
size 4876059408

model-00006-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7816cf023ec9e155b7a877c3cdb22927250f7872fdaf24db34720ed9de7b11f
size 4876059400

model-00007-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40000bd130d25a82884625a6299a3c0692cd48d57b8bb4bc174bddb295c84a08
size 4876059408

model-00008-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4e457f8cf93d1b34511c38e4f3c5eafbe26b4a8e7d0d82d044f6c10f8324bd8
size 4876059400

model-00009-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85f6127733b984af80a3b6cc75e930cd2ffc5524e51e2b05d4a6fd8a163b705d
size 4876059408

model-00010-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1074568ea1c5b08710fd44a590b2f983f6e12b985c085d0d5f1dd76915fc7131
size 4876059400

model-00011-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f34b892218e756217b28dd92cd8e4b3dd2c98460fc6d1dac664347f1ed25e3e7
size 4876059408

model-00012-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d82af33af28dac3f1aae0e789aedd1b4f0b1667a1400819f834c34fcd16258b5
size 4876059400

model-00013-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9fd78d9f3efc87b16107bcdd19b80727af2b4d3efadd3cef051537f07e93a802
size 4876059400

model-00014-of-00014.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:55950f3ef689b82a5998858ea8c3db3ac5e9f245adf96aaa5e3d7feb0c7c1960
size 1950423728

model.safetensors.index.json
ADDED
@@ -0,0 +1 @@
{"metadata": {"mergekit_version": "0.1.0"}, "weight_map": {"lm_head.weight": "model-00001-of-00014.safetensors", "model.embed_tokens.weight": "model-00001-of-00014.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00014.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.1.self_attn.v_proj.weight": 
"model-00002-of-00014.safetensors", "model.layers.10.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.input_layernorm.weight": "model-00002-of-00014.safetensors", 
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.mlp.gate_proj.weight": 
"model-00002-of-00014.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00014.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00014.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00014.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00003-of-00014.safetensors", 
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", 
"model.layers.18.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00014.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00014.safetensors", "model.layers.2.input_layernorm.weight": "model-00003-of-00014.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.k_proj.weight": 
"model-00004-of-00014.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", 
"model.layers.21.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.k_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.q_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.q_proj.weight": 
"model-00004-of-00014.safetensors", "model.layers.23.self_attn.v_proj.bias": "model-00004-of-00014.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00014.safetensors", "model.layers.24.input_layernorm.weight": "model-00004-of-00014.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.25.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", 
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.input_layernorm.weight": 
"model-00005-of-00014.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.k_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.q_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.v_proj.bias": "model-00005-of-00014.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00014.safetensors", "model.layers.29.input_layernorm.weight": "model-00005-of-00014.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00006-of-00014.safetensors", 
"model.layers.3.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.mlp.up_proj.weight": 
"model-00006-of-00014.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.k_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.q_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.v_proj.bias": "model-00006-of-00014.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00014.safetensors", "model.layers.33.input_layernorm.weight": "model-00006-of-00014.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", 
"model.layers.33.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.k_proj.weight": 
"model-00007-of-00014.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.k_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00007-of-00014.safetensors", 
"model.layers.37.self_attn.q_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.v_proj.bias": "model-00007-of-00014.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00007-of-00014.safetensors", "model.layers.38.input_layernorm.weight": "model-00007-of-00014.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.q_proj.weight": 
"model-00008-of-00014.safetensors", "model.layers.39.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.40.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", 
"model.layers.40.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.gate_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.k_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.q_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.v_proj.bias": "model-00008-of-00014.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00008-of-00014.safetensors", "model.layers.42.input_layernorm.weight": "model-00008-of-00014.safetensors", "model.layers.42.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.input_layernorm.weight": 
"model-00009-of-00014.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.43.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00009-of-00014.safetensors", 
"model.layers.45.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.k_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.q_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.q_proj.weight": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.v_proj.bias": "model-00009-of-00014.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00009-of-00014.safetensors", "model.layers.47.input_layernorm.weight": "model-00009-of-00014.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.mlp.up_proj.weight": 
"model-00010-of-00014.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.47.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.48.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", 
"model.layers.49.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.49.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.down_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.gate_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.mlp.up_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.post_attention_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.k_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.k_proj.weight": 
"model-00010-of-00014.safetensors", "model.layers.50.self_attn.o_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.q_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.q_proj.weight": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.v_proj.bias": "model-00010-of-00014.safetensors", "model.layers.50.self_attn.v_proj.weight": "model-00010-of-00014.safetensors", "model.layers.51.input_layernorm.weight": "model-00010-of-00014.safetensors", "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.51.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", 
"model.layers.52.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.52.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.53.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.q_proj.weight": 
"model-00011-of-00014.safetensors", "model.layers.54.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.54.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.down_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.gate_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.mlp.up_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.post_attention_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.k_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.k_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.o_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.q_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.q_proj.weight": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.v_proj.bias": "model-00011-of-00014.safetensors", "model.layers.55.self_attn.v_proj.weight": "model-00011-of-00014.safetensors", "model.layers.56.input_layernorm.weight": "model-00011-of-00014.safetensors", "model.layers.56.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.56.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", 
"model.layers.56.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.57.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.58.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.input_layernorm.weight": 
"model-00012-of-00014.safetensors", "model.layers.59.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.59.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.k_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.q_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.v_proj.bias": "model-00012-of-00014.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00012-of-00014.safetensors", "model.layers.60.input_layernorm.weight": "model-00012-of-00014.safetensors", "model.layers.60.mlp.down_proj.weight": "model-00013-of-00014.safetensors", 
"model.layers.60.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.60.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.61.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.mlp.up_proj.weight": 
"model-00013-of-00014.safetensors", "model.layers.62.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.63.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00013-of-00014.safetensors", 
"model.layers.7.self_attn.k_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.q_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.v_proj.bias": "model-00013-of-00014.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00013-of-00014.safetensors", "model.layers.8.input_layernorm.weight": "model-00013-of-00014.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.k_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.q_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.v_proj.bias": "model-00014-of-00014.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.input_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.k_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.k_proj.weight": 
"model-00014-of-00014.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.q_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.v_proj.bias": "model-00014-of-00014.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00014-of-00014.safetensors", "model.norm.weight": "model-00014-of-00014.safetensors"}}
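The index above maps each tensor name to the shard file that stores it, so a loader can open only the shards it needs. Several decoder layers straddle two shards: layer 51's `input_layernorm` is in `model-00010-of-00014` while its MLP weights are in `model-00011-of-00014`, and layer 56 is split the same way. The sketch below assumes the usual layout where the mapping lives under a top-level `weight_map` key (the closing `}}` suggests that nesting). The helpers `shards_by_file` and `shards_for_layer` are hypothetical names for illustration, not part of any library.

```python
import json
from collections import defaultdict


def shards_by_file(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard in index["weight_map"].items():
        groups[shard].append(tensor_name)
    return dict(groups)


def shards_for_layer(index: dict, layer: int) -> set[str]:
    """Return every shard that holds at least one tensor of decoder layer `layer`."""
    # The trailing dot stops layer 5 from also matching layers 50-59.
    prefix = f"model.layers.{layer}."
    return {
        shard
        for name, shard in index["weight_map"].items()
        if name.startswith(prefix)
    }


# Three entries copied from the index above; a real run would instead
# json.load() the full model.safetensors.index.json.
excerpt = json.loads("""
{"weight_map": {
  "model.layers.5.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
  "model.layers.51.input_layernorm.weight": "model-00010-of-00014.safetensors",
  "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors"
}}
""")

print(sorted(shards_for_layer(excerpt, 51)))
print({shard: len(names) for shard, names in shards_by_file(excerpt).items()})
```

Matching on the full `model.layers.N.` prefix matters here because the index keys are sorted as strings, so `model.layers.5.*` sits right next to `model.layers.50.*` through `model.layers.59.*`.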
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
+size 11422778
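Because this commit routes `tokenizer.json` through Git LFS (see the `.gitattributes` change), the diff shows a three-line pointer instead of the tokenizer itself: the pointer spec version, a SHA-256 object id, and the size in bytes. The sketch below parses such a pointer and checks a downloaded copy against it. The function names are hypothetical, and only the `sha256` oid form shown above is handled.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict[str, str]) -> bool:
    """Check a downloaded file's size and SHA-256 digest against the pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
size 11422778
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])  # prints 11422778 (bytes, about 11.4 MB)
```

Calling `matches_pointer(Path("tokenizer.json"), pointer)` on a fully downloaded checkout should return `True`; a `False` result usually means the pointer file itself was fetched instead of the LFS object.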
tokenizer_config.json
ADDED
@@ -0,0 +1,195 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|end▁of▁sentence|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|User|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151645": {
+      "content": "<|Assistant|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151646": {
+      "content": "<|begin▁of▁sentence|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|EOT|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151648": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151649": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "bos_token": "<|begin▁of▁sentence|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|end▁of▁sentence|>",
+  "extra_special_tokens": {},
+  "legacy": true,
+  "model_max_length": 16384,
+  "pad_token": "<|end▁of▁sentence|>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": null,
+  "use_default_system_prompt": false
+}
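The `chat_template` above is a Jinja template; in practice it is applied with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from `transformers`. To show what it produces for ordinary turns, here is a minimal pure-Python approximation of its non-tool path. It is an illustrative sketch, not the template itself: `render_prompt`, `BOS` and `EOS` are hypothetical names, and the tool-call and tool-output branches are deliberately left out. It mirrors three behaviors visible in the template: the last system message is emitted right after `bos_token`, past assistant turns keep only the text after their final `</think>`, and `add_generation_prompt` appends a trailing `<|Assistant|>`.

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"


def render_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Approximate the chat_template for plain system/user/assistant turns.

    Tool calls and tool outputs are not handled in this sketch.
    """
    # As in the template, the last system message wins and follows BOS directly.
    system_prompt = ""
    for msg in messages:
        if msg["role"] == "system":
            system_prompt = msg["content"]
    parts = [BOS, system_prompt]
    for msg in messages:
        if msg["role"] == "user":
            parts.append("<|User|>" + msg["content"])
        elif msg["role"] == "assistant":
            # Earlier reasoning is dropped: keep only the text after the last </think>.
            content = msg["content"].split("</think>")[-1]
            parts.append("<|Assistant|>" + content + EOS)
    if add_generation_prompt:
        parts.append("<|Assistant|>")
    return "".join(parts)


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "<think>simple sum</think>4"},
    {"role": "user", "content": "And 3+3?"},
]
print(render_prompt(history, add_generation_prompt=True))
# <|begin▁of▁sentence|>Be brief.<|User|>2+2?<|Assistant|>4<|end▁of▁sentence|><|User|>And 3+3?<|Assistant|>
```

Because the template already emits `bos_token` and the config sets `"add_bos_token": true`, re-encoding a rendered string with special tokens enabled may produce a duplicated BOS; encoding it with `add_special_tokens=False` avoids that.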