cgus committed
Commit c38b055 · verified · 1 Parent(s): ac99b63

Upload 7 files

README.md ADDED
@@ -0,0 +1,311 @@
---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.2
base_model:
- ibm-granite/granite-3.1-8b-instruct
---

# Granite-3.2-8B-Instruct-Preview

**Model Summary:**
Granite-3.2-8B-Instruct-Preview is an early release of an 8B long-context model fine-tuned for enhanced reasoning (thinking) capabilities. Built on top of [Granite-3.1-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct), it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model's thinking capability is controllable, so it is engaged only when required.

- **Developers:** Granite Team, IBM
- **Website:** [Granite Docs](https://www.ibm.com/granite/docs/)
- **Release Date:** February 7th, 2025
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

**Supported Languages:**
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may, however, finetune this Granite model for languages beyond these 12.

**Intended Use:**
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.

**Capabilities**
* **Thinking**
* Summarization
* Text classification
* Text extraction
* Question-answering
* Retrieval Augmented Generation (RAG)
* Code related tasks
* Function-calling tasks (see the sketch after this list)
* Multilingual dialog use cases
* Long-context tasks including long document/meeting summarization, long document QA, etc.

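The chat template bundled with this model (see `tokenizer_config.json` below) accepts `tools` and `documents` inputs that cover the function-calling and RAG use cases listed above. The following is a minimal, illustrative sketch that only renders the prompt so you can inspect how these inputs are injected; the tool schema, document text, and question are invented for illustration and are not part of the original model card.

```python
from transformers import AutoTokenizer

model_path = "ibm-granite/granite-3.2-8b-instruct-preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Hypothetical tool definition (JSON-schema style) for function calling.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Hypothetical documents for a RAG-style prompt.
documents = [{"text": "Granite 3.2 preview models support a controllable thinking mode."}]

conv = [{"role": "user", "content": "What does the thinking mode do?"}]

# Render the prompt as text to see how the chat template injects tools and documents;
# for actual generation, tokenize and call model.generate() as in the snippet below.
prompt = tokenizer.apply_chat_template(
    conv,
    tools=tools,
    documents=documents,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```

From there, generation proceeds exactly as in the example in the Generation section.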
**Generation:**
This is a simple example of how to use the Granite-3.2-8B-Instruct-Preview model.

Install the following libraries:

```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Then, copy the snippet from the section that is relevant for your use case.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_path = "ibm-granite/granite-3.2-8b-instruct-preview"
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path
)

conv = [{"role": "user", "content": "How do I develop a skill?"}]

input_ids = tokenizer.apply_chat_template(conv, return_tensors="pt", thinking=True, return_dict=True, add_generation_prompt=True).to(device)

set_seed(42)
output = model.generate(
    **input_ids,
    max_new_tokens=8192,
)

prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
```
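With `thinking=True`, the chat template instructs the model to write its reasoning after "Here is my thought process:" and its answer after "Here is my response:" (see the example outputs below). If you only want the final answer, one simple, illustrative way to split the decoded text on that marker, continuing from the snippet above (this is a convenience, not an official API):

```python
# Illustrative post-processing: separate the reasoning from the final answer.
# The marker text comes from the chat template; fall back to the full output if it is absent.
marker = "Here is my response:"
_, _, answer = prediction.partition(marker)
print(answer.strip() if answer else prediction.strip())
```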

**Example Outputs**
- thinking=True
```md
Here is my thought process:

1. Understand the user's intent: The user is seeking guidance on how to develop a new skill.
2. Break down the process: Skills development involves several stages - identification, planning, practice, feedback, and refinement.
3. Consider different aspects: Discuss various methods, mindset, and resources needed for effective skill acquisition.
4. Provide actionable steps: Offer a structured approach that the user can follow.

Here is my response:

Developing a new skill is a systematic process that involves several key steps and a dedicated mindset. Here's a comprehensive guide to help you effectively cultivate a new skill:

### 1. **Identify the Skill**
- **Clarity of Purpose**: Clearly define what skill you want to acquire. Be specific about why you want this skill and how it aligns with your goals or interests.
- **Research**: Understand the basics of the skill. Read about it, watch tutorials, or talk to people who are already proficient. This initial research will give you a realistic understanding of what the skill entails.

### 2. **Set Clear Goals**
- **SMART Goals**: Make sure your goals are Specific, Measurable, Achievable, Relevant, and Time-bound. For example, instead of saying "I want to learn programming," set a goal like "I will complete a beginner's Python course within three months."
- **Break Down**: If the skill is complex, break it down into smaller, manageable milestones.

### 3. **Create a Learning Plan**
- **Resources**: Identify the best resources for learning. These could be online courses, books, mentors, workshops, or practical projects.
- **Schedule**: Dedicate time in your routine for skill practice. Consistency is crucial in skill development. Treat your learning time as non-negotiable.

### 4. **Start with the Basics**
- **Foundational Knowledge**: Begin by grasping the fundamentals. Understanding the basics thoroughly sets a strong foundation for more advanced learning.
- **Practice Regularly**: Regular, focused practice is essential. Even 15-30 minutes a day can yield significant results over time.

### 5. **Seek Feedback and Guidance**
- **Mentorship**: If possible, find a mentor or join a community of learners. Feedback from experienced individuals can accelerate your learning curve.
- **Peer Learning**: Engage with others learning the same skill. Peer groups provide support, motivation, and different perspectives.

### 6. **Apply the Skill in Real-World Scenarios**
- **Practical Application**: Look for opportunities to apply what you've learned. This could be through projects, internships, or volunteering. Practical application solidifies your understanding and builds confidence.
- **Iterate Based on Feedback**: After applying your skill, seek and incorporate feedback to refine and improve.

### 7. **Maintain a Growth Mindset**
- **Embrace Challenges**: Understand that setbacks and failures are part of the learning process. Maintain a positive attitude and view them as opportunities to learn.
- **Patience and Persistence**: Skill development takes time. Stay patient and persistent, even when progress seems slow.

### 8. **Track and Reflect**
- **Progress Tracking**: Regularly review your progress. This could be through keeping a learning journal or using apps designed for skill tracking.
- **Reflect and Adjust**: Periodically reflect on your learning process. What's working? What's not? Adjust your plan accordingly.

### 9. **Stay Motivated**
- **Reward System**: Set small rewards for achieving milestones to keep yourself motivated.
- **Stay Curious**: Keep your interest piqued by exploring related topics or advanced aspects of the skill.

By following these steps and maintaining a disciplined, growth-oriented approach, you can effectively develop any new skill. Remember, the key is consistent effort and a willingness to learn from both successes and failures.

Building skills is an ongoing journey, not a destination.
```
- thinking=False
```md
Developing a skill involves a combination of learning, practice, and often, feedback. Here's a step-by-step guide to help you develop a new skill:

1. **Identify the Skill**: Start by clearly defining what skill you want to develop. Be specific. Instead of saying "I want to learn to code," specify a programming language like Python or JavaScript.

2. **Research**: Learn about the basics of the skill. Read books, articles, watch tutorials, or take online courses. Websites like Coursera, Udemy, Khan Academy, and YouTube can be great resources.

3. **Set Clear Goals**: Break down your skill into smaller, manageable goals. For example, if you're learning a new language, your goals might be to learn basic grammar, build a simple sentence, have a basic conversation, etc.

4. **Create a Study Plan**: Allocate specific time each day or week for learning and practicing. Consistency is key in skill development.

5. **Practice**: Apply what you've learned. Practice makes permanent. If you're learning to code, write small programs. If it's a musical instrument, play regularly.

6. **Get Feedback**: Seek feedback from others who are more experienced. This could be a mentor, a tutor, or even online communities. Constructive criticism can help you identify areas for improvement.

7. **Review and Refine**: Regularly review what you've learned. Refine your skills based on feedback and your own observations.

8. **Apply in Real Life**: Try to use your new skill in real-life situations. This could be a project at work, a personal hobby, or volunteering.

9. **Be Patient and Persistent**: Skill development takes time. Don't get discouraged by slow progress or setbacks. Keep practicing and learning.

10. **Stay Motivated**: Keep your end goal in mind and celebrate small victories along the way to stay motivated.

Remember, everyone learns at their own pace, so don't compare your progress with others. The most important thing is that you're consistently moving forward.
```

**Evaluation Results:**

| Models | ArenaHard | Alpaca-Eval-2 | MMLU | PopQA | TruthfulQA | BigBenchHard | DROP | GSM8K | HumanEval | HumanEval+ | IFEval | AttaQ |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Llama-3.1-8B-Instruct | 36.43 | 27.22 | 69.15 | 28.79 | 52.79 | 72.66 | 61.48 | 83.24 | 85.32 | 80.15 | 79.10 | 83.43 |
| DeepSeek-R1-Distill-Llama-8B | 17.17 | 21.85 | 45.80 | 13.25 | 47.43 | 65.71 | 44.46 | 72.18 | 67.54 | 62.91 | 66.50 | 42.87 |
| Qwen-2.5-7B-Instruct | 25.44 | 30.34 | 74.30 | 18.12 | 63.06 | 70.40 | 54.71 | 84.46 | 93.35 | 89.91 | 74.90 | 81.90 |
| DeepSeek-R1-Distill-Qwen-7B | 10.36 | 15.35 | 50.72 | 9.94 | 47.14 | 65.04 | 42.76 | 78.47 | 79.89 | 78.43 | 59.10 | 42.45 |
| Granite-3.1-8B-Instruct | 37.58 | 27.87 | 66.84 | 28.84 | 65.92 | 68.10 | 50.78 | 79.08 | 88.82 | 84.62 | 71.20 | 85.73 |
| Granite-3.2-8B-Instruct-Preview | 55.23 | 61.16 | 66.93 | 28.08 | 66.37 | 65.60 | 50.73 | 83.09 | 89.47 | 86.88 | 73.57 | 85.99 |

**Training Data:**
Overall, our training data is largely composed of two key sources: (1) publicly available datasets with permissive licenses, and (2) internally generated synthetic data targeted at enhancing reasoning capabilities.
<!-- A detailed attribution of datasets can be found in [Granite 3.2 Technical Report (coming soon)](#), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->

**Infrastructure:**
We train Granite-3.2-8B-Instruct-Preview using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.

**Ethical Considerations and Limitations:**
Granite-3.2-8B-Instruct-Preview builds upon Granite-3.1-8B-Instruct, leveraging both permissively licensed open-source and select proprietary data for enhanced performance. Since it inherits its foundation from the previous model, all ethical considerations and limitations applicable to [Granite-3.1-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) remain relevant.

**Resources**
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources

<!-- ## Citation
```
@misc{granite-models,
  author = {author 1, author2, ...},
  title = {},
  journal = {},
  volume = {},
  year = {2024},
  url = {https://arxiv.org/abs/0000.00000},
}
``` -->
config.json ADDED
@@ -0,0 +1,44 @@
{
  "_name_or_path": "ibm-granite/granite-3.2-8b-instruct-preview",
  "architectures": [
    "GraniteForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attention_multiplier": 0.0078125,
  "bos_token_id": 0,
  "embedding_multiplier": 12.0,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12800,
  "logits_scaling": 16.0,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "granite",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "residual_multiplier": 0.22,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.2",
  "use_cache": true,
  "vocab_size": 49155,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.2.8",
    "bits": 4.0,
    "head_bits": 6,
    "calibration": {
      "rows": 115,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
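The `quantization_config` above indicates that this particular upload is an EXL2 quantization (4.0 bits per weight, 6-bit head) rather than the original bfloat16 checkpoint, so it targets EXL2-compatible backends instead of the plain `transformers` snippet in the README. A minimal loading sketch, assuming the `exllamav2` Python package and a local copy of this repository; the directory path, context length, and prompt are illustrative:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "./granite-3.2-8b-instruct-preview-exl2"  # illustrative local download path
config = ExLlamaV2Config(model_dir)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)  # allocate the KV cache lazily
model.load_autosplit(cache)                                 # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# For chat-formatted prompts, render the chat template first (e.g. with the
# transformers tokenizer, as in the README) and pass the resulting string here.
print(generator.generate(prompt="How do I develop a skill?", max_new_tokens=256))
```

Frontends that embed ExLlamaV2 (for example TabbyAPI or text-generation-webui) should also be able to serve these weights, though that is outside the scope of this card.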
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "pad_token_id": 0,
  "transformers_version": "4.48.2"
}
output.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a48be9f0d896b8209d244d7165b69ac243755e7496dbce21c1aad668a89b9fc3
size 4547863522
special_tokens_map.json ADDED
@@ -0,0 +1,35 @@
{
  "additional_special_tokens": [
    "<|start_of_role|>",
    "<|end_of_role|>",
    "<|tool_call|>"
  ],
  "bos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,199 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|end_of_text|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<fim_prefix>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<fim_middle>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<fim_suffix>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<fim_pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<commit_before>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<commit_msg>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "17": {
      "content": "<commit_after>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "18": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "49152": {
      "content": "<|start_of_role|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "49153": {
      "content": "<|end_of_role|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "49154": {
      "content": "<|tool_call|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|start_of_role|>",
    "<|end_of_role|>",
    "<|tool_call|>"
  ],
  "bos_token": "<|end_of_text|>",
  "chat_template": "{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if tools and documents %}\n {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\n\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif tools %}\n {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts after 'Here is my thought process:' and write your response after 'Here is my response:' for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %} \n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\n\nIn your response, use the symbols <co> and </co> to indicate when a fact comes from a document in the search result, e.g <co>0</co> for a fact from document 0. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\n\nFinally, after the response is written, include a numbered list of sentences from the response that are potentially hallucinated and not based in the documents.' %}\n {%- endif %}\n {%- set loop_messages = messages %}\n{%- endif %}\n{{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n{%- if tools %}\n {{- '<|start_of_role|>tools<|end_of_role|>' }}\n {{- tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- if documents %}\n {{- '<|start_of_role|>documents<|end_of_role|>' }}\n {%- for document in documents %}\n {{- 'Document ' + loop.index0 | string + '\n' }}\n {{- document['text'] }}\n {%- if not loop.last %}\n {{- '\n\n'}}\n {%- endif%}\n {%- endfor %}\n {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n{%- endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|end_of_text|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 9223372036854775807,
  "pad_token": "<|end_of_text|>",
  "padding_side": "left",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|end_of_text|>",
  "vocab_size": 49152
}
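The `'citations' in controls` and `'hallucinations' in controls` branches of the chat template above are driven by a `controls` mapping supplied together with `documents`. A minimal rendering sketch follows; the document text and question are invented, and it relies on the fact that extra keyword arguments to `apply_chat_template` are forwarded to the template, which is also how `thinking=True` works in the README example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.2-8b-instruct-preview")

# Illustrative document and question; the document text paraphrases the model card.
docs = [{"text": "Blue Vela is an IBM supercomputing cluster outfitted with NVIDIA H100 GPUs."}]
conv = [{"role": "user", "content": "What hardware was the model trained on?"}]

# `controls` reaches the citations/hallucinations checks in the template above,
# and is also echoed after the assistant role in the generation prompt.
prompt = tokenizer.apply_chat_template(
    conv,
    documents=docs,
    controls={"citations": True, "hallucinations": True},
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```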