# Prompt Template

XTuner's prompt templates are designed to be consistent with the LLMs' official templates. Below, we elaborate on their logic using the InternLM-Chat model (`internlm_chat`) as an example.

## Structure

```python
internlm_chat=dict(
    SYSTEM='<|System|>:{system}\n',
    INSTRUCTION='<|User|>:{input}<eoh>\n<|Bot|>:',
    SUFFIX='<eoa>',
    SUFFIX_AS_EOS=True,
    SEP='\n',
    STOP_WORDS=['<eoa>'])
```

- `SYSTEM`: The template for the "system" field during Q&A, where `{system}` represents the "system" text. Note that this field appears only once in a multi-turn dialogue, namely in the first turn.

- `INSTRUCTION`: The template for the "instruction" field during Q&A, where `{input}` represents the user instruction text.

- `SUFFIX`: The suffix of the `INSTRUCTION`, appended to the end of the "response" in each Q&A turn. Typically, it also serves as a special end-of-sequence symbol (*i.e.*, `eos`). Defaults to `''`.

- `SUFFIX_AS_EOS`: Indicates whether the aforementioned suffix acts as an end-of-sequence symbol. If set to `True`, it replaces the `eos_token` of the `tokenizer`; otherwise, the `eos_token` of the `tokenizer` is still used to denote the end of sequence. Defaults to `False`.

- `SEP`: The separator between dialogue turns, appended after the `INSTRUCTION` and `SUFFIX`. Defaults to `''`.

- `STOP_WORDS`: Specifies the stop words used during the text generation stage. Note that the `eos_token` of the `tokenizer` is added to `STOP_WORDS` automatically, so there is no need to set it manually. How these fields combine into a rendered prompt is sketched below.
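
The following minimal sketch (not XTuner's actual implementation; `build_prompt` is a hypothetical helper introduced for illustration) shows how these fields assemble a conversation into a prompt:

```python
internlm_chat = dict(
    SYSTEM='<|System|>:{system}\n',
    INSTRUCTION='<|User|>:{input}<eoh>\n<|Bot|>:',
    SUFFIX='<eoa>',
    SUFFIX_AS_EOS=True,
    SEP='\n',
    STOP_WORDS=['<eoa>'])


def build_prompt(template, system, turns):
    """Render a conversation; `turns` is a list of (input, output) pairs."""
    # The system field appears only once, at the start of the first turn.
    prompt = template['SYSTEM'].format(system=system)
    for user_input, bot_output in turns:
        prompt += template['INSTRUCTION'].format(input=user_input)
        # SUFFIX closes each response; SEP separates consecutive turns.
        prompt += bot_output + template['SUFFIX'] + template['SEP']
    return prompt


print(build_prompt(internlm_chat, '{system}', [('{input}', '{output}')]))
```

Run with placeholder strings as above, this reproduces the single-turn result shown in the next section.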

## Results

**Single-turn**

```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```

**Multi-turn**

```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```
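
At generation time, `STOP_WORDS` (together with the tokenizer's `eos_token`) tell the decoder where to stop. As a hedged sketch, assuming a Hugging Face tokenizer and reusing the `internlm_chat` dict from above (this is not XTuner's internal code):

```python
from transformers import AutoTokenizer

# Assumption: the InternLM tokenizer is available locally or via the Hub.
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-chat-7b', trust_remote_code=True)

# SUFFIX_AS_EOS=True means '<eoa>' closes each response, so it acts as an
# end-of-sequence signal. The tokenizer's eos_token is always included in
# the stop words automatically; it is added explicitly here only to mirror
# that behavior.
stop_words = internlm_chat['STOP_WORDS'] + [tokenizer.eos_token]
stop_ids = [tokenizer.convert_tokens_to_ids(w) for w in stop_words]
```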

## Choosing the prompt template

| Model                                    | Prompt Template |
| ---------------------------------------- | --------------- |
| baichuan-inc/Baichuan-7B                 | default\*       |
| baichuan-inc/Baichuan-13B-Base           | default\*       |
| baichuan-inc/Baichuan-13B-Chat           | baichuan_chat   |
| baichuan-inc/Baichuan2-7B-Base           | default\*       |
| baichuan-inc/Baichuan2-7B-Chat           | baichuan2_chat  |
| baichuan-inc/Baichuan2-13B-Base          | default\*       |
| baichuan-inc/Baichuan2-13B-Chat          | baichuan2_chat  |
| THUDM/chatglm2-6b                        | chatglm2        |
| THUDM/chatglm3-6b                        | chatglm3        |
| THUDM/chatglm3-6b-base                   | chatglm3        |
| deepseek-ai/deepseek-coder-6.7b-base     | deepseek_coder  |
| deepseek-ai/deepseek-coder-6.7b-instruct | deepseek_coder  |
| internlm/internlm-7b                     | default\*       |
| internlm/internlm-20b                    | default\*       |
| internlm/internlm-chat-7b                | internlm_chat   |
| internlm/internlm-chat-20b               | internlm_chat   |
| huggyllama/llama-7b                      | default         |
| meta-llama/Llama-2-7b-hf                 | llama2_chat     |
| meta-llama/Llama-2-7b-chat-hf            | llama2_chat     |
| meta-llama/Llama-2-70b-hf                | llama2_chat     |
| lmsys/vicuna-7b-v1.5                     | vicuna          |
| lmsys/vicuna-13b-v1.5                    | vicuna          |
| mistralai/Mistral-7B-v0.1                | mistral         |
| mistralai/Mixtral-8x7B-v0.1              | mixtral         |
| mistralai/Mixtral-8x7B-Instruct-v0.1     | mixtral         |
| Qwen/Qwen-1_8B                           | default\*       |
| Qwen/Qwen-1_8B-Chat                      | qwen_chat       |
| Qwen/Qwen-7B                             | default\*       |
| Qwen/Qwen-7B-Chat                        | qwen_chat       |
| Qwen/Qwen-72B                            | default\*       |
| Qwen/Qwen-72B-Chat                       | qwen_chat       |
| bigcode/starcoder                        | default         |
| 01-ai/Yi-6B                              | default         |
| 01-ai/Yi-34B                             | default         |
| HuggingFaceH4/zephyr-7b-beta             | zephyr          |
| deepseek-ai/deepseek-moe-16b-base        | deepseek_moe    |
| deepseek-ai/deepseek-moe-16b-chat        | deepseek_moe    |
| internlm/internlm2-1_8b                  | default\*       |
| internlm/internlm2-7b                    | default\*       |
| internlm/internlm2-20b                   | default\*       |
| internlm/internlm2-chat-1_8b             | internlm2_chat  |
| internlm/internlm2-chat-7b               | internlm2_chat  |
| internlm/internlm2-chat-20b              | internlm2_chat  |
| Qwen/Qwen1.5-0.5B                        | default\*       |
| Qwen/Qwen1.5-0.5B-Chat                   | qwen_chat       |
| Qwen/Qwen1.5-1.8B                        | default\*       |
| Qwen/Qwen1.5-1.8B-Chat                   | qwen_chat       |
| Qwen/Qwen1.5-4B                          | default\*       |
| Qwen/Qwen1.5-4B-Chat                     | qwen_chat       |
| Qwen/Qwen1.5-7B                          | default\*       |
| Qwen/Qwen1.5-7B-Chat                     | qwen_chat       |
| Qwen/Qwen1.5-14B                         | default\*       |
| Qwen/Qwen1.5-14B-Chat                    | qwen_chat       |
| Qwen/Qwen1.5-72B                         | default\*       |
| Qwen/Qwen1.5-72B-Chat                    | qwen_chat       |
| google/gemma-2b                          | default\*       |
| google/gemma-2b-it                       | gemma\*         |
| google/gemma-7b                          | default\*       |
| google/gemma-7b-it                       | gemma\*         |

\*: The official templates of these models include special tokens (like `<|im_start|>`, `<|im_end|>`) that were not trained during the pre-training phase. Therefore, these models use the `default` template.
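
In practice, an XTuner config selects the template by name from `xtuner.utils.PROMPT_TEMPLATE`. A brief sketch (attribute names are assumed to match the template names in the table above and may vary across versions):

```python
from xtuner.utils import PROMPT_TEMPLATE

# Pick the template matching the base model, e.g. internlm_chat for
# internlm/internlm-chat-7b (see the table above).
prompt_template = PROMPT_TEMPLATE.internlm_chat
```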