# Prompt Template
XTuner's prompt templates are designed to be consistent with the LLMs' official chat templates. Below, we elaborate on the template logic using the InternLM-Chat model (`internlm_chat`) as an example.
## Structure
```python
internlm_chat=dict(
    SYSTEM='<|System|>:{system}\n',
    INSTRUCTION='<|User|>:{input}<eoh>\n<|Bot|>:',
    SUFFIX='<eoa>',
    SUFFIX_AS_EOS=True,
    SEP='\n',
    STOP_WORDS=['<eoa>'])
```
- `SYSTEM`: The template for the "system" field during Q&A, where `{system}` represents the "system" text. Note that this field appears only once in a multi-turn dialogue, namely in the first turn.
- `INSTRUCTION`: The template for the "instruction" field during Q&A, where `{input}` represents the user instruction text.
- `SUFFIX`: The suffix appended to the "response" of each Q&A turn. It typically also serves as a special ending symbol (*i.e.*, `eos`). Defaults to `''`.
- `SUFFIX_AS_EOS`: Whether the aforementioned suffix acts as an ending symbol. If `True`, it replaces the `eos_token` of the `tokenizer`; otherwise, the `eos_token` of the `tokenizer` is still used to denote the end of a sequence. Defaults to `False`.
- `SEP`: Used to separate multi-turn dialogues; it is appended after the `INSTRUCTION` and `SUFFIX`. Defaults to `''`.
- `STOP_WORDS`: Specifies the stop words used during the text-generation stage. Note that the `eos_token` of the `tokenizer` is automatically added to `STOP_WORDS`, so there is no need to set it manually. (A minimal sketch showing how these fields compose a prompt follows this list.)
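To make the assembly concrete, here is a minimal sketch of how these fields compose a single-turn prompt. The `build_single_turn` helper is purely illustrative (it is not XTuner's internal implementation); it reuses the `internlm_chat` dict shown above.

```python
# Illustrative only: a hypothetical helper, not XTuner's internal code.
internlm_chat = dict(
    SYSTEM='<|System|>:{system}\n',
    INSTRUCTION='<|User|>:{input}<eoh>\n<|Bot|>:',
    SUFFIX='<eoa>',
    SUFFIX_AS_EOS=True,
    SEP='\n',
    STOP_WORDS=['<eoa>'])


def build_single_turn(template, system, user_input, output):
    # SYSTEM comes first, INSTRUCTION wraps the user input, and the
    # response is closed with SUFFIX, then separated by SEP.
    prompt = template['SYSTEM'].format(system=system)
    prompt += template['INSTRUCTION'].format(input=user_input)
    prompt += output + template['SUFFIX'] + template['SEP']
    return prompt


print(build_single_turn(internlm_chat, 'You are a helpful assistant.',
                        'Hi!', 'Hello!'))
# <|System|>:You are a helpful assistant.
# <|User|>:Hi!<eoh>
# <|Bot|>:Hello!<eoa>
```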
## Results
**Single-turn**
```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```
**Multi-turn**
```
<|System|>:{system}
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
<|User|>:{input}<eoh>
<|Bot|>:{output}<eoa>
```
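The multi-turn result above follows from looping the same assembly: `SYSTEM` is emitted once, and every turn appends `INSTRUCTION`, the response, `SUFFIX`, and `SEP`. A minimal sketch, reusing the `internlm_chat` dict from the earlier example (again illustrative, not XTuner's internals):

```python
def build_multi_turn(template, system, turns):
    # SYSTEM is emitted only once, before the first turn.
    prompt = template['SYSTEM'].format(system=system)
    for user_input, output in turns:
        # Each turn: instruction + response + SUFFIX, separated by SEP.
        prompt += template['INSTRUCTION'].format(input=user_input)
        prompt += output + template['SUFFIX'] + template['SEP']
    return prompt


turns = [('Hi!', 'Hello!'), ('Who are you?', 'I am an assistant.')]
print(build_multi_turn(internlm_chat, 'You are a helpful assistant.', turns))
# <|System|>:You are a helpful assistant.
# <|User|>:Hi!<eoh>
# <|Bot|>:Hello!<eoa>
# <|User|>:Who are you?<eoh>
# <|Bot|>:I am an assistant.<eoa>
```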
## Choosing the prompt template
| Model | Prompt Template |
| ---------------------------------------- | --------------- |
| baichuan-inc/Baichuan-7B | default\* |
| baichuan-inc/Baichuan-13B-Base | default\* |
| baichuan-inc/Baichuan-13B-Chat | baichuan_chat |
| baichuan-inc/Baichuan2-7B-Base | default\* |
| baichuan-inc/Baichuan2-7B-Chat | baichuan2_chat |
| baichuan-inc/Baichuan2-13B-Base | default\* |
| baichuan-inc/Baichuan2-13B-Chat | baichuan2_chat |
| THUDM/chatglm2-6b | chatglm2 |
| THUDM/chatglm3-6b | chatglm3 |
| THUDM/chatglm3-6b-base | chatglm3 |
| deepseek-ai/deepseek-coder-6.7b-base | deepseek_coder |
| deepseek-ai/deepseek-coder-6.7b-instruct | deepseek_coder |
| internlm/internlm-7b | default\* |
| internlm/internlm-20b | default\* |
| internlm/internlm-chat-7b | internlm_chat |
| internlm/internlm-chat-20b | internlm_chat |
| huggyllama/llama-7b | default |
| meta-llama/Llama-2-7b-hf | llama2_chat |
| meta-llama/Llama-2-7b-chat-hf | llama2_chat |
| meta-llama/Llama-2-70b-hf | llama2_chat |
| lmsys/vicuna-7b-v1.5 | vicuna |
| lmsys/vicuna-13b-v1.5 | vicuna |
| mistralai/Mistral-7B-v0.1 | mistral |
| mistralai/Mixtral-8x7B-v0.1 | mixtral |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | mixtral |
| Qwen/Qwen-1_8B | default\* |
| Qwen/Qwen-1_8B-Chat | qwen_chat |
| Qwen/Qwen-7B | default\* |
| Qwen/Qwen-7B-Chat | qwen_chat |
| Qwen/Qwen-72B | default\* |
| Qwen/Qwen-72B-Chat | qwen_chat |
| bigcode/starcoder | default |
| 01-ai/Yi-6B | default |
| 01-ai/Yi-34B | default |
| HuggingFaceH4/zephyr-7b-beta | zephyr |
| deepseek-ai/deepseek-moe-16b-base | deepseek_moe |
| deepseek-ai/deepseek-moe-16b-chat | deepseek_moe |
| internlm/internlm2-1_8b | default\* |
| internlm/internlm2-7b | default\* |
| internlm/internlm2-20b | default\* |
| internlm/internlm2-chat-1_8b | internlm2_chat |
| internlm/internlm2-chat-7b | internlm2_chat |
| internlm/internlm2-chat-20b | internlm2_chat |
| Qwen/Qwen1.5-0.5B | default\* |
| Qwen/Qwen1.5-0.5B-Chat | qwen_chat |
| Qwen/Qwen1.5-1.8B | default\* |
| Qwen/Qwen1.5-1.8B-Chat | qwen_chat |
| Qwen/Qwen1.5-4B | default\* |
| Qwen/Qwen1.5-4B-Chat | qwen_chat |
| Qwen/Qwen1.5-7B | default\* |
| Qwen/Qwen1.5-7B-Chat | qwen_chat |
| Qwen/Qwen1.5-14B | default\* |
| Qwen/Qwen1.5-14B-Chat | qwen_chat |
| Qwen/Qwen1.5-72B | default\* |
| Qwen/Qwen1.5-72B-Chat | qwen_chat |
| google/gemma-2b | default\* |
| google/gemma-2b-it | gemma\* |
| google/gemma-7b | default\* |
| google/gemma-7b-it | gemma\* |
\*: The official template contains special tokens (like `<|im_start|>`, `<|im_end|>`) that were not trained during the pre-training phase, so these models use the `default` template instead.
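Once a template is chosen from the table, it is referenced in an XTuner config via the `PROMPT_TEMPLATE` constant and passed to the dataset map function. A minimal sketch of the relevant config fragment; field names follow XTuner's released configs, but consult the config you start from for the full pipeline.

```python
from xtuner.dataset.map_fns import template_map_fn_factory
from xtuner.utils import PROMPT_TEMPLATE

# Pick the template matching the model, e.g. internlm/internlm-chat-7b.
prompt_template = PROMPT_TEMPLATE.internlm_chat

# The template is passed to the dataset map function so that training
# samples are rendered in the same format the model expects.
template_map_fn = dict(
    type=template_map_fn_factory, template=prompt_template)
```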