When will you fix the model replies missing </think>\n start tags

#19
by xldistance - opened

open-webui can't collapse the thought process, and it's too tiring to stare at the whole reasoning.

xldistance changed discussion title from "When will you fix the model replies missing <think>\n start tags" to "When will you fix the model replies missing </think>\n start tags"

I think the team actually wants us to manually append "<think>\n" after the whole prompt. Not sure how to implement that in open-webui.

Just don't use open-webui, or wait for an update.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.
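For what it's worth, with those flags the server returns the reasoning separately from the final answer, so the client never has to deal with raw <think> tags at all. A minimal client sketch (the base URL, API key and model name are assumptions for a local deployment; the reasoning_content field is what vLLM's reasoning-outputs feature adds):

"""
# Minimal sketch: query a vLLM server started with
#   vllm serve Qwen/QwQ-32B --enable-reasoning --reasoning-parser deepseek_r1
# base_url, api_key and model name below are assumptions; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

msg = resp.choices[0].message
print("reasoning:", msg.reasoning_content)  # the parsed thought process
print("answer:", msg.content)               # the text after </think>
"""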

You can just remove the <think> tag from the chat template inside tokenizer_config.json, so the model generates <think> at the beginning of its output and open-webui parses the response correctly.
The drawback is that the model may skip the reasoning section for some questions, but I feel the overall experience is fine.
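If you'd rather script that edit than hand-modify the file, something along these lines does it (a sketch; the local path is an assumption, and the exact substring depends on which version of the chat template you have):

"""
# Sketch: strip the trailing "<think>\n" that the chat template appends after
# "<|im_start|>assistant\n", so the model has to emit <think> itself.
# The path is an assumption; point it at your local copy of the model.
import json

path = "QwQ-32B/tokenizer_config.json"

with open(path, encoding="utf-8") as f:
    cfg = json.load(f)

# In the decoded template the emitted newlines are literal "\n" sequences.
new_template = cfg["chat_template"].replace(
    "<|im_start|>assistant\\n<think>\\n", "<|im_start|>assistant\\n"
)
if new_template == cfg["chat_template"]:
    print("Pattern not found; inspect the template manually.")
else:
    cfg["chat_template"] = new_template
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, ensure_ascii=False, indent=2)
"""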

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

This solved the issue, thanks :)

Think tags come up just fine with llama.cpp. Example usage: .\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000

ollama too: no config needed, no need to modify any prompt template, it works out of the box. Works perfectly coupled with open-webui, Continue, etc.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

I used a filter function in open-webui, and it adds the tag to the response when a qwq32 model is detected.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

It works for me on 0.7.3:

vllm serve <model_id> --dtype half --quantization awq --enable-reasoning --reasoning-parser deepseek_r1 --max-model-len 32768 (adjust the length based on your available memory)

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

https://unsloth.ai/blog/qwq-32b RELEVANT for llama.cpp users

I used a filter function in open-webui, and it adds the tag to the response when a qwq32 model is detected.

Can you give me the code to modify open-webui?
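For reference, a filter along these lines should do it. This is only a sketch against open-webui's Filter function interface (the method name, the body layout, and the qwq model-id check are assumptions; check your open-webui version's function docs):

"""
# Sketch of an open-webui Filter function that re-adds the missing <think> tag
# so the UI can collapse the reasoning. Method name and body structure follow
# the Filter plugin convention but may differ between open-webui versions.
class Filter:
    def __init__(self):
        pass

    def outlet(self, body: dict) -> dict:
        # Only touch responses from QwQ-style models (model-id check is a guess).
        if "qwq" not in str(body.get("model", "")).lower():
            return body

        for message in body.get("messages", []):
            if message.get("role") != "assistant":
                continue
            content = message.get("content", "")
            # The model emits reasoning plus a closing </think> but no opening
            # tag, so open-webui can't fold it. Prepend one if it is missing.
            if "</think>" in content and "<think>" not in content:
                message["content"] = "<think>\n" + content
        return body
"""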

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

I didn't find any loss in accuracy
[screenshot attached]

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

I also run inference with vLLM. This actually has nothing to do with vLLM: edit the chat_template field in tokenizer_config.json, remove the <think>\n after <|im_start|>assistant\n in that field, then add --enable-reasoning --reasoning-parser deepseek_r1 and it works.

Update to the latest official tokenizer.json, then use the following tokenizer_config.json:
"""
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- '' }}\n {%- endif %}\n {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}\n {%- for tool in tools %}\n {{- "\n" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- "\n\n\nFor each function call, return a json object with function name and arguments within XML tags:\n\n{\"name\": , \"arguments\": }\n<|im_end|>\n" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n {%- elif message.role == "assistant" and not message.tool_calls %}\n {%- set content = message.content %}\n {%- if not loop.last %}\n {%- set content = message.content.split('')[-1].lstrip('\n ') %}\n {%- endif %}\n {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}\n {%- elif message.role == "assistant" %}\n {%- set content = message.content %}\n {%- if not loop.last %}\n {%- set content = message.content.split('')[-1].lstrip('\n') %}\n {%- endif %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\n' + content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\n\n{"name": "' }}\n {{- tool_call.name }}\n {{- '", "arguments": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\n' }}\n {%- endfor %}\n {{- '<|im_end|>\n' }}\n {%- elif message.role == "tool" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\n\n' }}\n {{- message.content }}\n {{- '\n' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}\n {{- '<|im_end|>\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

"""

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

I've experienced similar problems: adding these flags makes the model noticeably worse, generating mixed Chinese and English output. How do these flags work exactly?
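As far as I understand, the flags don't change generation at all: the reasoning parser only post-processes the raw output, splitting it at </think> into a reasoning_content field and the final content. Conceptually it does something like this (a simplified illustration, not vLLM's actual code):

"""
# Simplified illustration of what a deepseek_r1-style reasoning parser does:
# split the raw completion at </think> so reasoning and answer come back as
# separate fields. This is NOT vLLM's actual implementation.
def split_reasoning(raw: str) -> tuple[str, str]:
    if "</think>" in raw:
        reasoning, _, answer = raw.partition("</think>")
        return reasoning.removeprefix("<think>").strip(), answer.strip()
    return "", raw.strip()


reasoning, answer = split_reasoning(
    "Counting the r's one by one gives 4.</think>\nThere are 4 r's."
)
print(reasoning)  # -> "Counting the r's one by one gives 4."
print(answer)     # -> "There are 4 r's."
"""

So any quality difference is more likely down to the prompt itself (for example, whether <think>\n is pre-filled by the template) than to the parser.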

xldistance changed discussion status to closed