When will you fix the model replies missing </think>\n start tags

#19
by xldistance - opened

open-webui can't collapse the thought process, and it's too tiring to stare at the whole reasoning.

xldistance changed discussion title from "When will you fix the model replies missing <think>\n start tags" to "When will you fix the model replies missing </think>\n start tags"

I think the team actually wants us to manually append "<think>\n" after the whole prompt. Not sure how to implement that in open-webui.

Just don't use open-webui, or wait for an update.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.
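For what it's worth, with those flags the server returns the reasoning separately from the final answer, so the client never has to deal with raw <think> tags at all. A minimal client sketch (the base URL, API key and model name are assumptions for a local deployment; the reasoning_content field is what vLLM's reasoning-outputs feature adds):

"""
# Minimal sketch: query a vLLM server started with
#   vllm serve Qwen/QwQ-32B --enable-reasoning --reasoning-parser deepseek_r1
# base_url, api_key and model name below are assumptions; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

msg = resp.choices[0].message
print("reasoning:", msg.reasoning_content)  # the parsed thought process
print("answer:", msg.content)               # the text after </think>
"""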

You can just remove the <think> tag from the chat template inside tokenizer_config.json, so the model generates <think> at the beginning of its output and open-webui parses the response correctly.
The drawback is that the model may skip the reasoning section for some questions, but I feel the overall experience is fine.
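If you'd rather script that edit than hand-modify the file, something along these lines does it (a sketch; the local path is an assumption, and the exact substring depends on which version of the chat template you have):

"""
# Sketch: strip the trailing "<think>\n" that the chat template appends after
# "<|im_start|>assistant\n", so the model has to emit <think> itself.
# The path is an assumption; point it at your local copy of the model.
import json

path = "QwQ-32B/tokenizer_config.json"

with open(path, encoding="utf-8") as f:
    cfg = json.load(f)

# In the decoded template the emitted newlines are literal "\n" sequences.
new_template = cfg["chat_template"].replace(
    "<|im_start|>assistant\\n<think>\\n", "<|im_start|>assistant\\n"
)
if new_template == cfg["chat_template"]:
    print("Pattern not found; inspect the template manually.")
else:
    cfg["chat_template"] = new_template
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, ensure_ascii=False, indent=2)
"""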

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

This solved the issue, thanks :)

Think tags come up just fine with llama.cpp. Example usage: .\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000

ollama too: no config needed, no need to modify any prompt template, it works out of the box. Works perfectly coupled with open-webui, Continue, etc.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

I used a filter function in open-webui, and it adds the tag to the response when a qwq32 model is detected.

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

It works for me on 0.7.3:

vllm serve <model_id> --dtype half --quantization awq --enable-reasoning --reasoning-parser deepseek_r1 --max-model-len 32768 (adjust the length based on your available memory)

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

https://unsloth.ai/blog/qwq-32b RELEVANT for llama.cpp users

I used a filter function in open-webui, and it adds the tag to the response when a qwq32 model is detected.

Can you give me the code to modify open-webui?
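For reference, a filter along these lines should do it. This is only a sketch against open-webui's Filter function interface (the method name, the body layout, and the qwq model-id check are assumptions; check your open-webui version's function docs):

"""
# Sketch of an open-webui Filter function that re-adds the missing <think> tag
# so the UI can collapse the reasoning. Method name and body structure follow
# the Filter plugin convention but may differ between open-webui versions.
class Filter:
    def __init__(self):
        pass

    def outlet(self, body: dict) -> dict:
        # Only touch responses from QwQ-style models (model-id check is a guess).
        if "qwq" not in str(body.get("model", "")).lower():
            return body

        for message in body.get("messages", []):
            if message.get("role") != "assistant":
                continue
            content = message.get("content", "")
            # The model emits reasoning plus a closing </think> but no opening
            # tag, so open-webui can't fold it. Prepend one if it is missing.
            if "</think>" in content and "<think>" not in content:
                message["content"] = "<think>\n" + content
        return body
"""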

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

I didn't find any loss in accuracy
[screenshot attached]

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

Doesn't work for the qwq-32b-awq model. I am using vLLM 0.7.2-post1.

I also run inference with vLLM. This actually has nothing to do with vLLM: edit the chat_template field in tokenizer_config.json, remove the <think>\n after <|im_start|>assistant\n in that field, then add --enable-reasoning --reasoning-parser deepseek_r1 and it works.

Update to the latest official tokenizer.json, then use the following tokenizer_config.json:
"""
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- '' }}\n {%- endif %}\n {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}\n {%- for tool in tools %}\n {{- "\n" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- "\n\n\nFor each function call, return a json object with function name and arguments within XML tags:\n\n{\"name\": , \"arguments\": }\n<|im_end|>\n" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n {%- elif message.role == "assistant" and not message.tool_calls %}\n {%- set content = message.content %}\n {%- if not loop.last %}\n {%- set content = message.content.split('')[-1].lstrip('\n ') %}\n {%- endif %}\n {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}\n {%- elif message.role == "assistant" %}\n {%- set content = message.content %}\n {%- if not loop.last %}\n {%- set content = message.content.split('')[-1].lstrip('\n') %}\n {%- endif %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\n' + content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\n\n{"name": "' }}\n {{- tool_call.name }}\n {{- '", "arguments": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\n' }}\n {%- endfor %}\n {{- '<|im_end|>\n' }}\n {%- elif message.role == "tool" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\n\n' }}\n {{- message.content }}\n {{- '\n' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}\n {{- '<|im_end|>\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

"""

If you deploy using vLLM, adding the flags --enable-reasoning --reasoning-parser deepseek_r1 to the "vllm serve" command fixes the issue.

After adding these flags, model accuracy decreased. Without the flags the model answers "how many r's in the word strawberrry?" correctly, but with them it gives 3 R's when the correct answer is 4 because of the extra R. I am using the FP16 model.

We need some other solution.

I've experienced similar problems: adding these flags makes the model noticeably worse, generating mixed Chinese and English output. How do these flags work exactly?
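As far as I understand, the flags don't change generation at all: the reasoning parser only post-processes the raw output, splitting it at </think> into a reasoning_content field and the final content. Conceptually it does something like this (a simplified illustration, not vLLM's actual code):

"""
# Simplified illustration of what a deepseek_r1-style reasoning parser does:
# split the raw completion at </think> so reasoning and answer come back as
# separate fields. This is NOT vLLM's actual implementation.
def split_reasoning(raw: str) -> tuple[str, str]:
    if "</think>" in raw:
        reasoning, _, answer = raw.partition("</think>")
        return reasoning.removeprefix("<think>").strip(), answer.strip()
    return "", raw.strip()


reasoning, answer = split_reasoning(
    "Counting the r's one by one gives 4.</think>\nThere are 4 r's."
)
print(reasoning)  # -> "Counting the r's one by one gives 4."
print(answer)     # -> "There are 4 r's."
"""

So any quality difference is more likely down to the prompt itself (for example, whether <think>\n is pre-filled by the template) than to the parser.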

xldistance changed discussion status to closed