QwQ updated their tokenizer, model update needed?
#4
by
async0x42
- opened
QwQ had some changes applied to it (https://huggingface.co/Qwen/QwQ-32B/commits/main) — sorry, rather: does this model need to be updated because of that? (https://huggingface.co/Qwen/QwQ-32B/commits/main)
The model actually uses the "regular" Qwen tokenizer, not QwQ's. Here's the mergekit config:
```yaml
models:
  - model: trashpanda-org/Qwen2.5-32B-Marigold-v0-exp
    parameters:
      weight: 1
      density: 1
  - model: trashpanda-org/Qwen2.5-32B-Marigold-v0
    parameters:
      weight: 1
      density: 1
  - model: Qwen/QwQ-32B
    parameters:
      weight: 0.9
      density: 0.9
merge_method: ties
base_model: Qwen/Qwen2.5-32B
parameters:
  weight: 0.9
  density: 0.9
  normalize: true
  int8_mask: true
tokenizer_source: Qwen/Qwen2.5-32B-Instruct
dtype: bfloat16
```
The reason being that in previous merge configs I tried, using the QwQ tokenizer somehow made the resulting model really bad at generating the `</think>` token, so it'd end up dumping its whole reply inside the thinking block. It might've been because QwQ registers `<think>` and `</think>` as special tokens in its tokenizer while Marigold didn't, but I'm not sure.