QwQ updated their tokenizer, model update needed?

by async0x42

QwQ had some changes applied to it (https://huggingface.co/Qwen/QwQ-32B/commits/main). Does this model need to be updated because of that?

The model actually uses the "regular" Qwen tokenizer rather than QwQ's tokenizer. Here's the mergekit config:

```yaml
models:
  - model: trashpanda-org/Qwen2.5-32B-Marigold-v0-exp
    parameters:
      weight: 1
      density: 1
  - model: trashpanda-org/Qwen2.5-32B-Marigold-v0
    parameters:
      weight: 1
      density: 1
  - model: Qwen/QwQ-32B
    parameters:
      weight: 0.9
      density: 0.9
merge_method: ties
base_model: Qwen/Qwen2.5-32B
parameters:
  weight: 0.9
  density: 0.9
  normalize: true
  int8_mask: true
tokenizer_source: Qwen/Qwen2.5-32B-Instruct
dtype: bfloat16
```
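
In case it helps, here's a rough sketch of how to see what the `tokenizer_source` choice actually changes. It just diffs the added vocabularies of the two candidate tokenizers (assumes `transformers` is installed and both hub repos are reachable):

```python
from transformers import AutoTokenizer

# Load both tokenizer candidates referenced in the merge config.
instruct = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
qwq = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

# get_added_vocab() returns the tokens layered on top of the base vocab,
# so the set difference shows what QwQ defines that Instruct doesn't.
only_in_qwq = set(qwq.get_added_vocab()) - set(instruct.get_added_vocab())
print(sorted(only_in_qwq))
```

Anything that prints there is a token the merged model gains or loses depending on which `tokenizer_source` you pick.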

The reason is that in previous merge configs I tried, using the QwQ tokenizer somehow made the resulting model really bad at generating the </think> token, so it'd end up dumping its reply inside the thinking block. It might be because QwQ adds <think> and </think> as special tokens in its tokenizer while Marigold didn't, but I'm not sure.
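
If anyone wants to check that theory, here's a minimal sketch: a tag that's registered as a special/added token encodes to a single id, while anything else gets split into several subword pieces (same assumptions as above: `transformers` installed, repos reachable):

```python
from transformers import AutoTokenizer

for repo in ("Qwen/QwQ-32B", "Qwen/Qwen2.5-32B-Instruct"):
    tok = AutoTokenizer.from_pretrained(repo)
    for tag in ("<think>", "</think>"):
        # One id -> the tag is an atomic special token;
        # several ids -> the tokenizer splits it into subwords.
        ids = tok.encode(tag, add_special_tokens=False)
        print(f"{repo} {tag}: {ids}")
```

If </think> splits into pieces under one tokenizer but stays atomic under the other, that would line up with the merged model struggling to close the thinking block.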
