Did the base 1M model start from Qwen2.5-7B?
Did you extend Qwen2.5-7B to 1M and then do the instruct tuning? Is there any relation to Qwen2.5-7B?
Specifically in the paper:
Qwen2.5-1M series are developed based on Qwen2.5 models (Yang et al., 2025) and support context length up to 1M tokens.
And this:
The first two stages are similar to those of other Qwen2.5 models, where we directly use an intermediate version from Qwen2.5 Base models for subsequent long-context training. Specifically, the model is initially trained with a context length of 4096 tokens, and then the training is transferred to a context length of 32768 tokens.
@huu-ontocord As described in the paper, we use an intermediate version of the Qwen2.5 Base models. It corresponds to a state several billion tokens before the end of the 32k training of Qwen2.5-7B/14B, taken before the learning rate had decayed too much.
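To make the branching idea concrete, here is a rough Python sketch. All stage lengths, token counts, and learning rates below are hypothetical placeholders, not the actual Qwen2.5 training hyperparameters; the only point is that the 1M models continue from a checkpoint taken a few billion tokens before the 32k stage finishes, while the learning rate is still reasonably large.

```python
# Illustrative sketch only: the token counts and learning rates are
# hypothetical, not the real Qwen2.5 training hyperparameters.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    context_length: int
    tokens: float     # training tokens in this stage (hypothetical)
    final_lr: float   # learning rate at the end of the stage (hypothetical)


# Qwen2.5 Base pre-training stages as described in the quote: 4k, then 32k.
qwen25_base = [
    Stage("short-context", 4_096, 15e12, 3e-4),   # hypothetical numbers
    Stage("32k-context", 32_768, 1e12, 3e-5),     # hypothetical numbers
]


def pick_branch_point(stages, tokens_before_end=5e9):
    """Pick a checkpoint a few billion tokens before the final stage ends,
    i.e. before the learning rate has decayed to a very small value."""
    last = stages[-1]
    branch_tokens = last.tokens - tokens_before_end
    return f"checkpoint at ~{branch_tokens:.3e} tokens into the '{last.name}' stage"


if __name__ == "__main__":
    # Qwen2.5-1M would branch off from this intermediate checkpoint and then
    # continue with long-context training toward 1M tokens.
    print(pick_branch_point(qwen25_base))
```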
That makes sense. Thank you very much for your answer!