Did the base 1M model start from Qwen2.5-7B?
Did you extend Qwen2.5-7B to 1M and then do the instruct tuning? Is there any relation to Qwen2.5-7B?
Specifically in the paper:
Qwen2.5-1M series are developed based on Qwen2.5 models (Yang et al., 2025) and support context length up to 1M tokens.
And this:
The first two stages are similar to those of other Qwen2.5 models, where we directly use an intermediate version from Qwen2.5 Base models for subsequent long-context training. Specifically, the model is initially trained with a context length of 4096 tokens, and then the training is transferred to a context length of 32768 tokens.
@huu-ontocord As described in the paper, we use an intermediate version of the Qwen2.5 Base models. It corresponds to a state several billion tokens before the end of the 32k training of Qwen2.5-7B/14B, taken before the learning rate had decayed too much.
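To make the branching idea concrete, here is a rough Python sketch. All stage lengths, token counts, and learning rates below are hypothetical placeholders, not the actual Qwen2.5 training hyperparameters; the only point is that the 1M models continue from a checkpoint taken a few billion tokens before the 32k stage finishes, while the learning rate is still reasonably large.

```python
# Illustrative sketch only: the token counts and learning rates are
# hypothetical, not the real Qwen2.5 training hyperparameters.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    context_length: int
    tokens: float     # training tokens in this stage (hypothetical)
    final_lr: float   # learning rate at the end of the stage (hypothetical)


# Qwen2.5 Base pre-training stages as described in the quote: 4k, then 32k.
qwen25_base = [
    Stage("short-context", 4_096, 15e12, 3e-4),   # hypothetical numbers
    Stage("32k-context", 32_768, 1e12, 3e-5),     # hypothetical numbers
]


def pick_branch_point(stages, tokens_before_end=5e9):
    """Pick a checkpoint a few billion tokens before the final stage ends,
    i.e. before the learning rate has decayed to a very small value."""
    last = stages[-1]
    branch_tokens = last.tokens - tokens_before_end
    return f"checkpoint at ~{branch_tokens:.3e} tokens into the '{last.name}' stage"


if __name__ == "__main__":
    # Qwen2.5-1M would branch off from this intermediate checkpoint and then
    # continue with long-context training toward 1M tokens.
    print(pick_branch_point(qwen25_base))
```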
That makes sense. Thank you very much for your answer!