Don't be surprised that it's hard to bring the full reasoning strength of a reasoning-heavy prose model like Qwenvergence v12-DS into a high-IFEVAL model like Lamarck or Virtuoso Small v2. It takes a lot of work to get right, because IFEVAL scores, precise reasoning, and prose quality are often in tension with one another. Gaining as much as this did is really respectable, and fine-tuning it makes it a more stable base for the coming iterations.