Bartowski! 0.0!!!! You are on double-secret probation for this jinja error!
Just joking, and thanks dude for all you do for the community, opening AI and this bizarro world up to non-rocket scientists. But I am getting a Jinja error in LM Studio with your Q5_K_L quant. Here is the error: "Failed to parse Jinja template: Parser Error: Expected closing statement token. OpenSquareBracket !== CloseStatement." Can you see what's up with this? I'm also gonna try IQ4_NL with my 4080 right now, since I still don't know wtf to pick or which is better, but thanks!!
Same issue here. I tried a few GGUFs, including the official one, but they all fail with that error in LM Studio (I had a similar issue with arcee-blitz).
You can either use the official lmstudio-community ones here: https://huggingface.co/lmstudio-community/QwQ-32B-GGUF
or you can apply the fix from here: https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/479#issuecomment-2701947624
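For the curious: I haven't traced the exact diff, but based on the error text the trip-up appears to be Python-style slice syntax (e.g. `messages[::-1]`) in the chat template, which reference Jinja2 accepts but the stricter parser LM Studio used at the time did not. A minimal sketch to check this outside LM Studio (the template fragment is illustrative, not the actual QwQ template):

```python
# Minimal sketch: Python-style slicing in a Jinja template, which the
# reference Jinja2 library accepts but a stricter parser can reject with
# "OpenSquareBracket !== CloseStatement". Fragment is illustrative only.
from jinja2 import Environment

fragment = "{%- for message in messages[::-1] %}{{ message.role }} {% endfor %}"

tmpl = Environment().from_string(fragment)
print(tmpl.render(messages=[{"role": "user"}, {"role": "assistant"}]))
# Prints "assistant user " -- valid Jinja2, so the failure is in the parser
# LM Studio used, not in the template being malformed Jinja.
```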
Thanks! Since you use imatrix and the LM Studio quants don't, I stick to yours, but my gratitude to the devs of that program is due as well. I hope you got the Animal House reference, and idk why your name always makes me think of it. It appears to be working now, except it made random tool calls in LM Studio. Not saying that's your fault, but for others who read this: just be careful. I will test it further and follow up, but my sincere gratitude for contributing the way you do! Also, did you ever figure out what was going on with your 3-bit quants?
since you use imatrix and the LM Studio quants don't, I stick to yours
Same here - and just curious: since you (Bartowski) are the official curator for their models, is there any reason they don't?
imatrix always seems to be a straight-up upgrade.
I agree, but there was circumstantial evidence (that I don't agree with) showing degradation in performance when imatrix was applied
The lack of imatrix also makes the results more "pure" and impossible to accuse of dataset biasing
Again, I still hold that imatrix is universally better, but there was enough to convince them to switch to static. Plus, static quants are way faster to make, so the lmstudio ones can be first out the door while I crunch the numbers for my own behind the scenes.
Are there many reasoning examples in the imatrix dataset? I'm wondering if the issue has to do with using non-reasoning datasets that might not work so well with a reasoning model. According to R1:
To create an imatrix dataset (importance matrix dataset) for quantizing a reasoning-focused LLM, focus on capturing diverse, high-quality examples of reasoning tasks. The goal is to help the quantization process identify and preserve critical weights/activations that enable logical, mathematical, and analytical reasoning. Here's a step-by-step guide:
- Key Requirements for the Dataset
  - Diverse Reasoning Tasks: Cover multiple types (logical, mathematical, causal, analytical, commonsense).
  - Complexity Levels: Include easy, medium, and hard examples.
  - Balanced Distribution: Ensure no single task type dominates.
  - High-Quality Explanations: Include step-by-step reasoning (chain-of-thought).
  - Domain Alignment: Match the model's intended use case (e.g., math, science, coding).
- Example Data Structure
  Each entry should include:
  - Input: A question/problem requiring reasoning.
  - Output: A detailed, structured answer (e.g., chain-of-thought).
  - Metadata: Task type, complexity, and domain.
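Concretely, here's a sketch of what I imagine R1 means. Everything here is hypothetical (the example entries, field names, and file name are made up), and since llama.cpp's llama-imatrix tool just reads a flat text file, the structure would mostly guide how you balance the mix:

```python
# Hypothetical sketch of flattening R1's proposed entry structure into the
# plain-text calibration file that llama.cpp's llama-imatrix tool consumes.
# Example entries, field names, and file name are made up for illustration.

entries = [
    {
        "input": "If all bloops are razzies and all razzies are lazzies, "
                 "are all bloops lazzies?",
        "output": "Step 1: All bloops are razzies. Step 2: All razzies are "
                  "lazzies. Step 3: By transitivity, all bloops are lazzies. "
                  "Answer: yes.",
        "metadata": {"task": "logical", "complexity": "easy", "domain": "general"},
    },
    # ... more entries covering math, causal, analytical, commonsense ...
]

with open("reasoning_calibration.txt", "w", encoding="utf-8") as f:
    for e in entries:
        # imatrix only ever sees raw tokens, so metadata can't be attached;
        # at best it guides how you balance the mix of entries.
        f.write(e["input"] + "\n" + e["output"] + "\n\n")

# The file would then be fed to llama.cpp along the lines of:
#   llama-imatrix -m model-f16.gguf -f reasoning_calibration.txt -o imatrix.dat
```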
This doesn't really apply to imatrix
The dataset is better off being diverse and random than targeted and specific.
Check this discussion posted today:
https://www.reddit.com/r/LocalLLaMA/comments/1j9ih6e/english_k_quantization_of_llms_does_not/
If the language of the imatrix file doesn't affect the output, then I think it's logical to conclude that structure is also irrelevant.
That said, the one thing that MAY be important is chunk size. Since the default imatrix chunk is 512 tokens, it's possible this adversely affects long context, and reasoning models obviously tend to have extremely long contexts. There's an argument to be made that a stronger beginning to the reasoning is more important than having the same quality throughout, but that's very debatable in both directions and difficult to quantify.
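To make the chunking concern concrete, here's a toy sketch (all numbers made up): the calibration text is evaluated in fixed-size windows, so no token ever contributes activation statistics at a context depth beyond the chunk size.

```python
# Toy illustration of the chunking concern (all numbers made up).
# imatrix evaluates the calibration text in fixed-size chunks, so a long
# reasoning trace is split into independent windows.

chunk_size = 512                     # llama.cpp's default imatrix chunk
trace_len = 16_000                   # a long chain-of-thought, in tokens
tokens = list(range(trace_len))      # stand-in for a tokenized trace

chunks = [tokens[i:i + chunk_size] for i in range(0, trace_len, chunk_size)]
print(len(chunks), "chunks")         # 32 chunks

# Deepest context position any token occupies during calibration:
print(max(len(c) for c in chunks) - 1)  # 511, no matter how long the trace
# Attention behavior at positions 512..15999 is never exercised, which is
# exactly the worry for long-context reasoning models.
```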
I'm hoping to do more experiments with chunk sizes, and especially with combining chunk sizes, to see if we can get better overall results, but that's for the future, ideally when multiple chunk size support is improved.