Invalid shape `[1, 8192]`
Great work!
I think some of the weights have invalid sizes? I get the following error when loading the model with transformers:

```
ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
```
I resized them and put the result on tmfi-us/Progenitor-V5-Final-LLaMa-70B.
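The fix itself is simple; here is a minimal sketch of the kind of reshape I applied (the directory path is an assumption, not my exact script):

```python
# Sketch: flatten any [1, 8192] tensor back to [8192] in every safetensors shard.
import os
from safetensors.torch import load_file, save_file

model_dir = "Progenitor-V5-Final-LLaMa-70B"  # local model directory (assumed)

for fname in sorted(os.listdir(model_dir)):
    if not fname.endswith(".safetensors"):
        continue
    path = os.path.join(model_dir, fname)
    tensors = load_file(path)
    fixed = {
        key: t.view(8192) if list(t.shape) == [1, 8192] else t
        for key, t in tensors.items()
    }
    save_file(fixed, path, metadata={"format": "pt"})
```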
Oh wow, thanks for this. I have no clue what the reason is, but despite the error I was still able to use and even quant this model?! Testing the fixed version against this one also gave me very different outputs. If anyone understands this, I would love to hear about it!
@Tarek07 hmm, what quant tool did you use?
I tried the following tools against this one, and all of them failed due to the shape error I mentioned:
- lm-evaluation-harness
- vllm
- sglang
- AutoAWQ
- transformers (using `AutoModelForCausalLM`; a minimal repro follows this list)
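For reference, here is a minimal sketch of how I hit the transformers error (the repo ID is the original upload from this thread; `torch_dtype="auto"` is just my own choice):

```python
# Minimal repro sketch: loading the original upload raises the shape ValueError.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Tarek07/Progenitor-V5-Final-LLaMa-70B",  # original (pre-fix) upload
    torch_dtype="auto",
)
# -> ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" ...
```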
The following are lm-evaluation-harness results that I ran against V3.3 and V5 (the fixed one). Despite what you mentioned about the outputs being different, the evaluation results seem promising?
V3.3
Model | Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|---|
Progenitor-V3.3 | leaderboard | N/A | |||||||
Progenitor-V3.3 | - leaderboard_bbh | N/A | |||||||
Progenitor-V3.3 | - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | ↑ | 0.9520 | ± | 0.0135 |
Progenitor-V3.3 | - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | ↑ | 0.6364 | ± | 0.0353 |
Progenitor-V3.3 | - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | ↑ | 0.7080 | ± | 0.0288 |
Progenitor-V3.3 | - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | ↑ | 0.7240 | ± | 0.0283 |
Progenitor-V3.3 | - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | ↑ | 0.7880 | ± | 0.0259 |
Progenitor-V3.3 | - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | ↑ | 0.4480 | ± | 0.0315 |
Progenitor-V3.3 | - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | ↑ | 0.6080 | ± | 0.0309 |
Progenitor-V3.3 | - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.6080 | ± | 0.0309 |
Progenitor-V3.3 | - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.6200 | ± | 0.0308 |
Progenitor-V3.3 | - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.9440 | ± | 0.0146 |
Progenitor-V3.3 | - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | ↑ | 0.8640 | ± | 0.0217 |
Progenitor-V3.3 | - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | ↑ | 0.6440 | ± | 0.0303 |
Progenitor-V3.3 | - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | ↑ | 0.6440 | ± | 0.0303 |
Progenitor-V3.3 | - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | ↑ | 0.7192 | ± | 0.0373 |
Progenitor-V3.3 | - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | ↑ | 0.8400 | ± | 0.0232 |
Progenitor-V3.3 | - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | ↑ | 0.8760 | ± | 0.0209 |
Progenitor-V3.3 | - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | ↑ | 0.7120 | ± | 0.0287 |
Progenitor-V3.3 | - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | ↑ | 0.6966 | ± | 0.0346 |
Progenitor-V3.3 | - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | ↑ | 0.9400 | ± | 0.0151 |
Progenitor-V3.3 | - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | ↑ | 1.0000 | ± | 0 |
Progenitor-V3.3 | - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.2960 | ± | 0.0289 |
Progenitor-V3.3 | - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.3240 | ± | 0.0297 |
Progenitor-V3.3 | - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.3560 | ± | 0.0303 |
Progenitor-V3.3 | - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | ↑ | 0.5960 | ± | 0.0311 |
Progenitor-V3.3 | - leaderboard_gpqa | N/A | |||||||
Progenitor-V3.3 | - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | ↑ | 0.3687 | ± | 0.0344 |
Progenitor-V3.3 | - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | ↑ | 0.4634 | ± | 0.0214 |
Progenitor-V3.3 | - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | ↑ | 0.4397 | ± | 0.0235 |
Progenitor-V3.3 | - leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | ↑ | 0.8885 | ± | N/A |
Progenitor-V3.3 | | | none | 0 | inst_level_strict_acc | ↑ | 0.8705 | ± | N/A |
Progenitor-V3.3 | | | none | 0 | prompt_level_loose_acc | ↑ | 0.8373 | ± | 0.0159 |
Progenitor-V3.3 | | | none | 0 | prompt_level_strict_acc | ↑ | 0.8133 | ± | 0.0168 |
Progenitor-V3.3 | - leaderboard_math_hard | N/A | |||||||
Progenitor-V3.3 | - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.1954 | ± | 0.0227 |
Progenitor-V3.3 | - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | ↑ | 0.1951 | ± | 0.0359 |
Progenitor-V3.3 | - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | ↑ | 0.0303 | ± | 0.0150 |
Progenitor-V3.3 | - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0393 | ± | 0.0116 |
Progenitor-V3.3 | - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | ↑ | 0.1818 | ± | 0.0312 |
Progenitor-V3.3 | - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | ↑ | 0.2746 | ± | 0.0322 |
Progenitor-V3.3 | - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | ↑ | 0.0222 | ± | 0.0127 |
Progenitor-V3.3 | - leaderboard_mmlu_pro | 0.1 | none | 5 | acc | ↑ | 0.5505 | ± | 0.0045 |
Progenitor-V3.3 | - leaderboard_musr | N/A | |||||||
Progenitor-V3.3 | - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | ↑ | 0.5800 | ± | 0.0313 |
Progenitor-V3.3 | - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | ↑ | 0.2773 | ± | 0.0280 |
Progenitor-V3.3 | - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | ↑ | 0.5400 | ± | 0.0316 |
V5
Model | Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|---|
Progenitor-V5 | leaderboard | N/A | |||||||
Progenitor-V5 | - leaderboard_bbh | N/A | |||||||
Progenitor-V5 | - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | ↑ | 0.9520 | ± | 0.0135 |
Progenitor-V5 | - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | ↑ | 0.6471 | ± | 0.0350 |
Progenitor-V5 | - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | ↑ | 0.7040 | ± | 0.0289 |
Progenitor-V5 | - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | ↑ | 0.7200 | ± | 0.0285 |
Progenitor-V5 | - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | ↑ | 0.7880 | ± | 0.0259 |
Progenitor-V5 | - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | ↑ | 0.4560 | ± | 0.0316 |
Progenitor-V5 | - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | ↑ | 0.6120 | ± | 0.0309 |
Progenitor-V5 | - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.6200 | ± | 0.0308 |
Progenitor-V5 | - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.6240 | ± | 0.0307 |
Progenitor-V5 | - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.9480 | ± | 0.0141 |
Progenitor-V5 | - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | ↑ | 0.8680 | ± | 0.0215 |
Progenitor-V5 | - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | ↑ | 0.6400 | ± | 0.0304 |
Progenitor-V5 | - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | ↑ | 0.6400 | ± | 0.0304 |
Progenitor-V5 | - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | ↑ | 0.7260 | ± | 0.0370 |
Progenitor-V5 | - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | ↑ | 0.8480 | ± | 0.0228 |
Progenitor-V5 | - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | ↑ | 0.8720 | ± | 0.0212 |
Progenitor-V5 | - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | ↑ | 0.6960 | ± | 0.0292 |
Progenitor-V5 | - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | ↑ | 0.7022 | ± | 0.0344 |
Progenitor-V5 | - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | ↑ | 0.9440 | ± | 0.0146 |
Progenitor-V5 | - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | ↑ | 1.0000 | ± | 0 |
Progenitor-V5 | - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.3080 | ± | 0.0293 |
Progenitor-V5 | - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.3360 | ± | 0.0299 |
Progenitor-V5 | - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.3520 | ± | 0.0303 |
Progenitor-V5 | - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | ↑ | 0.6160 | ± | 0.0308 |
Progenitor-V5 | - leaderboard_gpqa | N/A | |||||||
Progenitor-V5 | - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | ↑ | 0.3990 | ± | 0.0349 |
Progenitor-V5 | - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | ↑ | 0.4670 | ± | 0.0214 |
Progenitor-V5 | - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | ↑ | 0.4375 | ± | 0.0235 |
Progenitor-V5 | - leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | ↑ | 0.8981 | ± | N/A |
Progenitor-V5 | | | none | 0 | inst_level_strict_acc | ↑ | 0.8729 | ± | N/A |
Progenitor-V5 | | | none | 0 | prompt_level_loose_acc | ↑ | 0.8503 | ± | 0.0154 |
Progenitor-V5 | | | none | 0 | prompt_level_strict_acc | ↑ | 0.8170 | ± | 0.0166 |
Progenitor-V5 | - leaderboard_math_hard | N/A | |||||||
Progenitor-V5 | - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.2020 | ± | 0.0229 |
Progenitor-V5 | - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | ↑ | 0.1626 | ± | 0.0334 |
Progenitor-V5 | - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | ↑ | 0.0303 | ± | 0.0150 |
Progenitor-V5 | - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0429 | ± | 0.0121 |
Progenitor-V5 | - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | ↑ | 0.1883 | ± | 0.0316 |
Progenitor-V5 | - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | ↑ | 0.2591 | ± | 0.0316 |
Progenitor-V5 | - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | ↑ | 0.0222 | ± | 0.0127 |
Progenitor-V5 | - leaderboard_mmlu_pro | 0.1 | none | 5 | acc | ↑ | 0.5513 | ± | 0.0045 |
Progenitor-V5 | - leaderboard_musr | N/A | |||||||
Progenitor-V5 | - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | ↑ | 0.5600 | ± | 0.0315 |
Progenitor-V5 | - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | ↑ | 0.2852 | ± | 0.0283 |
Progenitor-V5 | - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | ↑ | 0.5440 | ± | 0.0316 |
@y-ryan So for the quant I used 'koboldcpp_tools_19dec': first the `convert_hf_to_gguf.py` script, then `quantize_gguf.exe` to Q8. (This is what I run most models with locally.)
Then, as for deployment, I used Friendli and the HF Inference Endpoints on Google Cloud, both of which worked and gave me good outputs. (I deployed your fixed version in the same way to run some test prompts I use.)
I can certainly see the .safetensors files show: `model.layers.0.input_layernorm.weight [1, 8192]`
when it should be: `model.layers.0.input_layernorm.weight [8192]`.
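(If anyone else wants to check their copy, here is a quick sketch for inspecting the shapes in a shard; the filename is hypothetical:)

```python
# Sketch: print layernorm weight shapes from one safetensors shard.
from safetensors import safe_open

with safe_open("model-00001-of-00030.safetensors", framework="pt") as f:  # filename hypothetical
    for key in f.keys():
        if "layernorm" in key:
            print(key, f.get_tensor(key).shape)
```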
I just have no idea why it works, and works well (from the test prompts at least)?
The GGUF Q8 quant also seemed to work well (I only did one RP to test).
**On some further testing with my GGUF quant, there are glaring logic issues. It is truly broken.**
@Tarek07 Checking out the old version of `convert_hf_to_gguf.py` from koboldcpp, I found the following:

```python
def prepare_tensors(self):
    ...
    for name, data_torch in chain(self.generate_extra_tensors(), self.get_tensors()):
        ...
        for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
            data = data_torch.squeeze().numpy()
            ...
```
The above code effectively converts `[1, 8192]` to `[8192]` by using `torch.squeeze()`.
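A one-line illustration of that conversion:

```python
import torch

t = torch.zeros(1, 8192)
print(t.squeeze().shape)  # torch.Size([8192]): the size-1 leading dim is dropped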
This code has since been updated (by this commit) to no longer use `torch.squeeze()`:

```diff
- data = data_torch.squeeze().numpy()
+ # TODO: why do we squeeze here?
+ # data = data_torch.squeeze().numpy()
+ data = data_torch.numpy()
```
If my assumption is correct, running the latest code will break. I used `torch.Tensor.view()` to reshape from `[1, 8192]` to `[8192]`. While I believe `torch.Tensor.view()` and `torch.squeeze()` will have basically the same effect in this case, I will try reshaping again using `torch.squeeze()` and re-upload the model to see if it makes any difference.
Aha! Good catch.
It is broken. On the GGUF quant I tried another, more complex character card, and it had glaring logic issues that other Progenitor models didn't have. So I am convinced something broke. It's weird, because when I deployed the models on the endpoints the outputs were stellar? Unless the endpoints somehow fix the error too? Regardless, I will redo this model just in case. I really appreciate you letting me know about the issue; I would never have seen it!
@Tarek07
I ran the following code:
```python
import os
import torch
from safetensors.torch import load_file

MODEL_DIR = "/root/.cache/huggingface/hub/models--Tarek07--Progenitor-V5-Final-LLaMa-70B/snapshots/2364d825a8ea4414d3a661b9cf2fa0f29af71756"

def check_shard(shard_path):
    data = load_file(shard_path)
    for key, tensor in data.items():
        if list(tensor.shape) == [1, 8192]:
            print(f"  Checking {key} in {os.path.basename(shard_path)} with size {tensor.shape}")
            viewed_tensor = tensor.view(8192)
            squeezed_tensor = tensor.squeeze()
            print(f"  Equality: {torch.equal(viewed_tensor, squeezed_tensor)}")

def main():
    # Look for .safetensors files in MODEL_DIR
    for filename in sorted(os.listdir(MODEL_DIR)):
        if filename.endswith(".safetensors"):
            shard_path = os.path.join(MODEL_DIR, filename)
            print(f"Processing: {shard_path}")
            check_shard(shard_path)

if __name__ == "__main__":
    main()
```
and confirmed that `tensor.view()` and `tensor.squeeze()` indeed returned identical tensors. Thus I'm not re-uploading the `tensor.squeeze()` version to my repo.
Glad my debugging was helpful, and thanks again for the great work! Please feel free to let me know any time if you need other help. I will be around to test the redone version when it's ready.
I will update the README on my repo to let others know that you will be redoing the model and that we are all looking forward to it!