Upon further testing I found some logic issues: the weights are broken!
Note: user y-ryan discovered that this model shipped with an invalid tensor shape for one of its weights ([1, 8192]), which raises errors when loading with transformers; a fixed version is available at tmfi-us/Progenitor-V5-Final-LLaMa-70B. I have no idea what caused it. Despite the bad shape I was still able to run and even quantize this model, and the fixed version produces different (and better) outputs. If anyone understands what happened here, I would love to hear about it.
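For anyone who wants to check their own copy, here is a minimal sketch of the kind of shape scan that would catch this. It is an illustration, not code from any library: the function names are mine, and numpy arrays stand in for the checkpoint tensors.

```python
import numpy as np

HIDDEN_SIZE = 8192  # Llama 70B hidden size

def find_misshaped_weights(state_dict, hidden_size=HIDDEN_SIZE):
    """Flag tensors saved with shape [1, hidden_size], the pattern
    reported to raise errors when loading with transformers."""
    return [name for name, w in state_dict.items()
            if w.shape == (1, hidden_size)]

def squeeze_misshaped_weights(state_dict, hidden_size=HIDDEN_SIZE):
    """Drop the spurious leading dimension in place and return the dict."""
    for name in find_misshaped_weights(state_dict, hidden_size):
        state_dict[name] = state_dict[name].squeeze(0)
    return state_dict
```

Running this over a loaded state dict before saving would flag (and repair) any weight with the reported [1, 8192] shape.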
This marks the culmination of my experiments with the Progenitor series. I fixed an earlier typo that prevented the merge from computing in float32. Merging six models in float32 is taxing on resources and time, so I reserved it for the configuration I thought was best; it's not something I can afford to do for every model I make, just the worthwhile ones. This one also uses the SicariusSicariiStuff/Negative_LLAMA_70B tokenizer, which I find the best.
merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Linear DELLA merge method using meta-llama/Llama-3.3-70B-Instruct as a base.
Models Merged
The following models were included in the merge:
- TheDrummer/Anubis-70B-v1
- EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
- Sao10K/70B-L3.3-Cirrus-x1
- SicariusSicariiStuff/Negative_LLAMA_70B
- Sao10K/L3.1-70B-Hanami-x1
Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: Sao10K/L3.1-70B-Hanami-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: Sao10K/70B-L3.3-Cirrus-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: SicariusSicariiStuff/Negative_LLAMA_70B
    parameters:
      weight: 0.20
      density: 0.7
  - model: TheDrummer/Anubis-70B-v1
    parameters:
      weight: 0.20
      density: 0.7
  - model: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
    parameters:
      weight: 0.20
      density: 0.7
merge_method: della_linear
base_model: meta-llama/Llama-3.3-70B-Instruct
parameters:
  epsilon: 0.2
  lambda: 1.1
  int8_mask: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: SicariusSicariiStuff/Negative_LLAMA_70B
```
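To make the numbers above concrete, here is a rough sketch of what a linear DELLA-style combine does with these parameters. This is an illustration, not mergekit's implementation: real DELLA biases its random pruning by delta magnitude (which is where epsilon comes in), while this sketch uses a uniform keep probability for brevity.

```python
import numpy as np

def della_linear_sketch(base, finetunes, weights,
                        density=0.7, lam=1.1, seed=0):
    """Simplified linear-DELLA combine: for each fine-tune, take its delta
    from the base, randomly keep a `density` fraction of entries (rescaled
    by 1/density so the expected delta is unchanged), sum the kept deltas
    with the given weights, scale by lambda, and add back to the base.
    NOTE: real DELLA varies the keep probability per entry by delta
    magnitude (controlled by epsilon); uniform sampling keeps this short."""
    rng = np.random.default_rng(seed)
    merged_delta = np.zeros_like(base)
    for w, ft in zip(weights, finetunes):
        delta = ft - base
        keep = rng.random(delta.shape) < density
        merged_delta += w * np.where(keep, delta / density, 0.0)
    return base + lam * merged_delta
```

With density=1.0 and lambda=1.0 this reduces to a plain weighted average of the fine-tunes (the five 0.20 weights above sum to 1.0); density=0.7 sparsifies each delta before combining, and lambda=1.1 slightly amplifies the merged delta.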