---
base_model:
- arcee-ai/Virtuoso-Small-v2
- sometimesanotion/Qwenvergence-14B-v3-Prose
- sthenno/tempesthenno-ppo-ckpt40
- CultriX/Enhanced-TIES-Base-v1
library_name: transformers
tags:
- mergekit
- merge
---
# SuperMerge-LayeredTIES-v1
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details
### Merge Method
This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method, with [CultriX/Enhanced-TIES-Base-v1](https://huggingface.co/CultriX/Enhanced-TIES-Base-v1) as the base.
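For readers unfamiliar with the method, the sketch below illustrates the core idea of a linear DELLA merge on a single weight tensor: each contributing model's delta from the base is stochastically pruned with magnitude-aware keep probabilities, rescaled to preserve its expected value, and then combined as a weighted linear sum on top of the base. This is a conceptual sketch under simplifying assumptions, not mergekit's implementation; the function name, the rank-based keep probability, and the `density`/`epsilon` arguments are illustrative stand-ins for the paper's MAGPRUNE sampling, and the weight normalization performed by `normalize: true` is omitted.

```python
import torch

def della_linear_sketch(base, experts, weights, density=0.7, epsilon=0.1, seed=0):
    """Conceptual DELLA-style linear merge of one weight tensor (illustrative only).

    base    -- the base model's tensor (here, the TIES-merged base)
    experts -- tensors of the same shape from the models being merged in
    weights -- per-model merge weights, as in the YAML slices below
    """
    torch.manual_seed(seed)
    merged_delta = torch.zeros_like(base)
    for expert, w in zip(experts, weights):
        if w == 0.0:
            continue  # entries with weight 0.0 contribute nothing in that slice
        delta = expert - base  # task vector relative to the base
        # Rank entries by magnitude; larger deltas get a higher keep probability,
        # centred on `density` (a simplified stand-in for DELLA's MAGPRUNE sampling).
        ranks = delta.abs().flatten().argsort().argsort().float()
        ranks = ranks / max(ranks.numel() - 1, 1)
        keep_prob = ((density - epsilon) + 2 * epsilon * ranks).clamp(0.0, 1.0)
        keep_prob = keep_prob.reshape(delta.shape)
        mask = torch.bernoulli(keep_prob)
        # Rescale surviving entries so the expected delta is preserved, then accumulate.
        merged_delta += w * delta * mask / keep_prob.clamp(min=1e-8)
    return base + merged_delta

# Tiny demo on random tensors, standing in for one layer's weights.
base = torch.randn(8, 8)
experts = [base + 0.05 * torch.randn(8, 8) for _ in range(2)]
merged = della_linear_sketch(base, experts, weights=[0.7, 0.3])
print(merged.shape)
```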
### Models Merged
The following models were included in the merge:
* [arcee-ai/Virtuoso-Small-v2](https://huggingface.co/arcee-ai/Virtuoso-Small-v2)
* [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose)
* [sthenno/tempesthenno-ppo-ckpt40](https://huggingface.co/sthenno/tempesthenno-ppo-ckpt40)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
name: SuperMerge-LayeredTIES-v1
merge_method: della_linear
base_model: CultriX/Enhanced-TIES-Base-v1 # Referencing the TIES base model defined below (now inlined)
tokenizer_source: base
dtype: float32
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
  t: [0.1, 0.3, 0.7, 0.7, 0.4, 0.2]
slices:
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [0, 8]
        parameters:
          weight: 0.7
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [0, 8]
        parameters:
          weight: 0.3
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [0, 8]
        parameters:
          weight: 0.0
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [0, 8]
        parameters:
          weight: 0.0
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [8, 16]
        parameters:
          weight: 0.4
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [8, 16]
        parameters:
          weight: 0.3
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [8, 16]
        parameters:
          weight: 0.3
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [8, 16]
        parameters:
          weight: 0.0
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [16, 24]
        parameters:
          weight: 0.2
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [16, 24]
        parameters:
          weight: 0.2
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [16, 24]
        parameters:
          weight: 0.5
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [16, 24]
        parameters:
          weight: 0.1
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [24, 32]
        parameters:
          weight: 0.25
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [24, 32]
        parameters:
          weight: 0.1
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [24, 32]
        parameters:
          weight: 0.4
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [24, 32]
        parameters:
          weight: 0.25
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [32, 40]
        parameters:
          weight: 0.4
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [32, 40]
        parameters:
          weight: 0.0
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [32, 40]
        parameters:
          weight: 0.2
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [32, 40]
        parameters:
          weight: 0.4
  - sources:
      - model: CultriX/Enhanced-TIES-Base-v1 # Referencing inlined TIES base
        layer_range: [40, 48]
        parameters:
          weight: 0.6
      - model: arcee-ai/Virtuoso-Small-v2
        layer_range: [40, 48]
        parameters:
          weight: 0.0
      - model: sthenno/tempesthenno-ppo-ckpt40
        layer_range: [40, 48]
        parameters:
          weight: 0.1
      - model: sometimesanotion/Qwenvergence-14B-v3-Prose
        layer_range: [40, 48]
        parameters:
          weight: 0.3
# Commentary:
# =============================================================================
# SuperMerge-LayeredTIES-v1 Commentary:
#
# This configuration combines the strengths of both Enhanced-LayeredSlerp-v1 and SuperMerge-Enhanced-v1.
# It leverages the robust foundation of a TIES-merged base model (Enhanced-TIES-Base-v1) and applies
# the layer-wise module approach and fine-grained weight control from SuperMerge-Enhanced-v1 in a DELLA-linear merge.
#
# Key Features:
# - TIES-Merged Base Foundation: Uses 'Enhanced-TIES-Base-v1' as the base model for the DELLA-linear merge.
# This TIES base provides a selectively merged and potentially more efficient starting point, incorporating
# strengths from multiple models (Virtuoso, Phi-4, Qwenvergence, DeepSeek) with density control.
#
# - Layer-wise Module Integration: Maintains the module-based slice structure from SuperMerge-Enhanced-v1.
#   The DELLA-linear merge combines the TIES-merged base with specialized modules for Reasoning, IFEval, and MATH/Knowledge
# at different layer ranges, using explicit weights for fine-grained control.
#
# - Benchmark-Driven Iterative Weight Tuning: The configuration is designed to be optimized through a
# benchmark-driven iterative weight tuning process (as described in the refined SuperMerge-Enhanced-v1 approach).
# The initial weights provided are starting points and need to be systematically tuned based on benchmark results.
#
# Tuning Process (Same as Refined SuperMerge-Enhanced-v1):
# 1. Initial Benchmarking: Run a full benchmark suite.
# 2. Performance Analysis: Examine per-benchmark scores and compare to source models.
# 3. Targeted Weight Adjustments: Adjust layer weights based on performance analysis (e.g., increase IFEval module weight
# in early layers if IFEval is weak).
# 4. Iterate: Repeat steps 1-3. Make small, incremental adjustments in each iteration.
#
# Rationale:
# - By using a TIES-merged base, we aim to create a more robust and potentially efficient foundation for the DELLA-linear merge.
# - The layer-wise module approach and fine-grained weights in the DELLA-linear merge still allow for precise control over the blending
# of specialized capabilities at different network depths, building upon the solid TIES base.
# - The emphasis on a benchmark-driven iterative weight tuning process remains crucial for achieving optimal performance.
#
# Next Steps:
# - Implement this configuration using MergeKit.
# - Run initial benchmarks to establish a baseline.
# - Begin the iterative benchmark-driven weight tuning process to optimize performance.
# =============================================================================
```
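To reproduce the merge, this configuration can typically be passed to mergekit's `mergekit-yaml` CLI (for example, `mergekit-yaml config.yaml ./SuperMerge-LayeredTIES-v1`, where the output directory is illustrative); the resulting checkpoint is saved in bfloat16 per `out_dtype`. Below is a minimal sketch of loading the merged model with transformers, assuming the merged weights live at a local path or Hub repository of your choosing; the `model_id` value is a placeholder, and `device_map="auto"` requires the accelerate package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder location for the merged checkpoint; substitute the real path or repo id.
model_id = "./SuperMerge-LayeredTIES-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches out_dtype in the merge config
    device_map="auto",           # requires accelerate; omit to load on a single device
)

prompt = "Briefly explain what model merging is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```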