Nexesenex committed on
Commit 42ab0ba · verified · 1 Parent(s): 8ded8dd

Update README.md

Files changed (1): README.md +40 -1
README.md CHANGED
@@ -10,6 +10,45 @@ tags:
 license: llama3.2
 ---
 
+ # about
+
+ The Kermes series is my second attempt at making merges, after the Kostume series.
+
+ For the Kostume series, started on 11/02/2025, I tried to make a triple stock merge of 3 intermediary stock merges of a dozen models or so,
+ to see if I could pile up their abilities.
+ Not bad, but nothing special about it; it's a bit hard for me to judge at 3b.
+
+ For the Kermes series, started the day after, I defined a simpler approach:
+
+ - Perplexity is the main constraint (see the note after this list). Usual L3.2 3b finetunes sit around 10.5-11 ppl512 on English Wikipedia; Hermes is around 9.5.
+ - I also measure in French and Serbian to observe the variance.
+
+ - ARC Challenge and ARC Easy are the second constraint, to judge basic logic.
+ - Usual L3.2 3b finetunes hit 40 and 60-65 respectively; Hermes 3 hits 47+ and 70+.
+
+ - Lack of censorship. I always keep in mind to pick models compatible with that, as much as possible,
+ whether through the picked models' abliteration or the datasets they use.
+
+ - And of course, actual testing, both in Kobold/Croco.CPP (spamming very offensive requests) and in SillyTavern (a 10k-token prompt with a big lorebook).
+
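+ A quick note on those perplexity numbers (lower is better): ppl512 here means perplexity computed over 512-token windows, llama.cpp-style; the exact corpus slice is my own shorthand, so treat the figures as relative indicators rather than absolute scores:
+
+ $$\mathrm{ppl} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)$$
+
+ So a drop from ~10.5 to ~9.5 means the model assigns measurably higher average likelihood to the reference text.
+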
+ Kermes series 2 is basically a stock merge on top of another.
+ - The goal was to maintain as much as possible the qualities of the models used, so I stayed at 1+2 models for the first merge, and 1+2 for the second as well.
+
+ For V2.1 (a config sketch follows below):
+ - First, DarkHermes as the base, LlamaLoi as the "stabilizer", and Hermes Abliterated.
+ - That triplet kept the strong benchmarks of DarkHermes and even improved them a bit.
+ - Second, that Kermes 0.2 merge served as the base, with Evil Alpaca as a wild card (very good ARC scores and a nasty dataset), and Dolphin 3.0 for a quality addition.
+
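+ For illustration only, here is what a two-stage model_stock recipe looks like in mergekit; the model paths are placeholders, not the exact repos I used:
+
+ ```yaml
+ # Stage 1 sketch: one base plus two donors (stand-ins for DarkHermes,
+ # LlamaLoi and Hermes Abliterated). Placeholder paths throughout.
+ merge_method: model_stock
+ base_model: path/to/DarkHermes
+ models:
+   - model: path/to/LlamaLoi
+   - model: path/to/Hermes-Abliterated
+ dtype: bfloat16
+ ```
+
+ The stage-1 output (Kermes 0.2) then becomes the `base_model` of a second, identical config, with Evil Alpaca and Dolphin 3.0 as the two donor models.
+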
+ And bingo. Perplexity still goes down, ARC scores remain stable, it's still quite unhinged, and quite coherent, even at 10k+ context.
+
+ I will probably replicate that recipe a bit in the future, first to try to improve Kermes 3b,
+ and then move on to 8b for the next arc of this adventure.
+
+ Kudos go to the model authors, to the Arcee / MergeKit folks, and to HF for hosting the MergeKit app.
+ Also a big-up to SteelSkull: watching him cook Nevoria is what got me to try making some merges myself.
+
+ ---
+ # quantizations
 
 GGUF static quantizations (Thanks!) :
 
 
@@ -19,7 +58,7 @@ GGUF custom quantizations:
 
 https://huggingface.co/Nexesenex/Llama_3.2_3b_Kermes_v2.1-iMat-CQ-GGUF
 
-
+ ---
 # merge
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).