crestf411 committed · verified · Commit 251cac3 · 1 Parent(s): 0ec69e1

Create README.md

Files changed (1):
  1. README.md +84 -0 (ADDED)
 
---
library_name: transformers
tags:
- not-for-all-audiences
- mergekit
datasets:
- crestf411/LimaRP-DS
- Gryphe/Sonnet3.5-Charcard-Roleplay
- anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system
- anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system
- anthracite-org/kalo-opus-instruct-3k-filtered-no-system
- anthracite-org/nopm_claude_writing_fixed
base_model:
- mistralai/Mistral-Nemo-Base-2407
---

![slush.jpg](https://huggingface.co/crestf411/L3.1-8B-Slush/resolve/main/slush.jpg?)

([GGUFs](https://huggingface.co/crestf411/MN-Slush-gguf))

**Slush** is a two-stage model trained with high LoRA dropout. Stage 1 is a continued-pretraining run on the base model, aimed at boosting the model's creativity and writing capabilities. The resulting LoRA is then merged into the instruction-tuned model, and stage 2 is a fine-tuning pass on top of that, to further enhance its roleplaying capabilities and/or to repair any damage caused by the stage 1 merge.

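For illustration, here is a minimal sketch of that merge step using PEFT, assuming the stage 1 LoRA were available locally; the adapter path is a hypothetical placeholder, not a published artifact.

```python
# Hedged sketch: fold a stage-1-style LoRA into the instruct model.
# "path/to/stage1-lora" is hypothetical; the adapter is not published under that name.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

instruct = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(instruct, "path/to/stage1-lora").merge_and_unload()
merged.save_pretrained("stage1-on-instruct")  # feeds into the final TIES merge below
```
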
23
+ This is still early stage. As always, feedback is welcome, and begone if you demand perfection.
24
+
25
+ The second stage, like the *Sunfall* series, follows the Silly Tavern preset (Mistral V2 & V3, though V3-Tekken works fine), so ymmv in particular if you use some other tool and/or preset.
26
+
27
+ **Parameter suggestions:**
28
+
29
+ I did all my testing with temp 1, min-p 0.1, DRY 0.8.
30
+
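For reference, a hedged Transformers sketch with those settings follows. The repo id is an assumption based on the GGUF link above, `min_p` needs a reasonably recent `transformers` release, and DRY is a backend-side sampler (llama.cpp, text-generation-webui, etc.) that plain `transformers` does not implement, so it is omitted here.

```python
# Hedged sketch of the suggested sampling settings with plain transformers.
# DRY 0.8 must be configured in a backend that implements DRY sampling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "crestf411/MN-Slush"  # assumed repo id for this card
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Write the opening paragraph of a slow-burn mystery."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=1.0,
    min_p=0.1,
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
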
**Training details:**

* Stage 1 (continued pretraining; adapter settings are sketched after this list)
  * Target: mistralai/Mistral-Nemo-Base-2407 (resulting LoRA merged into mistralai/Mistral-Nemo-Instruct-2407)
  * LoRA dropout 0.5 ([motivation](https://arxiv.org/abs/2403.00946))
  * LoRA rank 64, alpha 128 ([motivation](https://arxiv.org/abs/2410.21228))
  * LR cosine 4e-6
  * [LoRA+](https://arxiv.org/abs/2402.12354) with LR Ratio: 15
  * Context size: 16384
  * Gradient accumulation steps: 4
  * Epochs: 1
* Stage 2 (fine-tune)
  * Target: stage 1 model
  * LoRA dropout 0.5
  * LoRA rank 32, alpha 64
  * LR cosine 5e-6 (min 5e-7)
  * [LoRA+](https://arxiv.org/abs/2402.12354) with LR Ratio: 15
  * Context size: 16384
  * Gradient accumulation steps: 4
  * Epochs: 2

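The sketch below shows roughly how the stage 1 adapter settings map onto a PEFT `LoraConfig`. The target modules are an assumption (the card does not list them), and the cosine LR schedule plus the LoRA+ LR ratio of 15 belong in the optimizer/trainer setup rather than in this config.

```python
# Hedged sketch of the stage 1 adapter hyperparameters with PEFT.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Base-2407", torch_dtype=torch.bfloat16
)

stage1 = LoraConfig(
    r=64,               # stage 2 drops this to 32
    lora_alpha=128,     # alpha = 2 * rank; stage 2 uses 64
    lora_dropout=0.5,   # unusually high dropout, per the linked motivation
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption, not from the card
)
model = get_peft_model(base, stage1)
model.print_trainable_parameters()
```
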
## Merge Details
### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with mistralai/Mistral-Nemo-Base-2407 as the base.

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: stage1-on-instruct
    parameters:
      weight: 1
      density: 1
  - model: stage2-on-stage1
    parameters:
      weight: 0.7
      density: 1
  - model: mistralai/Mistral-Nemo-Instruct-2407
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: mistralai/Mistral-Nemo-Base-2407
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: mistralai/Mistral-Nemo-Instruct-2407
dtype: bfloat16
```
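
Assuming a recent mergekit release, a configuration like this is normally applied with the `mergekit-yaml` entry point (roughly `mergekit-yaml config.yaml ./output-dir`); `stage1-on-instruct` and `stage2-on-stage1` refer to the intermediate checkpoints produced by the two training stages described above.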