parameters guide
samplers guide
model generation
role play settings
quant selection
arm quants
iq quants vs q quants
optimal model setting
gibberish fixes
coherence
instruction following
quality generation
chat settings
quality settings
llamacpp server
llamacpp
lmstudio
sillytavern
koboldcpp
backyard
ollama
model generation steering
steering
model generation fixes
text generation webui
ggufs
exl2
full precision
quants
imatrix
neo imatrix
- General output quality.
- Role play related issues.

Likewise, ALL the settings (parameters, samplers and advanced samplers) below can also improve model generation and/or the general overall "smoothness" / "quality" of model operation:

- all parameters and samplers available via LLAMACPP (and most apps that run / use LLAMACPP)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in oobabooga/text-generation-webui, including the llamacpp_HF loader (which allows a lot more samplers)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in KoboldCPP (including Anti-slop filters)

Even if you are not using my models, you may find this document useful for any model (any quant / full source) available online.

This effect is cumulative, especially with long output generation and/or multi-turn (chat, role play, COT).

Likewise, because of how modern AIs/LLMs operate, the quality of the previously generated tokens affects the next tokens generated too.

You will get higher quality operation overall - stronger prose, better answers, and a higher quality adventure.

---

<B>SOURCE FILES for my Models:</B>

Source files / Source models of my models are located here (also upper right menu on this page):

[ https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be ]

You will need the config files to use the "llamacpp_HF" loader ("text-generation-webui") [ https://github.com/oobabooga/text-generation-webui ]

You can also use the full source in "text-generation-webui" too.

As an alternative you can use GGUFs directly in "KOBOLDCPP" without the "config files" and still use almost all the parameters, samplers and advanced samplers.

<B>Parameters, Samplers and Advanced Samplers</B>

In sections 1a, 1b, and 1c below are all the LLAMA_CPP parameters and samplers.

I have added notes below each one for adjustment / enhancement(s) for specific use cases.

TEXT-GENERATION-WEBUI

In section 2 are additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui AND/OR https://github.com/LostRuins/koboldcpp ("KOBOLDCPP").

The "llamacpp_HF" loader (for "text-generation-webui") only requires the GGUF you want to use plus a few config files from the "source repo" of the model.

(this process is automated with this program - just enter the repo url(s) -> it will fetch everything for you)

This allows access to very advanced samplers in addition to all the parameters / samplers here.

KOBOLDCPP:

Note that https://github.com/LostRuins/koboldcpp also allows access to all LLAMACPP parameters/samplers, as well as additional advanced samplers.

You can use almost all parameters, samplers and advanced samplers using "KOBOLDCPP" without the need to get the source config files (the "llamacpp_HF" step).

Note: This program has one of the newest samplers, called "Anti-slop", which allows phrase/word banning at the generation level.

OTHER PROGRAMS:

Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas in others (llamacpp only here) you may need to add them to the json file(s) for a model and/or template preset.

In most cases all llama_cpp settings are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Ollama" and "lmstudio" (as well as other apps too).

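As a rough sketch of what that looks like, here is a hedged example of sending llama.cpp-style sampler settings to a local llama.cpp server ("llama-server") over its /completion API. The host/port and exact field names depend on your build and app; treat the values as placeholders, not recommendations from this guide.

```python
# Minimal sketch: send sampler settings to a local llama.cpp server (llama-server).
# Assumes the server is already running, e.g.:  llama-server -m model.gguf --port 8080
# Field names follow recent llama.cpp /completion builds; check your build's docs.
import requests

payload = {
    "prompt": "Write the opening scene of a mystery novel.",
    "n_predict": 300,        # max tokens to generate
    "temperature": 0.8,      # primary parameter (see Section 1a)
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.1,   # penalty samplers (see Section 1b)
    "repeat_last_n": 64,
}

r = requests.post("http://localhost:8080/completion", json=payload, timeout=300)
print(r.json()["content"])
```
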
https://github.com/ggerganov/llama.cpp

(scroll down on the main page for more apps/programs that use GGUFs and connect to / use the LLAMA-CPP package.)

DETAILED NOTES ON PARAMETERS, SAMPLERS and ADVANCED SAMPLERS:

For additional details on these sampler settings (including advanced ones) you may also want to check out:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

(NOTE: Not all of these "options" are available for GGUFS, even when you use the "llamacpp_HF" loader in "text-generation-webui")

Additional Links:

=> DRY => https://github.com/oobabooga/text-generation-webui/pull/5677

=> DRY => https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/

=> DRY => https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/

=> Samplers (videos) : https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e

=> Creative Writing -> https://www.reddit.com/r/LocalLLaMA/comments/1c36ieb/comparing_sampling_techniques_for_creative/

=> Parameters => https://arxiv.org/html/2408.13586v1

=> Stats on some parameters => https://github.com/ZhouYuxuanYX/Benchmarking-and-Guiding-Adaptive-Sampling-Decoding-for-LLMs

---

CRITICAL NOTES:

IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and with more VRAM free for context.

---

HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)

1 - Set temp to 0 (zero), set your basic parameters, and use a prompt to get a "default" generation. A creative prompt will work better here.

2 - If you want to test basic parameter changes, test ONE at a time, then compare the output (answer quality, word choice, sentence size/construction, general output qualities) to your "default" generation.

3 - Then start testing TWO parameters at a time, and comparing again. Keep in mind parameters (all) interact with each other.

4 - Samplers -> Reset your basic parameters (temp still at zero) and test each one of these, one at a time. Then adjust settings and test again.

5 - Once you have an "idea" of how each affects your "test prompt", now test at "temp" (not zero). It may take five to ten generations to get a rough idea.

Yes, testing is a lot of work - but once you get all the parameter(s) and/or sampler(s) dialed in, it is worth it.

IMPORTANT: Use a "fresh chat" PER TEST (you will contaminate the results otherwise). Never use the same chat for multiple tests -> exception: Regens.

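To make steps 1-5 less tedious, you can script the "fresh chat per test" rule against a local server API. The sketch below assumes a llama.cpp server on localhost:8080 and its /completion endpoint; the parameter grid, prompt and file names are only illustrations, not recommendations from this guide.

```python
# Sketch of a parameter sweep: one stateless request per setting = a "fresh chat" per test.
# Assumes a llama.cpp server (llama-server) on localhost:8080; field names per recent builds.
import json
import requests

PROMPT = "Describe an abandoned lighthouse during a storm."      # illustrative test prompt
BASE = {"prompt": PROMPT, "n_predict": 200, "temperature": 0.0}  # step 1: temp 0 baseline

def generate(overrides):
    r = requests.post("http://localhost:8080/completion",
                      json={**BASE, **overrides}, timeout=300)
    return r.json()["content"]

results = {"baseline": generate({})}

# Step 2: change ONE parameter at a time and compare against the baseline.
for top_k in (20, 40, 80, 120):
    results[f"top_k={top_k}"] = generate({"top_k": top_k})

with open("sweep_results.json", "w") as f:
    json.dump(results, f, indent=2)   # compare these by eye, per the steps above
```
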
Keep in mind that parameters, samplers and advanced samplers can affect the model on a per token generation basis AND/OR on a multi-token / phrase / sentence / paragraph and even complete generation basis.

Everything is cumulative here, regardless of whether the parameter/sampler acts per token or on a multi-token basis, because of how models "look back" to see what was generated in some cases.

And of course... each model will be different too.

All that being said, it is a good idea to have specific generation quality "goals" in mind.

Likewise, at my repo, I post example generations so you can get an idea (but not a complete picture) of a model's generation abilities.

The best way to control generation is STILL with your prompt(s) - including pre-prompts/system role. The latest gen models (and archs) have very strong instruction following, so many times better (or just included!) instructions in your prompts can make a world of difference.

Not sure if the model understands your prompt(s)?

Ask it ->

"Check my prompt below and tell me how to make it clearer?" (prompt after this line)

"For my prompt below, explain the steps you would take to execute it" (prompt after this line)

This will help the model fine tune your prompt so IT understands it.

However, sometimes parameters and/or samplers are required to better "wrangle" the model and get it to perform to its maximum potential and/or fine tune it to your use case(s).

------------------------------------------------------------------------------
Section 1a : PRIMARY PARAMETERS - ALL APPS:
------------------------------------------------------------------------------

These parameters will have a SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.

Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.

As this parameter operates in conjunction with "top-p" and "min-p", all three should be carefully adjusted one at a time.

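If you want to see how those three interact, a simple way is to hold two of them at a fixed starting value and move the third, one request at a time. The snippet below reuses the assumed local llama.cpp server from the earlier sketches; the specific values are only illustrative.

```python
# Adjust top_k, top_p and min_p one at a time, holding the other two steady.
# Same assumed llama.cpp server on localhost:8080 as in the earlier sketches.
import requests

def run(**sampler_overrides):
    payload = {"prompt": "Continue the story: The door creaked open...",
               "n_predict": 150, "temperature": 0.0,
               "top_k": 40, "top_p": 0.95, "min_p": 0.05}   # starting point
    payload.update(sampler_overrides)
    return requests.post("http://localhost:8080/completion",
                         json=payload, timeout=300).json()["content"]

print(run(top_k=120))   # wider word choice
print(run(top_p=0.8))   # then test top_p on its own
print(run(min_p=0.1))   # then min_p on its own
```
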
<B>NOTE - "CORE" Testing with "TEMP":</B>

------------------------------------------------------------------------------
Section 1b : PENALTY SAMPLERS - ALL APPS:
------------------------------------------------------------------------------

These samplers "trim" or "prune" output in real time.

For creative use cases, these samplers can alter prose generation in interesting ways.

Penalty parameters affect both per token generation and part of (or the entire) generation, depending on settings / output length.

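For reference, this is roughly where the common penalty settings live in a llama.cpp-style request; the numbers are placeholders to show the knobs, not recommended values from this guide (Class 3 / Class 4 models get their own suggested settings elsewhere in this document).

```python
# The usual penalty sampler fields in a llama.cpp-style /completion request.
# Values below are placeholders for illustration only.
penalty_settings = {
    "repeat_penalty": 1.1,      # > 1.0 penalizes recently repeated tokens
    "repeat_last_n": 64,        # how far back (in tokens) the penalty looks
    "presence_penalty": 0.0,    # flat penalty once a token has appeared at all
    "frequency_penalty": 0.0,   # penalty that grows with how often a token appears
}
```
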
CLASS 4: For these models it is important to activate / set all samplers as noted for maximum quality and control.

<B>PRIMARY:</B>

------------------------------------------------------------------------------
Section 1c : SECONDARY SAMPLERS / FILTERS - ALL APPS:
------------------------------------------------------------------------------

In some AI/LLM apps, these may only be available via JSON file modification and/or the API.

For "text-gen-webui" and "Koboldcpp" these are directly accessible.

i) OVERALL GENERATION CHANGES (affect per token as well as overall generation):

<B>mirostat</B>

Use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if used. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

<B>mirostat-lr</B>

Mirostat learning rate, parameter eta (default: 0.1) " mirostat_eta "

mirostat_eta: 0.1 is a good value.

<B>mirostat-ent</B>

Mirostat target entropy, parameter tau (default: 5.0) " mirostat_tau "

mirostat_tau: 5-8 is a good value.

Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the paper. ( https://arxiv.org/abs/2007.14966 )

This is the big one; activating this will help with creative generation. It can also help with stability. Also note which samplers are disabled/ignored here, and that "mirostat_eta" is a learning rate.

This is both a sampler (and pruner) and an enhancement all in one.

It also has two modes of generation, "1" and "2" - test both with 5-10 generations of the same prompt. Make adjustments, and repeat.

CLASS 3 models: it is suggested to use this to assist with generation (minimum settings).

CLASS 4 models: it is highly recommended; use Mirostat 1 or 2 with mirostat_tau @ 6 to 8 and mirostat_eta at .1 to .5.

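As a concrete sketch, enabling Mirostat in a llama.cpp-style request (or via the equivalent fields in koboldcpp / text-generation-webui) looks roughly like this; the tau/eta values follow the Class 4 suggestion above, and the field names follow recent llama.cpp server builds.

```python
# Mirostat 2 with the Class 4 style settings suggested above.
# When mirostat is non-zero, top_k / top_p / tfs / typical are ignored by the sampler chain.
mirostat_settings = {
    "mirostat": 2,        # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 6.0,  # target entropy ("mirostat-ent"); 6 to 8 suggested for Class 4
    "mirostat_eta": 0.1,  # learning rate ("mirostat-lr"); .1 to .5 suggested for Class 4
}
```
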
<B>dynatemp-range</B>

This allows the model to CHANGE temp during generation. This can greatly affect creativity, dialog, and other contrasts.

For Koboldcpp a converter is available, and in oobabooga/text-generation-webui you just enter low/high/exp.

Class 4 only: Suggested this is on, with a high/low of .8 to 1.8 (note the range here of "1" between high and low); with the exponent set to 1 (however, below 0 or above 1 work too).

To set this manually (IE: API, lmstudio, Llamacpp, etc) using "range" and "exp" is a bit more tricky: (the example here is to set a range from .8 to 1.8)

1 - Set the "temp" to 1.3 (the regular temp parameter)

This is both an enhancement and, in some ways, a fix for issues in a model when too little temp (or too much / too much of the same) affects generation.

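A hedged sketch of the manual math: with "range"-style dynamic temperature, the final low/high is "temp minus range" to "temp plus range", so a .8 to 1.8 spread means centering temp at 1.3 with a range of 0.5. The field names below follow recent llama.cpp server builds; koboldcpp and lmstudio expose the same idea under their own names.

```python
# Dynamic temperature the "manual" way: center temp +/- range, with an exponent curve.
# Target spread here is 0.8 .. 1.8 -> center 1.3, range 0.5 (values from the example above).
dynatemp_settings = {
    "temperature": 1.3,        # center of the dynamic range
    "dynatemp_range": 0.5,     # final range = 1.3 - 0.5 .. 1.3 + 0.5 = 0.8 .. 1.8
    "dynatemp_exponent": 1.0,  # 1 = linear mapping between entropy and temperature
}
```
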
ii) PER TOKEN CHANGES:

<B>tfs</B>

Tail free sampling, parameter z (default: 1.0, 1.0 = disabled)

Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. The closer to 0, the more discarded tokens.
( https://www.trentonbricken.com/Tail-Free-Sampling/ )

<B>typical</B>

Locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)

If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.

<B>xtc-probability</B>

xtc probability (default: 0.0, 0.0 = disabled)

XTC is a new sampler that adds an interesting twist in generation.
Suggest you experiment with this one, with other advanced samplers disabled, to see its effects.

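To isolate XTC as suggested, something like the following works against apps that expose the two XTC fields (recent llama.cpp server builds, koboldcpp, text-generation-webui). The values are only the commonly cited starting points, not recommendations from this guide.

```python
# Try XTC on its own: other advanced samplers off/neutral so its effect is visible.
xtc_settings = {
    "xtc_probability": 0.5,   # chance the XTC removal step fires at all (0.0 = disabled)
    "xtc_threshold": 0.1,     # tokens above this probability are candidates for removal
    "dry_multiplier": 0.0,    # keep DRY off while testing XTC
    "mirostat": 0,            # keep Mirostat off while testing XTC
}
```
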
<B>l, logit-bias TOKEN_ID(+/-)BIAS </B>

modifies the likelihood of token appearing in the completion,

Careful testing is required, as this can have unclear side effects.

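For reference, the same control is exposed in llama.cpp's server API as a list of [token_id, bias] pairs (and as the -l / --logit-bias flag on the command line). The token id below is a made-up placeholder; pull real ids for problem words from your own "bad outputs" using the model's tokenizer.

```python
# logit_bias in a llama.cpp-style request: list of [token_id, bias] pairs.
# Negative bias discourages a token, positive encourages it; token id 12345 is a placeholder.
logit_bias_settings = {
    "logit_bias": [
        [12345, -2.0],   # push this token down (e.g. a word the model overuses)
    ],
}
# CLI equivalent (llama.cpp):  --logit-bias 12345-2
```
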
------------------------------------------------------------------------------------------------------------------------------------------------------------
SECTION 2: ADVANCED SAMPLERS - "text-generation-webui" / "KOBOLDCPP":

Additional Parameters / Samplers, including "DRY", "QUADRATIC" and "ANTI-SLOP".
------------------------------------------------------------------------------------------------------------------------------------------------------------

Hopefully these samplers / controls will be added to LLAMACPP and become available to all users via AI/LLM apps soon.

For more info on what they do / how they affect generation see:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

(also see the "Additional Links" section above for more info on these parameters/samplers)

Keep in mind these parameters/samplers become available (for GGUFs) in "oobabooga/text-generation-webui" when you use the llamacpp_HF loader.

Most of these are also available in KOBOLDCPP (via settings -> samplers) after start up (no "llamacpp_HF loader" step required).

I am not going to touch on all of the samplers / parameters, just the main ones at the moment.

However, you should also check / test the operation of:

a] Affects per token generation:

- top_a
- epsilon_cutoff
- eta_cutoff
- no_repeat_ngram_size

b] Affects generation including phrase, sentence, paragraph and entire generation:

- no_repeat_ngram_size
- encoder_repetition_penalty
- guidance_scale (with "Negative prompt") => this is like a pre-prompt/system role prompt.
- Disabling the BOS token => this can make the replies more creative.
- Custom stopping strings

Note: "no_repeat_ngram_size" appears in both because it can impact per token OR per phrase depending on settings.

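If you drive text-generation-webui through its API rather than the UI, these extra controls are passed as additional fields in the request body (the UI exposes the same names in the Parameters tab). This is only a sketch of where the knobs go; the values are placeholders, and exact pass-through behaviour depends on your webui version and loader.

```python
# Sketch: extra text-generation-webui parameters as they appear in an API request body.
# Placeholder values only; see the Parameters-Tab wiki page linked above for what each does.
extra_webui_params = {
    "top_a": 0.0,                      # per token
    "epsilon_cutoff": 0.0,             # per token
    "eta_cutoff": 0.0,                 # per token
    "no_repeat_ngram_size": 0,         # per token OR per phrase, depending on size
    "encoder_repetition_penalty": 1.0, # phrase / sentence / paragraph level
    "guidance_scale": 1.0,             # used together with a negative prompt
    "negative_prompt": "",
    "add_bos_token": False,            # disabling the BOS token can make replies more creative
    # custom stopping strings are set separately, depending on which API flavour you use
}
```
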
<B>MAIN ADVANCED SAMPLERS (affects per token AND overall generation):</B>

What I will touch on here are special settings for CLASS 3 and CLASS 4 models (for the first TWO samplers).

For CLASS 3 you can use one, two or both.

<B>DRY:</B>

Dry affects repetition (and the repeat "penalty") at the word, phrase, sentence and even paragraph level. Read about "DRY" in the "Additional Links" section above.

Class 3:

dry_multiplier: .8

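In API form (text-generation-webui, koboldcpp, and newer llama.cpp builds use the same field names), the DRY block looks roughly like this. The .8 multiplier is the Class 3 value above and the 1.15 base is the low end of the 1.15 to 1.5 range this guide gives for dry_base; the allowed-length and sequence-breaker entries are the commonly used defaults, shown here as assumptions.

```python
# DRY (Don't Repeat Yourself) sampler fields, Class 3 style starting point.
dry_settings = {
    "dry_multiplier": 0.8,     # Class 3 value from this guide; 0 disables DRY
    "dry_base": 1.15,          # low end of the 1.15 to 1.5 range given in this guide
    "dry_allowed_length": 2,   # assumed common default: longer repeats get penalized
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],  # assumed common default breaker set
}
```
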
<B>QUADRATIC SAMPLING:</B>

This sampler alters the "score" of ALL TOKENS at the time of generation. See the "Additional Links" section above for more information.

Class 3:

smoothing_curve: 1.5 to 2.

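As with DRY, these land in the request body or the app's sampler settings; text-generation-webui exposes both fields directly. The factor value follows the "3 to 5 (or higher)" Class 3 guidance this guide gives for smoothing_factor, and the curve value the 1.5 to 2 range above; both are starting points, and availability of the fields depends on the app/loader, so treat this as a sketch rather than a definitive reference.

```python
# Quadratic / smoothing sampler fields, Class 3 style starting point.
quadratic_settings = {
    "smoothing_factor": 3.0,   # guide suggests 3 to 5 (or higher) for Class 3; 0 disables
    "smoothing_curve": 1.5,    # guide suggests 1.5 to 2
}
```
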
<B>ANTI-SLOP - Koboldcpp only</B>

Hopefully this powerful sampler will soon appear in all LLM/AI apps.

You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.

This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".

This is a game changer for custom real time control of the model.

IMPORTANT:

Keep in mind that these settings/samplers work in conjunction with "penalties"; this is especially important for operation of CLASS 4 models for chat / role play and/or "smoother operation".

For Class 3 models, "QUADRATIC" will have a slightly stronger effect than "DRY", relatively speaking.

If you use the Mirostat sampler, keep in mind it will interact with these two advanced samplers too.

Finally:
