parameters guide
samplers guide
model generation
role play settings
quant selection
arm quants
iq quants vs q quants
optimal model setting
gibberish fixes
coherence
instruction following
quality generation
chat settings
quality settings
llamacpp server
llamacpp
lmstudio
sillytavern
koboldcpp
backyard
ollama
model generation steering
steering
model generation fixes
text generation webui
ggufs
exl2
full precision
quants
imatrix
neo imatrix
- General output quality.
- Role play related issues.

Likewise, ALL the settings (parameters, samplers and advanced samplers) below can also improve model generation and/or the general overall "smoothness" / "quality" of model operation:

- all parameters and samplers available via LLAMACPP (and most apps that run / use LLAMACPP)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in oobabooga/text-generation-webui, including the llamacpp_HF loader (which allows a lot more samplers)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in KoboldCPP (including Anti-slop filters)

Even if you are not using my models, you may find this document useful for any model (any quant / full source) available online.

This effect is cumulative, especially with long output generation and/or multi-turn (chat, role play, COT).

Likewise, because of how modern AIs/LLMs operate, the quality of the previously generated tokens affects the next tokens generated too.

You will get higher quality operation overall - stronger prose, better answers, and a higher quality adventure.

---

<B>SOURCE FILES for my Models:</B>

Source files / Source models of my models are located here (also upper right menu on this page):

[ https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be ]

You will need the config files to use the "llamacpp_HF" loader ("text-generation-webui") [ https://github.com/oobabooga/text-generation-webui ]

You can also use the full source in "text-generation-webui" too.

As an alternative you can use GGUFs directly in "KOBOLDCPP" without the "config files" and still use almost all the parameters, samplers and advanced samplers.

<B>Parameters, Samplers and Advanced Samplers</B>

In sections 1a, 1b, and 1c below are all the LLAMA_CPP parameters and samplers.

I have added notes below each one for adjustment / enhancement(s) for specific use cases.

TEXT-GENERATION-WEBUI

In section 2 are additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui AND/OR https://github.com/LostRuins/koboldcpp ("KOBOLDCPP").

The "llamacpp_HF" loader (for "text-generation-webui") only requires the GGUF you want to use plus a few config files from the "source repo" of the model.

(this process is automated with this program - just enter the repo url(s) -> it will fetch everything for you)

This allows access to very advanced samplers in addition to all the parameters / samplers here.

KOBOLDCPP:

Note that https://github.com/LostRuins/koboldcpp also allows access to all LLAMACPP parameters/samplers, as well as additional advanced samplers.

You can use almost all parameters, samplers and advanced samplers using "KOBOLDCPP" without the need to get the source config files (the "llamacpp_HF" step).

Note: This program has one of the newest samplers, called "Anti-slop", which allows phrase/word banning at the generation level.

OTHER PROGRAMS:

Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas in others (llamacpp only here) you may need to add them to the json file(s) for a model and/or template preset.

In most cases all llama_cpp settings are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Ollama" and "lmstudio" (as well as other apps too).

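As a rough sketch of what that looks like, here is a hedged example of sending llama.cpp-style sampler settings to a local llama.cpp server ("llama-server") over its /completion API. The host/port and exact field names depend on your build and app; treat the values as placeholders, not recommendations from this guide.

```python
# Minimal sketch: send sampler settings to a local llama.cpp server (llama-server).
# Assumes the server is already running, e.g.:  llama-server -m model.gguf --port 8080
# Field names follow recent llama.cpp /completion builds; check your build's docs.
import requests

payload = {
    "prompt": "Write the opening scene of a mystery novel.",
    "n_predict": 300,        # max tokens to generate
    "temperature": 0.8,      # primary parameter (see Section 1a)
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.1,   # penalty samplers (see Section 1b)
    "repeat_last_n": 64,
}

r = requests.post("http://localhost:8080/completion", json=payload, timeout=300)
print(r.json()["content"])
```
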
https://github.com/ggerganov/llama.cpp

(scroll down on the main page for more apps/programs that use GGUFs and connect to / use the LLAMA-CPP package.)

DETAILED NOTES ON PARAMETERS, SAMPLERS and ADVANCED SAMPLERS:

For additional details on these sampler settings (including advanced ones) you may also want to check out:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

(NOTE: Not all of these "options" are available for GGUFS, even when you use the "llamacpp_HF" loader in "text-generation-webui")

Additional Links:

=> DRY => https://github.com/oobabooga/text-generation-webui/pull/5677

=> DRY => https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/

=> DRY => https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/

=> Samplers (videos) : https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e

=> Creative Writing -> https://www.reddit.com/r/LocalLLaMA/comments/1c36ieb/comparing_sampling_techniques_for_creative/

=> Parameters => https://arxiv.org/html/2408.13586v1

=> Stats on some parameters => https://github.com/ZhouYuxuanYX/Benchmarking-and-Guiding-Adaptive-Sampling-Decoding-for-LLMs

---

CRITICAL NOTES:

IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and with more VRAM free for context.

---

HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)

1 - Set temp to 0 (zero), set your basic parameters, and use a prompt to get a "default" generation. A creative prompt will work better here.

2 - If you want to test basic parameter changes, test ONE at a time, then compare the output (answer quality, word choice, sentence size/construction, general output qualities) to your "default" generation.

3 - Then start testing TWO parameters at a time, and comparing again. Keep in mind parameters (all) interact with each other.

4 - Samplers -> Reset your basic parameters (temp still at zero) and test each one of these, one at a time. Then adjust settings and test again.

5 - Once you have an "idea" of how each affects your "test prompt", now test at "temp" (not zero). It may take five to ten generations to get a rough idea.

Yes, testing is a lot of work - but once you get all the parameter(s) and/or sampler(s) dialed in, it is worth it.

IMPORTANT: Use a "fresh chat" PER TEST (you will contaminate the results otherwise). Never use the same chat for multiple tests -> exception: Regens.

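To make steps 1-5 less tedious, you can script the "fresh chat per test" rule against a local server API. The sketch below assumes a llama.cpp server on localhost:8080 and its /completion endpoint; the parameter grid, prompt and file names are only illustrations, not recommendations from this guide.

```python
# Sketch of a parameter sweep: one stateless request per setting = a "fresh chat" per test.
# Assumes a llama.cpp server (llama-server) on localhost:8080; field names per recent builds.
import json
import requests

PROMPT = "Describe an abandoned lighthouse during a storm."      # illustrative test prompt
BASE = {"prompt": PROMPT, "n_predict": 200, "temperature": 0.0}  # step 1: temp 0 baseline

def generate(overrides):
    r = requests.post("http://localhost:8080/completion",
                      json={**BASE, **overrides}, timeout=300)
    return r.json()["content"]

results = {"baseline": generate({})}

# Step 2: change ONE parameter at a time and compare against the baseline.
for top_k in (20, 40, 80, 120):
    results[f"top_k={top_k}"] = generate({"top_k": top_k})

with open("sweep_results.json", "w") as f:
    json.dump(results, f, indent=2)   # compare these by eye, per the steps above
```
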
Keep in mind that parameters, samplers and advanced samplers can affect the model on a per token generation basis AND/OR on a multi-token / phrase / sentence / paragraph and even complete generation basis.

Everything is cumulative here, regardless of whether the parameter/sampler acts per token or on a multi-token basis, because of how models "look back" to see what was generated in some cases.

And of course... each model will be different too.

All that being said, it is a good idea to have specific generation quality "goals" in mind.

Likewise, at my repo, I post example generations so you can get an idea (but not a complete picture) of a model's generation abilities.

The best way to control generation is STILL with your prompt(s) - including pre-prompts/system role. The latest gen models (and archs) have very strong instruction following, so many times better (or just included!) instructions in your prompts can make a world of difference.

Not sure if the model understands your prompt(s)?

Ask it ->

"Check my prompt below and tell me how to make it clearer?" (prompt after this line)

"For my prompt below, explain the steps you would take to execute it" (prompt after this line)

This will help the model fine tune your prompt so IT understands it.

However, sometimes parameters and/or samplers are required to better "wrangle" the model and get it to perform to its maximum potential and/or fine tune it to your use case(s).

------------------------------------------------------------------------------
Section 1a : PRIMARY PARAMETERS - ALL APPS:
------------------------------------------------------------------------------

These parameters will have a SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.

Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.

As this parameter operates in conjunction with "top-p" and "min-p", all three should be carefully adjusted one at a time.

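If you want to see how those three interact, a simple way is to hold two of them at a fixed starting value and move the third, one request at a time. The snippet below reuses the assumed local llama.cpp server from the earlier sketches; the specific values are only illustrative.

```python
# Adjust top_k, top_p and min_p one at a time, holding the other two steady.
# Same assumed llama.cpp server on localhost:8080 as in the earlier sketches.
import requests

def run(**sampler_overrides):
    payload = {"prompt": "Continue the story: The door creaked open...",
               "n_predict": 150, "temperature": 0.0,
               "top_k": 40, "top_p": 0.95, "min_p": 0.05}   # starting point
    payload.update(sampler_overrides)
    return requests.post("http://localhost:8080/completion",
                         json=payload, timeout=300).json()["content"]

print(run(top_k=120))   # wider word choice
print(run(top_p=0.8))   # then test top_p on its own
print(run(min_p=0.1))   # then min_p on its own
```
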
<B>NOTE - "CORE" Testing with "TEMP":</B>

------------------------------------------------------------------------------
Section 1b : PENALTY SAMPLERS - ALL APPS:
------------------------------------------------------------------------------

These samplers "trim" or "prune" output in real time.

For creative use cases, these samplers can alter prose generation in interesting ways.

Penalty parameters affect both per token generation and part of (or the entire) generation, depending on settings / output length.

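For reference, this is roughly where the common penalty settings live in a llama.cpp-style request; the numbers are placeholders to show the knobs, not recommended values from this guide (Class 3 / Class 4 models get their own suggested settings elsewhere in this document).

```python
# The usual penalty sampler fields in a llama.cpp-style /completion request.
# Values below are placeholders for illustration only.
penalty_settings = {
    "repeat_penalty": 1.1,      # > 1.0 penalizes recently repeated tokens
    "repeat_last_n": 64,        # how far back (in tokens) the penalty looks
    "presence_penalty": 0.0,    # flat penalty once a token has appeared at all
    "frequency_penalty": 0.0,   # penalty that grows with how often a token appears
}
```
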
CLASS 4: For these models it is important to activate / set all samplers as noted for maximum quality and control.

<B>PRIMARY:</B>

------------------------------------------------------------------------------
Section 1c : SECONDARY SAMPLERS / FILTERS - ALL APPS:
------------------------------------------------------------------------------

In some AI/LLM apps, these may only be available via JSON file modification and/or the API.

For "text-gen-webui" and "Koboldcpp" these are directly accessible.

i) OVERALL GENERATION CHANGES (affect per token as well as overall generation):

<B>mirostat</B>

Use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if used. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

<B>mirostat-lr</B>

Mirostat learning rate, parameter eta (default: 0.1) " mirostat_eta "

mirostat_eta: 0.1 is a good value.

<B>mirostat-ent</B>

Mirostat target entropy, parameter tau (default: 5.0) " mirostat_tau "

mirostat_tau: 5-8 is a good value.

Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the paper. ( https://arxiv.org/abs/2007.14966 )

This is the big one; activating this will help with creative generation. It can also help with stability. Also note which samplers are disabled/ignored here, and that "mirostat_eta" is a learning rate.

This is both a sampler (and pruner) and an enhancement all in one.

It also has two modes of generation, "1" and "2" - test both with 5-10 generations of the same prompt. Make adjustments, and repeat.

CLASS 3 models: it is suggested to use this to assist with generation (minimum settings).

CLASS 4 models: it is highly recommended; use Mirostat 1 or 2 with mirostat_tau @ 6 to 8 and mirostat_eta at .1 to .5.

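As a concrete sketch, enabling Mirostat in a llama.cpp-style request (or via the equivalent fields in koboldcpp / text-generation-webui) looks roughly like this; the tau/eta values follow the Class 4 suggestion above, and the field names follow recent llama.cpp server builds.

```python
# Mirostat 2 with the Class 4 style settings suggested above.
# When mirostat is non-zero, top_k / top_p / tfs / typical are ignored by the sampler chain.
mirostat_settings = {
    "mirostat": 2,        # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 6.0,  # target entropy ("mirostat-ent"); 6 to 8 suggested for Class 4
    "mirostat_eta": 0.1,  # learning rate ("mirostat-lr"); .1 to .5 suggested for Class 4
}
```
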
<B>dynatemp-range</B>

This allows the model to CHANGE temp during generation. This can greatly affect creativity, dialog, and other contrasts.

For Koboldcpp a converter is available, and in oobabooga/text-generation-webui you just enter low/high/exp.

Class 4 only: Suggested this is on, with a high/low of .8 to 1.8 (note the range here of "1" between high and low); with the exponent set to 1 (however, below 0 or above 1 work too).

To set this manually (IE: API, lmstudio, Llamacpp, etc) using "range" and "exp" is a bit more tricky: (the example here is to set a range from .8 to 1.8)

1 - Set the "temp" to 1.3 (the regular temp parameter)

This is both an enhancement and, in some ways, a fix for issues in a model when too little temp (or too much / too much of the same) affects generation.

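A hedged sketch of the manual math: with "range"-style dynamic temperature, the final low/high is "temp minus range" to "temp plus range", so a .8 to 1.8 spread means centering temp at 1.3 with a range of 0.5. The field names below follow recent llama.cpp server builds; koboldcpp and lmstudio expose the same idea under their own names.

```python
# Dynamic temperature the "manual" way: center temp +/- range, with an exponent curve.
# Target spread here is 0.8 .. 1.8 -> center 1.3, range 0.5 (values from the example above).
dynatemp_settings = {
    "temperature": 1.3,        # center of the dynamic range
    "dynatemp_range": 0.5,     # final range = 1.3 - 0.5 .. 1.3 + 0.5 = 0.8 .. 1.8
    "dynatemp_exponent": 1.0,  # 1 = linear mapping between entropy and temperature
}
```
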
ii) PER TOKEN CHANGES:

<B>tfs</B>

Tail free sampling, parameter z (default: 1.0, 1.0 = disabled)

Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. The closer to 0, the more discarded tokens.
( https://www.trentonbricken.com/Tail-Free-Sampling/ )

<B>typical</B>

Locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)

If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.

<B>xtc-probability</B>

xtc probability (default: 0.0, 0.0 = disabled)

XTC is a new sampler that adds an interesting twist in generation.
Suggest you experiment with this one, with other advanced samplers disabled, to see its effects.

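To isolate XTC as suggested, something like the following works against apps that expose the two XTC fields (recent llama.cpp server builds, koboldcpp, text-generation-webui). The values are only the commonly cited starting points, not recommendations from this guide.

```python
# Try XTC on its own: other advanced samplers off/neutral so its effect is visible.
xtc_settings = {
    "xtc_probability": 0.5,   # chance the XTC removal step fires at all (0.0 = disabled)
    "xtc_threshold": 0.1,     # tokens above this probability are candidates for removal
    "dry_multiplier": 0.0,    # keep DRY off while testing XTC
    "mirostat": 0,            # keep Mirostat off while testing XTC
}
```
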
<B>l, logit-bias TOKEN_ID(+/-)BIAS </B>

modifies the likelihood of token appearing in the completion,

Careful testing is required, as this can have unclear side effects.

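For reference, the same control is exposed in llama.cpp's server API as a list of [token_id, bias] pairs (and as the -l / --logit-bias flag on the command line). The token id below is a made-up placeholder; pull real ids for problem words from your own "bad outputs" using the model's tokenizer.

```python
# logit_bias in a llama.cpp-style request: list of [token_id, bias] pairs.
# Negative bias discourages a token, positive encourages it; token id 12345 is a placeholder.
logit_bias_settings = {
    "logit_bias": [
        [12345, -2.0],   # push this token down (e.g. a word the model overuses)
    ],
}
# CLI equivalent (llama.cpp):  --logit-bias 12345-2
```
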
------------------------------------------------------------------------------------------------------------------------------------------------------------
SECTION 2: ADVANCED SAMPLERS - "text-generation-webui" / "KOBOLDCPP":

Additional Parameters / Samplers, including "DRY", "QUADRATIC" and "ANTI-SLOP".
------------------------------------------------------------------------------------------------------------------------------------------------------------

Hopefully these samplers / controls will be added to LLAMACPP and become available to all users via AI/LLM apps soon.

For more info on what they do / how they affect generation see:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

(also see the "Additional Links" section above for more info on these parameters/samplers)

Keep in mind these parameters/samplers become available (for GGUFs) in "oobabooga/text-generation-webui" when you use the llamacpp_HF loader.

Most of these are also available in KOBOLDCPP (via settings -> samplers) after start up (no "llamacpp_HF loader" step required).

I am not going to touch on all of the samplers / parameters, just the main ones at the moment.

However, you should also check / test the operation of:

a] Affects per token generation:

- top_a
- epsilon_cutoff
- eta_cutoff
- no_repeat_ngram_size

b] Affects generation including phrase, sentence, paragraph and entire generation:

- no_repeat_ngram_size
- encoder_repetition_penalty
- guidance_scale (with "Negative prompt") => this is like a pre-prompt/system role prompt.
- Disabling the BOS token => this can make the replies more creative.
- Custom stopping strings

Note: "no_repeat_ngram_size" appears in both because it can impact per token OR per phrase depending on settings.

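If you drive text-generation-webui through its API rather than the UI, these extra controls are passed as additional fields in the request body (the UI exposes the same names in the Parameters tab). This is only a sketch of where the knobs go; the values are placeholders, and exact pass-through behaviour depends on your webui version and loader.

```python
# Sketch: extra text-generation-webui parameters as they appear in an API request body.
# Placeholder values only; see the Parameters-Tab wiki page linked above for what each does.
extra_webui_params = {
    "top_a": 0.0,                      # per token
    "epsilon_cutoff": 0.0,             # per token
    "eta_cutoff": 0.0,                 # per token
    "no_repeat_ngram_size": 0,         # per token OR per phrase, depending on size
    "encoder_repetition_penalty": 1.0, # phrase / sentence / paragraph level
    "guidance_scale": 1.0,             # used together with a negative prompt
    "negative_prompt": "",
    "add_bos_token": False,            # disabling the BOS token can make replies more creative
    # custom stopping strings are set separately, depending on which API flavour you use
}
```
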
<B>MAIN ADVANCED SAMPLERS (affects per token AND overall generation):</B>

What I will touch on here are special settings for CLASS 3 and CLASS 4 models (for the first TWO samplers).

For CLASS 3 you can use one, two or both.

<B>DRY:</B>

Dry affects repetition (and the repeat "penalty") at the word, phrase, sentence and even paragraph level. Read about "DRY" in the "Additional Links" section above.

Class 3:

dry_multiplier: .8

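In API form (text-generation-webui, koboldcpp, and newer llama.cpp builds use the same field names), the DRY block looks roughly like this. The .8 multiplier is the Class 3 value above and the 1.15 base is the low end of the 1.15 to 1.5 range this guide gives for dry_base; the allowed-length and sequence-breaker entries are the commonly used defaults, shown here as assumptions.

```python
# DRY (Don't Repeat Yourself) sampler fields, Class 3 style starting point.
dry_settings = {
    "dry_multiplier": 0.8,     # Class 3 value from this guide; 0 disables DRY
    "dry_base": 1.15,          # low end of the 1.15 to 1.5 range given in this guide
    "dry_allowed_length": 2,   # assumed common default: longer repeats get penalized
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],  # assumed common default breaker set
}
```
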
<B>QUADRATIC SAMPLING:</B>

This sampler alters the "score" of ALL TOKENS at the time of generation. See the "Additional Links" section above for more information.

Class 3:

smoothing_curve: 1.5 to 2.

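As with DRY, these land in the request body or the app's sampler settings; text-generation-webui exposes both fields directly. The factor value follows the "3 to 5 (or higher)" Class 3 guidance this guide gives for smoothing_factor, and the curve value the 1.5 to 2 range above; both are starting points, and availability of the fields depends on the app/loader, so treat this as a sketch rather than a definitive reference.

```python
# Quadratic / smoothing sampler fields, Class 3 style starting point.
quadratic_settings = {
    "smoothing_factor": 3.0,   # guide suggests 3 to 5 (or higher) for Class 3; 0 disables
    "smoothing_curve": 1.5,    # guide suggests 1.5 to 2
}
```
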
<B>ANTI-SLOP - Koboldcpp only</B>

Hopefully this powerful sampler will soon appear in all LLM/AI apps.

You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.

This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".

This is a game changer for custom real time control of the model.

IMPORTANT:

Keep in mind that these settings/samplers work in conjunction with "penalties"; this is especially important for operation of CLASS 4 models for chat / role play and/or "smoother operation".

For Class 3 models, "QUADRATIC" will have a slightly stronger effect than "DRY", relatively speaking.

If you use the Mirostat sampler, keep in mind it will interact with these two advanced samplers too.

Finally:
