DavidAU committed on
Commit c79fc9f · verified · 1 Parent(s): 7b452b9

Update README.md

Files changed (1)
  1. README.md +36 -10
README.md CHANGED
@@ -39,7 +39,7 @@ These settings can also fix a number of model issues (any model) such as:
 
 Likewise ALL the settings below can also improve model generation and/or the general overall "smoothness" / "quality" of model operation.
 
- Even if you are not using my models, you may find this document useful for any model available online.
+ Even if you are not using my models, you may find this document useful for any model (any quant / full source) available online.
 
 If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
 
@@ -47,6 +47,18 @@ This document will be updated over time too.
 
 Please use the "community tab" for suggestions / edits / improvements.
 
+ IMPORTANT:
+
+ Every parameter, sampler and advanced sampler here affects per-token generation and overall generation quality.
+
+ This effect is cumulative, especially with long output generation and/or multi-turn use (chat, role play, COT).
+
+ Likewise, because of how modern AIs/LLMs operate, the quality of the tokens already generated affects the next tokens generated too.
+
+ You will get higher quality operation overall - stronger prose, better answers, and a higher quality adventure.
+
+
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 PARAMETERS AND SAMPLERS
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -162,6 +174,8 @@ Keep in mind the biggest parameter / random "unknown" is your prompt.
 
 A word change, a rephrasing, or a change in punctuation - even a comma or semi-colon - can drastically alter the output, even at min temp settings. CAPS affect generation too.
 
+ Likewise, the size and complexity of your prompt impact generation too; especially its clarity and direction.
+
 <B>temp / temperature</B>
 
 temperature (default: 0.8)
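As a quick illustration of what this dial does, here is a minimal sketch in plain Python + numpy (illustrative names only, not llama.cpp's actual code) of temperature rescaling the token distribution before a token is drawn:

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temp: float) -> np.ndarray:
    """Turn raw model logits into sampling probabilities at a given temp.

    temp < 1 sharpens the distribution (safer, more predictable choices);
    temp > 1 flattens it (more adventurous). temp == 0 is treated as greedy:
    all probability on the single most likely token.
    """
    if temp == 0.0:
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temp
    scaled -= scaled.max()             # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.5])     # three candidate tokens
print(apply_temperature(logits, 0.8))  # sharper than the default softmax
print(apply_temperature(logits, 1.5))  # flatter: rarer tokens gain ground
```

The temp == 0 branch is also why the "temp=0" test described further below is repeatable: greedy decoding always picks the same token for the same context.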
@@ -180,6 +194,8 @@ top-p sampling (default: 0.9, 1.0 = disabled)
 
 If not set to 1, select tokens with probabilities adding up to less than this number. Higher value = higher range of possible random results.
 
+ Dropping this can simplify word choices, but it works in conjunction with "top-k".
+
 I use a default of: .95 ;
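A minimal sketch of the top-p ("nucleus") step as described above, again in plain Python + numpy with illustrative names:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float = 0.95) -> np.ndarray:
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p; zero out the rest and renormalize."""
    if top_p >= 1.0:                          # 1.0 = disabled
        return probs
    order = np.argsort(probs)[::-1]           # most likely first
    cumulative = np.cumsum(probs[order])
    keep = order[:np.searchsorted(cumulative, top_p) + 1]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(top_p_filter(probs, 0.90))   # the two least likely tokens are dropped
```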
 
 <B>min-p</B>
@@ -190,6 +206,8 @@ Tokens with probability smaller than (min_p) * (probability of the most likely token) are filtered out.
 
 I use default: .05 ;
 
+ Careful adjustment of this parameter can result in more "wordy" or "less wordy" generation, but it works in conjunction with "top-k".
+
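For contrast with top-p, a sketch of the min-p rule quoted above (tokens below min_p times the front-runner's probability are removed); illustrative names again:

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Drop tokens whose probability is less than min_p times the
    probability of the single most likely token, then renormalize."""
    threshold = min_p * probs.max()
    out = np.where(probs >= threshold, probs, 0.0)
    return out / out.sum()

probs = np.array([0.60, 0.30, 0.07, 0.02, 0.01])
print(min_p_filter(probs, 0.05))   # threshold 0.03: the last two tokens go
```

Unlike a fixed top-k cut, the number of surviving tokens adapts to how confident the model is at that step.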
 <B>top-k</B>
 
 top-k sampling (default: 40, 0 = disabled)
@@ -198,32 +216,40 @@ Similar to top_p, but select instead only the top_k most likely tokens. Higher value = higher range of possible random results.
 
 Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
 
- NOTES:
+ As this parameter operates in conjunction with "top-p" and "min-p", all three should be carefully adjusted, one at a time.
+
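To see why they should be adjusted one at a time, here is a toy end-to-end sketch chaining the three filters plus temperature in one common llama.cpp-style order (top-k, then top-p, then min-p, then temp); all names are illustrative and this is not the library's actual code:

```python
import numpy as np

rng = np.random.default_rng()

def sample_token(logits, top_k=40, top_p=0.95, min_p=0.05, temp=0.8):
    """Toy sampler: each filter narrows the pool the next one sees."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if 0 < top_k < len(probs):                    # top-k: keep the k best
        probs = np.where(probs >= np.sort(probs)[-top_k], probs, 0.0)
        probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # top-p: nucleus cut
    cumulative = np.cumsum(probs[order])
    probs[order[np.searchsorted(cumulative, top_p) + 1:]] = 0.0
    probs = np.where(probs >= min_p * probs.max(), probs, 0.0)  # min-p
    probs = probs ** (1.0 / temp)                 # temp re-sharpens survivors
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = rng.normal(size=200)     # stand-in for one step of model output
print(sample_token(logits))       # one token id survives the whole chain
```

Because the pool shrinks at each step, raising top-k may do nothing if top-p or min-p is already cutting harder; changing one knob at a time is the only way to see its real effect.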
+ <B>NOTE - "CORE" Testing with "TEMP":</B>
 
 For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time.
 
- Then adjust a word, phrase, sentence etc - to see the differences.
+ Then adjust a word, phrase, sentence, etc. in your prompt, and generate again to see the differences.
+
+ (you should use a "fresh" chat for each generation)
 
 Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
 
- Then test "at temp" to see the model in action. (5-10 generations recommended)
+ Then test your prompt(s) "at temp" to see the model in action. (5-10 generations recommended)
 
- You can also use "temp=0" to test different quants of the same model to see generation differences. (roughly "BIAS").
+ You can also use "temp=0" to test different quants of the same model to see generation differences (roughly minor "BIAS" changes, which reflect math changes due to compression / mixture differences between quants).
 
- Another option is testing different models (of the same quant) to see how each handles your prompt(s).
+ Another option is testing different models (at temp=0 AND of the same quant) to see how each handles your prompt(s).
 
- Then test "at temp" to see the MODELS in action. (5-10 generations recommended)
+ Then test "at temp" with your prompt(s) to see the MODELS in action. (5-10 generations recommended)
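If you script your tests, a minimal harness for the above might look like this; it assumes the llama-cpp-python bindings and a placeholder model path, so adjust for your own toolchain (in a GUI, the equivalent is simply setting temp to 0 and starting a fresh chat per generation):

```python
from llama_cpp import Llama   # pip install llama-cpp-python (assumed binding)

llm = Llama(model_path="your-model-here.Q4_K_M.gguf", verbose=False)

# Two prompts differing by a single word, per the test described above.
prompts = [
    "Write the opening paragraph of a story about a storm at sea.",
    "Write the opening paragraph of a story about a violent storm at sea.",
]

for prompt in prompts:
    # temperature=0 -> greedy decoding: the same prompt always yields
    # the same output, so any difference comes from the prompt edit.
    result = llm(prompt, max_tokens=128, temperature=0.0)
    print(prompt)
    print(result["choices"][0]["text"])
    print("---")
```

Each completion call here is independent of the previous one, which gives you the "fresh chat" per generation.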
 
 
 ------------------------------------------------------------------------------
 PENALTY SAMPLERS:
 ------------------------------------------------------------------------------
 
- These samplers "trim" or "prune" output in real time. The longer the generation, the stronger overall effect.
+ These samplers "trim" or "prune" output in real time.
+
+ The longer the generation, the stronger the overall effect; though this also depends on the "repeat-last-n" setting.
+
+ For creative use cases, these samplers can alter prose generation in interesting ways.
 
 CLASS 4: For these models it is important to activate / set all samplers as noted for maximum quality and control.
 
- PRIMARY:
+ <B>PRIMARY:</B>
 
 <B>repeat-last-n</B>
 
@@ -240,7 +266,7 @@ This setting also works in conjunction with all other "rep pens" below.
 
 This parameter is the "RANGE" of tokens looked at for the samplers directly below.
 
- SECONDARIES:
+ <B>SECONDARIES:</B>
 
 <B>repeat-penalty</B>
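As a companion sketch for the two parameters above, here is a toy version of a llama.cpp-style repetition penalty, showing how "repeat-last-n" sets the window and "repeat-penalty" the strength (illustrative only, not the library's actual code):

```python
import numpy as np

def apply_repeat_penalty(logits: np.ndarray, history: list[int],
                         repeat_last_n: int = 64,
                         repeat_penalty: float = 1.1) -> np.ndarray:
    """Push down the logit of every token id seen in the last
    repeat_last_n generated tokens, making repeats less likely."""
    out = logits.copy()
    for tok in set(history[-repeat_last_n:]):
        # Divide positive logits, multiply negative ones: either way the
        # penalized token moves toward "less likely".
        out[tok] = out[tok] / repeat_penalty if out[tok] > 0 else out[tok] * repeat_penalty
    return out

logits = np.array([3.0, 1.5, -0.5, 0.2])
history = [0, 2, 2, 3]      # token ids generated so far
print(apply_repeat_penalty(logits, history, repeat_last_n=2))
# Only ids 2 and 3 fall inside the 2-token window, so only they are penalized.
```

The window slides with the output, so the longer the generation, the more of the recent text is held against each new token; this is the cumulative "trim / prune" effect described above.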
 
 