Warning: I'm an amateur at this kind of subject. I'm not very technical, but I know enough to RP decently. Also, the images are linked as URLs.

Alright, so you may be wondering why I use only smoothing and min_p for testing. To start off, I use smoothing for its dynamic characteristic. Dynamic in the sense that it raises and balances the top token probabilities while incrementally lowering the less likely ones, depending on the smoothing value. The difference between smoothing and temp is that smoothing covers and considers 'all' tokens, while temp either focuses mostly on the top tokens (at low to mid values) or boosts the majority of tokens that aren't top probability (at high values). In short, with temp you go either very deterministic or very wild; with smoothing you can hit every point in between, adjusting token probs to any degree of determinism and/or creativity. You can use smoothing curve to tinker with smoothing further, but as of now there's no option to visualize it. Temp is stiff, smoothing is flexible.

To visualize this more easily, check out the example images below, which I got using Artefact2's LLM token probability visualization tool. Do note that the visualizations below are based on open-ended prompts; if I were to visualize token probs on question prompts, it would be a question of factuality, not creativity.

Temp: 3, Min_p: 0.135: https://cdn-uploads.huggingface.co/production/uploads/6580400298aa9fcdd244c071/O0qoPmaMfM3UXjgObqP2V.jpeg

Smoothing: 0.07, Min_p: 0.075: https://cdn-uploads.huggingface.co/production/uploads/6580400298aa9fcdd244c071/HWEy-cOaQV9jC1B7W_2rh.jpeg

As you can see, smoothing arranges the probs in a curve-like manner from the most considered top token down to the least considered one, whereas with temp almost all the low-prob tokens sit at nearly the same level with little difference between them, in contrast to the vast prob differences among the top tokens.

So yeah, with smoothing's flexibility I used it to increase token diversity by raising the probability of the tokens that sit between the top-prob and low-prob ones. Why? A balanced diversity of tokens means more creativity while staying coherent to an extent. Focusing only on the top tokens, which I've noticed is the trend in many recommended sampling presets, is somewhat limiting if you want to grasp the full capabilities of an RP model. I just feel some tokens are underutilized and get lumped in with the rest of the low-prob tokens, which doesn't fare well for creativity. To achieve this diversity I used a low smoothing value, because smoothing values are quite sensitive: at 0.1 or above, token probs get drastically more deterministic, and the opposite happens below 0.05.

Now, the min_p part is for quality control. With low smoothing values the top tokens lose probability and the low-prob tokens gain it, so I needed something to reinforce coherence and cut off nonsensical tokens. That said, I used min_p as minimally as possible so I could retain many tokens for diversification. The relatively high min_p values are there to keep up with smoothing's high-temp-like side effects, so min_p scales roughly in proportion with the effective temperature. If I want more creativity, I use a smoothing value of 0.06-0.07; for more determinism, 0.08-0.09.
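If you'd rather see numbers than pictures, below is a minimal Python sketch of the smoothing-then-min_p pipeline on a toy logit distribution. The quadratic formula is my reading of kalomaze's smoothing factor sampler, so treat the exact expression (and the toy logits) as assumptions for illustration, not a reference implementation.

```python
import numpy as np

def smooth_then_min_p(logits, smoothing_factor=0.07, min_p=0.075):
    """Sketch of the smoothing + min_p pipeline described above.

    The quadratic transform is assumed to be
    new_logit = max_logit - smoothing_factor * (logit - max_logit)^2,
    which is my understanding of the "smoothing factor" sampler.
    """
    logits = np.asarray(logits, dtype=np.float64)

    # Smoothing: bend the logits along a parabola anchored at the top logit.
    # Mid-range tokens lose less ground than with a plain temperature raise,
    # which is where the extra diversity comes from.
    top = logits.max()
    smoothed = top - smoothing_factor * (logits - top) ** 2

    # Convert to probabilities.
    probs = np.exp(smoothed - smoothed.max())
    probs /= probs.sum()

    # min_p: drop anything below min_p * (top token's probability), then
    # renormalize. This is the "quality control" cutoff.
    probs[probs < min_p * probs.max()] = 0.0
    probs /= probs.sum()
    return probs

# Toy example: one dominant token, a few mid-range ones, and a long tail.
toy_logits = [6.0, 4.5, 4.0, 3.0, 2.0, 0.5, -1.0]
print(np.round(smooth_then_min_p(toy_logits, 0.07, 0.075), 3))
```

With these toy numbers the mid-range tokens keep a meaningful share of the probability mass, and only the very last token falls under the min_p threshold, which is roughly the behavior I'm aiming for.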
Overall, I combined the dynamic creativity of smoothing with min_p as a coherence enhancer, which is good enough to test the full capabilities of any RP LLM.

Sampling parameters I considered but that didn't make the cut for testing:

Top_P, Typical_P, Top_K - They specialize in cutting off tokens so that only the top tokens are considered. If the token pool is a bonsai plant, min_p is a pair of shears while these are axes, unwieldy for a small plant (see the sketch at the end of this section).

Tail Free Sampling - I've heard anecdotally that this is somewhat similar to min_p, and going by Artefact2's LLM sampling tool that might be true, though min_p is preferable as it's simpler to understand and more objectively measurable than Tail Free Sampling.

Top_A - Min_p is more exact in coverage than this.

Repetition penalty parameters - I prefer sampling parameters that are universal in usage; to use these effectively I'd have to be very, very specific about the numbers for each model size (7B, 13B, 20B), and repetition penalties are like bombs, affecting every token in their range, both the good and the bad. Besides, I can just increase creativity to offset repetition by altering either smoothing or min_p. Also argued by kalomaze to be unreliable for most models.

Mirostat - Argued by kalomaze in a Reddit post to be somewhat unreliable. There was an anecdotal claim that it's basically just Top_K = 1000000. Also, I don't read scientific math.

Temp, Dynamic Temp - Good ol' reliable, but stiff compared to smoothing.

No Repeat Ngram Size - Uh... apparently meant for repetitive phrases, but same reasons as the repetition penalties.

Beam Search parameters - Dunno what they are, too technical.

Contrastive Search - Requires sampling in general to be disabled, so no.

Supplementary Links:
https://artefact2.github.io/llm-sampling/index.xhtml
https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/
https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
https://gist.github.com/kalomaze/4d74e81c3d19ce45f73fa92df8c9b979
https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/comment/k9c1u2h/
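To back up the shears-vs-axes analogy from the Top_P/Typical_P/Top_K entry above, here's a tiny sketch with made-up probabilities showing why min_p adapts to the shape of the distribution while top_k always chops down to a fixed count. The numbers are toy values, not taken from any model.

```python
import numpy as np

def min_p_keep(probs, min_p):
    # min_p keeps every token whose probability is at least min_p * the top
    # probability, so the number of survivors scales with model confidence.
    probs = np.asarray(probs)
    return np.flatnonzero(probs >= min_p * probs.max())

def top_k_keep(probs, k):
    # top_k keeps a fixed number of tokens regardless of how the probability
    # mass is actually shaped.
    probs = np.asarray(probs)
    return np.argsort(probs)[::-1][:k]

probs = np.array([0.40, 0.20, 0.15, 0.10, 0.06, 0.05, 0.03, 0.01])
print("min_p=0.075 keeps indices:", min_p_keep(probs, 0.075))  # threshold 0.03 -> 7 tokens survive
print("top_k=3 keeps indices:   ", top_k_keep(probs, 3))       # always exactly 3 tokens
```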