Top-K is a sampling setting used in Large Language Models (LLMs) that restricts the predicted next token to the tokens with the highest predicted probabilities[1]. Like temperature, Top-K controls the randomness and diversity of generated text[1].
Top-K sampling selects the top K most likely tokens from the model’s predicted distribution[1]. The higher the Top-K, the more creative and varied the model’s output; the lower the Top-K, the more restrictive and factual the model’s output[1]. A Top-K of 1 is equivalent to greedy decoding[1].
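As a rough illustration (not any particular library's implementation), top-K filtering can be sketched as keeping the K highest-probability tokens and renormalizing before sampling. The vocabulary and probabilities below are invented for the example:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k highest-probability tokens and renormalize."""
    filtered = np.zeros_like(probs)
    top_indices = np.argsort(probs)[-k:]      # indices of the k most likely tokens
    filtered[top_indices] = probs[top_indices]
    return filtered / filtered.sum()

# Hypothetical next-token distribution over a tiny five-word vocabulary.
vocab = ["the", "a", "cat", "dog", "ran"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])

restricted = top_k_filter(probs, k=3)          # only "the", "a", "cat" remain candidates
next_token = np.random.choice(vocab, p=restricted)
```

With k=1, all of the probability mass lands on the single most likely token, which is exactly the greedy-decoding behavior described above.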
If temperature, top-K, and top-P are all available, tokens that meet both the top-K and top-P criteria become candidates for the next predicted token, and temperature is then applied to sample from those candidates[1]. If only top-K or only top-P is available, the behavior is the same, but only the available setting is used[1].
If temperature is not available, the single next predicted token is selected at random from whatever tokens meet the top-K and/or top-P criteria[1].
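A minimal sketch of that combined ordering, assuming the filter-then-apply-temperature behavior described above (the function and parameter names are placeholders, and real decoders differ in implementation detail):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, k: int, p: float, temperature: float) -> int:
    """Filter candidates with top-K and top-P, then apply temperature and sample."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-K criterion: the k most likely tokens.
    top_k_mask = np.zeros(len(probs), dtype=bool)
    top_k_mask[np.argsort(probs)[-k:]] = True

    # Top-P criterion: the smallest set of tokens whose cumulative probability reaches p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    top_p_mask = np.zeros(len(probs), dtype=bool)
    top_p_mask[order[: np.searchsorted(cumulative, p) + 1]] = True

    # Candidates must meet both criteria; temperature (> 0) then reshapes what is left.
    candidates = top_k_mask & top_p_mask
    scaled = np.where(candidates, np.exp((logits - logits.max()) / temperature), 0.0)
    return int(np.random.choice(len(logits), p=scaled / scaled.sum()))
```

Passing temperature=1.0 leaves the filtered distribution unchanged, which is one way to read the no-temperature case above.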
If you set temperature to 0, top-K and top-P become irrelevant: the most probable token is always the next predicted token[1]. If you set temperature extremely high (above 1, generally into the 10s), temperature becomes irrelevant, and whatever tokens make it through the top-K and/or top-P criteria are then sampled essentially at random to choose the next predicted token[1].
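A small numeric illustration of those two extremes, using invented logits: as temperature approaches 0 the distribution collapses onto the single most probable token, while a very high temperature flattens it toward uniform, making the final draw essentially random among the surviving candidates.

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = (logits - logits.max()) / temperature
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])       # hypothetical logits for four candidate tokens

print(softmax_with_temperature(logits, 0.01))   # ~[1, 0, 0, 0]: behaves like greedy decoding
print(softmax_with_temperature(logits, 1.0))    # the model's unmodified distribution
print(softmax_with_temperature(logits, 50.0))   # ~[0.25, 0.25, 0.25, 0.25]: essentially uniform
```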
If you set top-K to 1, temperature and top-P become irrelevant[1]. Only one token passes the top-K criterion, and that token is the next predicted token[1]. If you set top-K extremely high, say to the size of the LLM's vocabulary, any token with a non-zero probability of being the next token meets the top-K criterion, and none are filtered out[1].
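To make those extremes concrete, here is a tiny sketch with invented probabilities: K = 1 leaves only the argmax token, while K equal to the vocabulary size excludes nothing.

```python
import numpy as np

probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])   # hypothetical next-token probabilities

keep_one = np.argsort(probs)[-1:]                  # K = 1: only the single most likely token survives
keep_all = np.argsort(probs)[-len(probs):]         # K = vocab size: every non-zero token survives

print(keep_one)   # [0] -> the argmax; temperature and top-P no longer matter
print(keep_all)   # all five indices -> top-K imposes no restriction
```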