Llama 2-Chat with 7 billion (7B) parameters, served in 16-bit (FP16) format with the vLLM serving engine. Please use in accordance with the Llama 2 license terms.
The Llama 2 chat model was fine-tuned for chat using a specific prompt structure that relies on the following special tokens:
- <s> - the beginning of the entire sequence.
- <<SYS>> - the beginning of the system message.
- <</SYS>> - the end of the system message.
- [INST] - the beginning of the instructions.
- [/INST] - the end of the instructions.
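Putting these tokens together, a single-turn prompt and a generation call with vLLM might look like the following minimal sketch. The Hugging Face model id and the sampling settings are illustrative assumptions, not part of this deployment:

```python
from vllm import LLM, SamplingParams

# Build a single-turn prompt in the Llama 2 chat format described above.
# The tokenizer adds the <s> (BOS) token automatically, so it is omitted here.
system_msg = "You are a helpful, respectful and honest assistant."
user_msg = "Explain what a context window is in one sentence."
prompt = f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"

# dtype="float16" matches the 16-bit format this deployment uses.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```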
Llama-2 is a family of pre-trained and fine-tuned Large Language Models (LLMs) developed and released by the research team at Meta AI. Llama-2 builds upon the success of Llama-1 and incorporates several improvements to enhance its performance and safety. These models are designed to excel in complex reasoning tasks across various domains, making them suitable for research and commercial use. Llama-2 is trained on a large corpus of publicly available data and fine-tuned to align with human preferences, ensuring usability and safety. The models are optimized for dialogue use cases and are available in a range of parameter sizes, including 7B, 13B, and 70B.
Llama 2-Chat is a family of fine-tuned Llama-2 models that are optimized for dialogue use cases. These models are specifically designed to generate human-like responses to natural language input, making them suitable for chatbot and conversational AI applications.
Llama2-7B-chat is a 7-billion-parameter model pretrained on a large corpus of text that includes conversational data, such as chat logs and social media posts. This allows the model to learn the patterns and structures of natural language dialogue and to generate coherent and contextually appropriate responses to user input.
In addition to the standard Llama-2 fine-tuning, the Llama 2-Chat models are tuned against safety and helpfulness criteria to ensure that they generate appropriate and useful responses. This includes measures to prevent the models from generating offensive or harmful content and to ensure that they provide accurate and relevant information to users.
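For multi-turn conversations, earlier exchanges are serialized into the same prompt format: each completed turn is closed with </s> and the next one opened with <s>, while the current user turn is left open for the model to complete. A small sketch of this assembly, using a hypothetical build_chat_prompt helper:

```python
def build_chat_prompt(system_msg, history, user_msg):
    # history: list of (user, assistant) pairs from earlier turns.
    users = [u for u, _ in history] + [user_msg]
    # The system message is folded into the first user turn.
    users[0] = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{users[0]}"
    answers = [a for _, a in history]
    prompt = ""
    for user, assistant in zip(users, answers):
        # Completed turns are closed with </s>, and the next is opened with <s>.
        prompt += f"[INST] {user} [/INST] {assistant} </s><s>"
    # The final turn stays open for the model; the leading <s> (BOS) of the
    # whole sequence is added by the tokenizer, so it is omitted here.
    prompt += f"[INST] {users[-1]} [/INST]"
    return prompt

history = [("Hi, who are you?", "I am Llama 2, a helpful assistant.")]
prompt = build_chat_prompt("You are a concise assistant.",
                           history,
                           "What did I just ask you?")
```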
The context window length in the Llama-2 model is 4096 tokens. This is an expansion from the 2048-token context window used in the previous version of the model, Llama-1. The longer context window enables the model to process more information, which is particularly useful for supporting longer histories in chat applications, various summarization tasks, and understanding longer documents.
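As a rough sketch of how this limit shows up in practice with vLLM (the model id and the 256-token generation budget are assumptions):

```python
from vllm import LLM

# max_model_len caps prompt plus generated tokens at Llama 2's 4096-token window.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", max_model_len=4096)
tokenizer = llm.get_tokenizer()

def fits_in_window(prompt: str, max_new_tokens: int = 256) -> bool:
    # Leave room for the tokens the model will generate.
    n_prompt_tokens = len(tokenizer.encode(prompt))
    return n_prompt_tokens + max_new_tokens <= 4096
```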
The dataset used to train Llama-2 is a large-scale, diverse corpus of text that was collected from various sources, including web pages, books, and articles. The corpus contains over 2 trillion tokens, making it one of the largest datasets used to train a language model to date.
The corpus was also filtered to ensure that it was diverse and representative of different domains and genres of text. This was done to prevent the model from overfitting to a specific domain or genre of text and to ensure that it could generalize well to new and unseen text.
To further improve the quality of the dataset, the text was also cleaned and normalized to remove spelling errors, punctuation errors, and other inconsistencies. This was done to ensure that the model could learn from high-quality and consistent text and to prevent it from learning from noisy or incorrect text.
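Meta has not published the exact preprocessing pipeline, but a generic cleaning pass of the kind described above might look like this illustrative sketch:

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    # Illustrative only: Meta's actual pipeline is not public.
    text = unicodedata.normalize("NFKC", raw)               # normalize unicode forms
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)   # strip control characters
    text = re.sub(r"\s+", " ", text).strip()                # collapse whitespace
    return text
```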
Overall, the dataset used to train Llama-2 is a large, diverse corpus that was carefully curated and preprocessed so the model could learn from high-quality, representative text. Its size and breadth also allowed the model to capture the complex patterns and structures of natural language.
The evaluation of Llama-2 was conducted on three main aspects: pretraining, fine-tuning, and safety.
- Pretraining evaluation: the model was evaluated on a held-out set of data, and its performance was compared to models with different context window lengths. The results showed that Llama-2's longer context window (4096 tokens) outperformed Llama-1's shorter window (2048 tokens) on long-context benchmarks (a minimal perplexity sketch follows this list). The pretraining evaluation also included the MMLU benchmark, which measures knowledge and reasoning across a broad range of academic and professional subjects. The Llama-2 models outperformed many other open-source models on this benchmark.
- Fine-tuning evaluation: the model was fine-tuned on several datasets, including conversational and question-answering datasets. The results showed that the fine-tuned Llama-2 model outperformed other state-of-the-art open models on several benchmarks, including the Persona-Chat and CoQA datasets. The fine-tuning evaluation also covered standard NLP benchmarks, such as question answering (e.g., SQuAD) and code generation, on which the Llama-2 models achieved strong results among open models.
- Safety evaluation: the model was evaluated for toxicity, truthfulness, and bias. The results showed relatively low truthfulness scores for the pretrained models, but these scores increased after instruction fine-tuning. The model also showed low toxicity and bias scores, indicating that it is relatively safe to use in production. Overall, the evaluation results suggest that Llama-2 is a high-performing language model that is safe to use in production, with some limitations and potential risks that should be taken into account.
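As referenced in the pretraining bullet above, evaluating a causal language model on held-out data typically comes down to measuring perplexity. A minimal sketch with the Hugging Face transformers API (the checkpoint id is an assumption; the paper's own evaluation harness is not public):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed HF checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return the average cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())
```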
The Llama 2-Chat models were also assessed in a human evaluation that measures the helpfulness of the model's responses relative to other open-source and closed-source models. On this benchmark, the Llama 2-Chat models outperform many other models.
- Limited proficiency in non-English languages: Llama models, including Llama-2, were primarily trained on English-language data. While some proficiency has been observed in other languages, the model's performance in languages other than English remains fragile and should be used with caution.
- Risk of generating harmful or biased content: Llama models, like other large language models, were trained on publicly available online datasets, which may contain harmful, offensive, or biased content. While efforts have been made to mitigate these issues through fine-tuning, some issues may remain, particularly for languages other than English, for which suitable publicly available datasets were scarce.
- Model Type ID: Text To Text
- Description: Llama 2-Chat is a fine-tuned large language model (LLM) optimized for dialogue use cases.
- Last Updated: Jan 04, 2024
- Use Case: