# meta/meta-llama-3.1-405b-instruct
Meta's flagship 405-billion-parameter language model, fine-tuned for chat completions.
## Capabilities

## Cost

Community model (estimated from hardware time)

## Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| frequency_penalty | number | Penalizes tokens in proportion to how often they have already appeared in the generated text, discouraging repetition. | 0 | — |
| max_tokens | integer | The maximum number of tokens the model should generate as output. | 512 | — |
| min_tokens | integer | The minimum number of tokens the model should generate as output. | 0 | — |
| presence_penalty | number | Penalizes tokens that have already appeared in the generated text at least once, encouraging new topics. | 0 | — |
| prompt | string | The text prompt to send to the model. | "" | — |
| prompt_template | string | A template to format the prompt with. If not provided, the default prompt template is used. | "" | — |
| stop_sequences | string | A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'. | "" | — |
| system_prompt | string | System prompt to send to the model. This is prepended to the prompt and helps guide model behavior. Ignored for non-chat models. | "You are a helpful assistant." | — |
| temperature | number | The value used to modulate the next-token probabilities. | 0.6 | — |
| top_k | integer | The number of highest-probability tokens to consider when generating the output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering). | 50 | — |
| top_p | number | A probability threshold for generating the output. If < 1.0, only the top tokens with cumulative probability >= top_p are kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). | 0.9 | — |
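The parameters above can be assembled into an input payload for a model call. Below is a minimal sketch, assuming the Replicate Python client (`replicate.run`) as the invocation path; `build_input` is a hypothetical helper, not part of any client library, and the dictionary keys simply mirror the documented defaults:

```python
# Sketch: building an input payload for meta/meta-llama-3.1-405b-instruct.
# DEFAULTS mirrors the defaults documented in the table above; build_input
# is an illustrative helper, not part of the Replicate client.

DEFAULTS = {
    "prompt": "",
    "system_prompt": "You are a helpful assistant.",
    "max_tokens": 512,
    "min_tokens": 0,
    "temperature": 0.6,
    "top_k": 50,
    "top_p": 0.9,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stop_sequences": "",
    "prompt_template": "",
}

def build_input(prompt: str, **overrides) -> dict:
    """Merge user overrides onto the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides, "prompt": prompt}

payload = build_input("Write a haiku about rivers.",
                      temperature=0.2, max_tokens=64)

# The actual call (commented out: requires a REPLICATE_API_TOKEN
# and network access):
# import replicate
# output = replicate.run("meta/meta-llama-3.1-405b-instruct",
#                        input=payload)
# print("".join(output))
```

Rejecting unknown keys up front catches typos like `temprature` locally, before a request is ever sent.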
Version: 4ff591d23f09 · Updated: 2/26/2026 · 7.1M runs