ibm-granite/granite-3.1-8b-instruct


Granite-3.1-8B-Instruct is a lightweight, open-source 8B-parameter model designed to excel at instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more.

Capabilities

System Prompt, Max Tokens, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`frequency_penalty`	number	Frequency penalty	`0`	—
`max_tokens`	integer	The maximum number of tokens the model should generate as output.	`512`	—
`min_tokens`	integer	The minimum number of tokens the model should generate as output.	`0`	—
`presence_penalty`	number	Presence penalty	`0`	—
`prompt`	string	Prompt	`""`	—
`stop_sequences`	string	A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.	`—`	—
`system_prompt`	string	System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.	`"You are a helpful assistant."`	—
`temperature`	number	The value used to modulate the next token probabilities.	`0.6`	—
`top_k`	integer	The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).	`50`	—
`top_p`	number	A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).	`0.9`	—
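
The `temperature`, `top_k`, and `top_p` descriptions above can be illustrated with a small sketch over a toy distribution. This is not the model's serving code: the function name `filter_probs` is hypothetical, and the order of operations (temperature, then top-k, then nucleus filtering with renormalization in between) is an assumption that mirrors common open-source samplers. It also assumes `temperature` is greater than 0.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=50, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration; defaults match the table above.
    """
    # Temperature scaling, then unnormalized softmax weights (stabilized by the max).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Rank (token_id, weight) pairs from most to least probable.
    ranked = sorted(enumerate(weights), key=lambda pair: pair[1], reverse=True)

    # Top-k filtering: if top_k > 0, keep only the k most probable tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Renormalize so the surviving tokens form a probability distribution.
    total = sum(w for _, w in ranked)
    ranked = [(i, w / total) for i, w in ranked]

    # Nucleus filtering: if top_p < 1.0, keep the smallest prefix of the
    # ranked tokens whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept

    # Renormalize again; sampling would draw from this final distribution.
    total = sum(p for _, p in ranked)
    return {i: p / total for i, p in ranked}


# Toy next-token distribution over four token ids.
toy_logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(filter_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.8))  # keeps token ids 0 and 1
```

In this toy case, top-k drops the least likely token, and nucleus filtering then keeps the first two tokens because together they already cover at least 0.8 of the remaining probability mass. Setting `top_k=0` and `top_p=1.0` disables both filters.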

Version: b66517f8a5c7
Updated: 7/25/2026
778.1K runs