ibm-granite/granite-4.0-h-small

OfficialView on Replicate →

Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Capabilities

SeedSystem PromptMax TokensTop-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`add_generation_prompt`	boolean	Add generation prompt. Passed to the chat template. Defaults to True.	`true`	—
`chat_template`	string	A template to format the prompt with. If not specified, the chat template provided by the model will be used.	`—`	—
`chat_template_kwargs`	object	Additional arguments to be passed to the chat template.	`[object Object]`	—
`documents`	array	Documents for request. Passed to the chat template.		—
`frequency_penalty`	number	Frequency penalty	`—`	—
`max_completion_tokens`	integer	An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.	`—`	—
`max_tokens`	integer	max_tokens is deprecated in favor of the max_completion_tokens field.	`—`	—
`messages`	array	Chat completion API messages.		—
`min_tokens`	integer	The minimum number of tokens the model should generate as output.	`0`	—
`presence_penalty`	number	Presence penalty	`—`	—
`prompt`	string	Completion API user prompt.	`—`	—
`repetition_penalty`	number	Repetition penalty	`—`	—
`response_format`	object	An object specifying the format that the model must output.	`—`	—
`seed`	integer	Random seed. Leave unspecified to randomize the seed.	`—`	—
`stop`	array	A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".		—
`stream`	boolean	Request streaming response. Defaults to False.	`false`	—
`system_prompt`	string	Completion API system prompt. The chat template provides a good default.	`—`	—
`temperature`	number	The value used to modulate the next token probabilities.	`0`	—
`tool_choice`	string	Tool choice for request. If the choice is a specific function, this should be specified as a JSON string.	`—`	—
`tools`	array	Tools for request. Passed to the chat template.		—
`top_k`	integer	The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).	`50`	—
`top_p`	number	A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).	`0.9`	—

add_generation_promptboolean

Add generation prompt. Passed to the chat template. Defaults to True.

Default: true

chat_templatestring

A template to format the prompt with. If not specified, the chat template provided by the model will be used.

chat_template_kwargsobject

Additional arguments to be passed to the chat template.

Default: [object Object]

documentsarray

Documents for request. Passed to the chat template.

Default:

frequency_penaltynumber

Frequency penalty

max_completion_tokensinteger

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokensinteger

max_tokens is deprecated in favor of the max_completion_tokens field.

messagesarray

Chat completion API messages.

Default:

min_tokensinteger

The minimum number of tokens the model should generate as output.

Default: 0

presence_penaltynumber

Presence penalty

promptstring

Completion API user prompt.

repetition_penaltynumber

Repetition penalty

response_formatobject

An object specifying the format that the model must output.

seedinteger

Random seed. Leave unspecified to randomize the seed.

stoparray

A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".

Default:

streamboolean

Request streaming response. Defaults to False.

Default: false

system_promptstring

Completion API system prompt. The chat template provides a good default.

temperaturenumber

The value used to modulate the next token probabilities.

Default: 0

tool_choicestring

Tool choice for request. If the choice is a specific function, this should be specified as a JSON string.

toolsarray

Tools for request. Passed to the chat template.

Default:

top_kinteger

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).

Default: 50

top_pnumber

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default: 0.9

Version: aaa80dbee13aUpdated: 7/25/2026228.6K runs