← Back to all generators

ibm-granite/granite-4.0-h-small

Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Capabilities

Seed System Prompt Max Tokens Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

add_generation_prompt boolean

Add generation prompt. Passed to the chat template. Defaults to True.

Default: true
chat_template string

A template to format the prompt with. If not specified, the chat template provided by the model will be used.

chat_template_kwargs object

Additional arguments to be passed to the chat template.

Default: [object Object]
documents array

Documents for request. Passed to the chat template.

Default:
frequency_penalty number

Frequency penalty

max_completion_tokens integer

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens integer

max_tokens is deprecated in favor of the max_completion_tokens field.

messages array

Chat completion API messages.

Default:
min_tokens integer

The minimum number of tokens the model should generate as output.

Default: 0
presence_penalty number

Presence penalty

prompt string

Completion API user prompt.

repetition_penalty number

Repetition penalty

response_format object

An object specifying the format that the model must output.

seed integer

Random seed. Leave unspecified to randomize the seed.

stop array

A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".

Default:
stream boolean

Request streaming response. Defaults to False.

Default: false
system_prompt string

Completion API system prompt. The chat template provides a good default.

temperature number

The value used to modulate the next token probabilities.

Default: 0
tool_choice string

Tool choice for request. If the choice is a specific function, this should be specified as a JSON string.

tools array

Tools for request. Passed to the chat template.

Default:
top_k integer

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).

Default: 50
top_p number

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default: 0.9
Version: aaa80dbee13a Updated: 6/26/2026 228.6K runs