ibm-granite/granite-speech-4.1-2b

Granite Speech 4.1 2B is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST) for English, French, German, Spanish, Portuguese and Jap

Capabilities

Seed, System Prompt, Max Tokens, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`add_generation_prompt`	boolean	Add generation prompt. Passed to the chat template. Defaults to True.	`true`	—
`audio`	array	Completion API audio input.	`—`	—
`chat_template`	string	A template to format the prompt with. If not specified, the chat template provided by the model will be used.	`—`	—
`chat_template_kwargs`	object	Additional arguments to be passed to the chat template.	`—`	—
`frequency_penalty`	number	Frequency penalty	`—`	—
`max_completion_tokens`	integer	An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.	`—`	—
`max_tokens`	integer	max_tokens is deprecated in favor of the max_completion_tokens field.	`—`	—
`messages`	array	Chat completion API messages.	`—`	—
`min_tokens`	integer	The minimum number of tokens the model should generate as output.	`0`	—
`presence_penalty`	number	Presence penalty	`—`	—
`prompt`	string	Completion API user prompt.	`—`	—
`repetition_penalty`	number	Repetition penalty	`—`	—
`seed`	integer	Random seed. Leave unspecified to randomize the seed.	`—`	—
`stop`	array	A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".	`—`	—
`stream`	boolean	Request streaming response.	`—`	—
`system_prompt`	string	Completion API system prompt. The chat template provides a good default.	`—`	—
`temperature`	number	The value used to modulate the next token probabilities.	`—`	—
`top_k`	integer	The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).	`—`	—
`top_p`	number	A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).	`—`	—
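
The parameters above can be combined into a single input object. The following is a minimal sketch, not the official client code: `build_input` is a hypothetical helper, the shape of `audio` (a list of URL strings) is an assumption because the table only says "array", and the prompt wording and URL are placeholders. It uses `max_completion_tokens` rather than the deprecated `max_tokens` and omits every optional field that is left unset, so the model's own defaults apply.

```python
from typing import Any, Optional


def build_input(
    audio: list[str],
    prompt: str,
    *,
    max_completion_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble an input dict, leaving out every optional field that is unset."""
    # Sanity check only: the parameter table lists no constraint for top_p.
    if top_p is not None and not 0.0 < top_p <= 1.0:
        raise ValueError("top_p is a probability threshold; expected 0 < top_p <= 1")

    optional = {
        # max_completion_tokens is used because max_tokens is marked deprecated.
        "max_completion_tokens": max_completion_tokens,
        "seed": seed,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
    }
    payload: dict[str, Any] = {"audio": audio, "prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_input(
    ["https://example.com/sample.wav"],  # placeholder URL, not a real file
    "Transcribe the audio.",             # illustrative prompt wording
    max_completion_tokens=256,
    seed=7,
)
print(payload)
```

If the Replicate Python client is used, a dict like this would typically be passed as the `input` argument of `replicate.run`; that call is left out here because it needs network access and an API token.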

add_generation_prompt boolean

Add generation prompt. Passed to the chat template. Defaults to True.

Default: true

audio array

Completion API audio input.

chat_template string

A template to format the prompt with. If not specified, the chat template provided by the model will be used.

chat_template_kwargs object

Additional arguments to be passed to the chat template.

frequency_penalty number

Frequency penalty

max_completion_tokens integer

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens integer

max_tokens is deprecated in favor of the max_completion_tokens field.

messages array

Chat completion API messages.

min_tokens integer

The minimum number of tokens the model should generate as output.

Default: 0

presence_penalty number

Presence penalty

prompt string

Completion API user prompt.

repetition_penalty number

Repetition penalty

seed integer

Random seed. Leave unspecified to randomize the seed.

stop array

A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".
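
The stopping rule can be illustrated on an already generated string. This is a minimal sketch of the behaviour described above, not the server's implementation: `truncate_at_stop` is a hypothetical helper, and dropping the matched stop sequence from the output is an assumption, since the description does not say whether it is returned.

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    # str.find returns -1 when the sequence is absent; empty strings are skipped.
    positions = [text.find(s) for s in stop if s]
    hits = [p for p in positions if p != -1]
    # Assumption: the stop sequence itself is not part of the output.
    return text[: min(hits)] if hits else text


print(truncate_at_stop("hello<stop> world<end>", ["<end>", "<stop>"]))
```

The order of the entries in `stop` does not matter in this sketch; whichever sequence appears first in the text ends the output.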

stream boolean

Request streaming response.

system_prompt string

Completion API system prompt. The chat template provides a good default.

temperature number

The value used to modulate the next token probabilities.

top_k integer

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
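
Top-k filtering can be sketched on a toy next-token distribution. This is an illustration under stated assumptions, not the model's sampling code: `top_k_filter` is a hypothetical helper, the distribution is a plain dict of token to probability, and renormalizing the surviving probabilities is a common convention that the description above does not spell out.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens; k <= 0 disables the filter."""
    if k <= 0:
        return dict(probs)
    # Sort by probability, highest first, and keep the first k entries.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Assumption: renormalize so the kept probabilities sum to 1 again.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


toy = {"a": 0.5, "b": 0.3, "c": 0.2}
print(top_k_filter(toy, 2))
```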

top_p number

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
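
Nucleus filtering can be sketched the same way on a toy distribution. This is a minimal illustration of the rule described above, not the model's sampling code: `top_p_filter` is a hypothetical helper, the distribution is a plain dict of token to probability, and renormalizing the kept probabilities is an assumption.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    if top_p >= 1.0:
        return dict(probs)  # filter disabled
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least probable until the threshold is reached.
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Assumption: renormalize so the kept probabilities sum to 1 again.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


toy = {"a": 0.5, "b": 0.3, "c": 0.2}
print(top_p_filter(toy, 0.7))
```

With `top_p = 0.7` the sketch keeps `a` and `b`, because `a` alone covers only 0.5 of the probability mass; a lower threshold such as 0.4 would keep `a` only.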

Version: 25d7a190df06 | Updated: 6/26/2026 | 96 runs