ibm-granite/granite-speech-4.1-2b

Granite Speech 4.1 2B is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST) for English, French, German, Spanish, Portuguese and Jap

Capabilities

Seed, System Prompt, Max Tokens, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

add_generation_prompt boolean

Whether to add the generation prompt. Passed to the chat template.

Default: true

audio array

Completion API audio input.

chat_template string

A template to format the prompt with. If not specified, the chat template provided by the model will be used.

chat_template_kwargs object

Additional arguments to be passed to the chat template.

frequency_penalty number

Frequency penalty. Penalty applied to tokens based on how often they have already appeared in the generated text.

max_completion_tokens integer

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens integer

max_tokens is deprecated in favor of the max_completion_tokens field.
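Callers that still send max_tokens can be migrated with a small shim. The following is a minimal sketch, not part of this model's API: the helper name normalize_token_limit is hypothetical, and it assumes that an explicit max_completion_tokens should win when both fields are present.

```python
def normalize_token_limit(params: dict) -> dict:
    """Return a copy of params that uses max_completion_tokens, not max_tokens.

    Hypothetical helper: the precedence rule below is an assumption.
    """
    out = dict(params)
    # Remove the deprecated field so it is never sent alongside the new one.
    legacy = out.pop("max_tokens", None)
    # Prefer an explicit max_completion_tokens; otherwise reuse the legacy value.
    if out.get("max_completion_tokens") is None and legacy is not None:
        out["max_completion_tokens"] = legacy
    return out
```

For example, normalize_token_limit({"max_tokens": 64}) yields a dict containing only max_completion_tokens set to 64.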

messages array

Chat completion API messages.

min_tokens integer

The minimum number of tokens the model should generate as output.

Default: 0

presence_penalty number

Presence penalty. Penalty applied to tokens that have already appeared at least once in the generated text.

prompt string

Completion API user prompt.
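As a rough illustration of how the completion-style fields fit together, the sketch below assembles an input dictionary from the parameter names listed on this page. Several things here are assumptions rather than documented behavior: the helper name build_completion_input, the idea that audio items are URL strings, and the check that min_tokens must not exceed max_completion_tokens.

```python
import json

def build_completion_input(prompt, audio, system_prompt=None,
                           max_completion_tokens=None, min_tokens=0):
    """Assemble a completion-style input dict (hypothetical helper)."""
    if min_tokens < 0:
        raise ValueError("min_tokens must be non-negative")
    # Assumed sanity check: a minimum above the maximum can never be met.
    if max_completion_tokens is not None and min_tokens > max_completion_tokens:
        raise ValueError("min_tokens exceeds max_completion_tokens")

    payload = {
        "prompt": prompt,
        "audio": list(audio),  # assumed: a list of audio URLs as strings
        "min_tokens": min_tokens,
    }
    # Omit optional fields so the documented defaults apply.
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if max_completion_tokens is not None:
        payload["max_completion_tokens"] = max_completion_tokens
    return payload

example = build_completion_input(
    "Transcribe the audio.",
    ["https://example.com/clip.wav"],  # placeholder URL
    max_completion_tokens=128,
)
print(json.dumps(example, indent=2))
```

Leaving system_prompt out keeps the chat template's default in effect.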

repetition_penalty number

Repetition penalty. Penalty applied to tokens that have already appeared, used to discourage repeated text.

seed integer

Random seed. Leave unspecified to randomize the seed.
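The effect of a fixed seed can be illustrated with Python's standard random module. This is only an analogy for the behavior described above, not the model's actual sampler; sample_token is a hypothetical name.

```python
import random

def sample_token(probs, seed=None):
    """Draw one token index from probs; a fixed seed makes the draw repeatable."""
    # seed=None lets Python pick its own entropy source, so results vary per call.
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.6, 0.3]
# The same seed gives the same draw on every call.
assert sample_token(probs, seed=7) == sample_token(probs, seed=7)
```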

stop array

A list of sequences to stop generation at. For example, ["<end>","<stop>"] will stop generation at the first instance of "<end>" or "<stop>".
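The stopping rule can be mimicked on an already generated string. The sketch below is an illustration only: truncate_at_stop is a hypothetical helper, and it assumes the stop sequence itself is excluded from the returned text, which this page does not state.

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop:
        if not seq:
            continue  # skip empty strings, which would match at index 0
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all sequences
    return text[:cut]

print(truncate_at_stop("hello<end>world<stop>", ["<end>", "<stop>"]))  # prints "hello"
```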

stream boolean

Request a streaming response.

system_prompt string

Completion API system prompt. The chat template provides a good default.

temperature number

The value used to modulate the next token probabilities.
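A common way temperature modulates probabilities is by dividing the logits by the temperature before the softmax. The sketch below shows that standard formulation in pure Python; whether this deployment implements it exactly this way is an assumption, and softmax_with_temperature is a hypothetical name.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)  # sharper distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter distribution
assert max(cold) > max(hot)
```

Lower values concentrate probability on the most likely token, while higher values spread it across more candidates.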

top_k integer

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
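Top-k filtering as described above can be sketched on a plain list of probabilities. This is an illustrative version, not the serving code: top_k_filter is a hypothetical name, and renormalizing the surviving probabilities is an assumption about what happens after filtering.

```python
def top_k_filter(probs: list[float], k: int) -> list[float]:
    """Keep the k most probable tokens, zero the rest, and renormalize."""
    if k <= 0 or k >= len(probs):
        return list(probs)  # k <= 0 disables filtering, matching "If > 0"
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]

filtered = top_k_filter([0.5, 0.3, 0.2], k=2)
assert filtered[2] == 0.0  # the least likely token is dropped
```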

top_p number

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
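Nucleus filtering keeps the smallest set of top tokens whose cumulative probability reaches top_p. The sketch below illustrates that rule in pure Python; top_p_filter is a hypothetical name, and the renormalization step is an assumption rather than something stated on this page.

```python
def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    if top_p >= 1.0:
        return list(probs)  # filtering only applies when top_p < 1.0
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete
    kept = set(keep)
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]

filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7)
assert filtered[2] == 0.0 and filtered[3] == 0.0  # only the top two tokens remain
```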

Version: 25d7a190df06 | Updated: 6/26/2026 | 96 runs