lucataco/csm-1b

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

Community model (estimated from hardware time)

Name	Type	Description	Default	Constraints
`max_audio_length_ms`	integer	Maximum audio length in milliseconds	`10000`	min: 1000, max: 30000
`speaker`	integer	Speaker ID (0 or 1)	`0`	allowed values: 0, 1
`text`	string	Text to convert to speech	`"Hello from Sesame."`	—
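
The parameter table above can be turned into a small client-side check before a request is sent. The sketch below is only an illustration: `build_csm_input` is a hypothetical helper, not part of the model or of any client library, and it assumes the model accepts its inputs as a plain dictionary keyed by the parameter names listed in the table.

```python
from typing import Any, Dict

# Limits and defaults copied from the parameter table.
MIN_AUDIO_MS = 1000
MAX_AUDIO_MS = 30000
ALLOWED_SPEAKERS = (0, 1)


def _is_int(value: Any) -> bool:
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def build_csm_input(
    text: str = "Hello from Sesame.",
    speaker: int = 0,
    max_audio_length_ms: int = 10000,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the arguments and return an input payload."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if not _is_int(speaker) or speaker not in ALLOWED_SPEAKERS:
        raise ValueError("speaker must be 0 or 1")
    if not _is_int(max_audio_length_ms):
        raise TypeError("max_audio_length_ms must be an integer")
    if not MIN_AUDIO_MS <= max_audio_length_ms <= MAX_AUDIO_MS:
        raise ValueError(
            f"max_audio_length_ms must be between {MIN_AUDIO_MS} and {MAX_AUDIO_MS}"
        )
    return {
        "text": text,
        "speaker": speaker,
        "max_audio_length_ms": max_audio_length_ms,
    }


if __name__ == "__main__":
    # Defaults reproduce the values documented in the table.
    print(build_csm_input())
    # A custom request: second speaker, up to 20 seconds of audio.
    print(build_csm_input("Nice to meet you.", speaker=1, max_audio_length_ms=20000))
```

The returned dictionary can then be handed to whichever client is used to call the model. Rejecting out-of-range values locally avoids spending a run on a request that would fail validation anyway.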

Version: 3e59b10a9894

Updated: 7/25/2026

1.2K runs