sakemin/musicgen-stereo-chord

Generate music in stereo, restricted to chord sequences and tempo

Capabilities

Seed, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`audio_chords`	string(uri)	An audio file that conditions the chord progression. Provide only one of `audio_chords` or `text_chords`.	`—`	—
`audio_end`	integer	End time of the audio file to use for chord conditioning. If None, will default to the end of the audio clip.	`—`	min: 0
`audio_start`	integer	Start time of the audio file to use for chord conditioning.	`0`	min: 0
`bpm`	number	BPM condition for the generated output. `text_chords` will be processed based on this value. This will be appended at the end of `prompt`.	`—`	—
`chroma_coefficient`	number	Coefficient value multiplied to multi-hot chord chroma.	`1`	min: 0.5, max: 2.5
`classifier_free_guidance`	integer	Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to inputs.	`3`	—
`continuation`	boolean	If `True`, the generated music continues from `audio_chords`. With chord conditioning, continuation is only possible when the chord condition is given via `text_chords`. If `False`, the generated music follows the chord progression of `audio_chords`.	`false`	—
`duration`	integer	Duration of the generated audio in seconds.	`8`	—
`model_version`	string	Model type. Select `fine-tuned` if you trained the model into your own repository.	`"stereo-chord-large"`	`chord`, `chord-large`, `stereo-chord`, `stereo-chord-large`
`multi_band_diffusion`	boolean	If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Not compatible with stereo models.	`false`	—
`normalization_strategy`	string	Strategy for normalizing audio.	`"loudness"`	`loudness`, `clip`, `peak`, `rms`
`output_format`	string	Output format for generated audio.	`"wav"`	`wav`, `mp3`
`prompt`	string	A description of the music you want to generate.	`—`	—
`seed`	integer	Seed for random number generator. If `None` or `-1`, a random seed will be used.	`—`	—
`temperature`	number	Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.	`1`	—
`text_chords`	string	A text-based chord progression condition. A single uppercase letter (e.g. `C`) is treated as a major chord. Chord attributes (`maj`, `min`, `dim`, `aug`, `min6`, `maj6`, `min7`, `minmaj7`, `maj7`, `7`, `dim7`, `hdim7`, `sus2`, and `sus4`) can be appended to the root letter after a `:` (e.g. `A:min7`). Chord tokens are separated by spaces, and each token is allocated to a single bar. To place more than one chord in a single bar, join the chords with `,` and no spaces (e.g. `C,C:7 G, E:min A:min`). Provide only one of `audio_chords` or `text_chords`.	`—`	—
`time_sig`	string	Time signature of the generated output. `text_chords` will be processed based on this value. This will be appended at the end of `prompt`.	`"4/4"`	—
`top_k`	integer	Reduces sampling to the k most likely tokens.	`250`	—
`top_p`	number	Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.	`0`	—
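
The `text_chords` grammar described in the table can be checked before a request is sent. The following is a minimal sketch, not the model's own parser: `parse_text_chords` and `CHORD_ATTRIBUTES` are hypothetical names, roots are assumed to be the plain letters A to G (accidentals are not covered by the description), and empty chord slots such as a trailing comma are treated as errors, which may be stricter than the model itself.

```python
import re

# Chord attributes listed in the `text_chords` description.
CHORD_ATTRIBUTES = {
    "maj", "min", "dim", "aug", "min6", "maj6", "min7", "minmaj7",
    "maj7", "7", "dim7", "hdim7", "sus2", "sus4",
}

# Assumption: a root is one uppercase letter A-G, optionally followed by
# ":" and an attribute. Accidentals are not handled by this sketch.
_CHORD_RE = re.compile(r"([A-G])(?::([a-z0-9]+))?")


def parse_text_chords(text_chords):
    """Return a list of bars, each a list of (root, attribute) tuples."""
    bars = []
    # Each space-separated token fills one bar.
    for bar_token in text_chords.split(" "):
        if not bar_token:
            raise ValueError("empty bar: check for doubled or stray spaces")
        bar = []
        # Several chords in one bar are joined with commas, without spaces.
        for chord in bar_token.split(","):
            match = _CHORD_RE.fullmatch(chord)
            if match is None:
                raise ValueError(f"malformed chord token: {chord!r}")
            root = match.group(1)
            # A bare root letter is treated as a major chord.
            attribute = match.group(2) or "maj"
            if attribute not in CHORD_ATTRIBUTES:
                raise ValueError(f"unknown chord attribute: {attribute!r}")
            bar.append((root, attribute))
        bars.append(bar)
    return bars
```

For instance, `parse_text_chords("C,C:7 G E:min A:min7")` yields four bars, the first of which holds two chords. The number of bars, together with `bpm` and `time_sig`, gives a rough idea of how much of `duration` the progression covers.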

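The defaults and constraints in the parameter table can be collected into a small helper that assembles an input dictionary. This is a sketch under stated assumptions: `build_input` is a hypothetical helper and not part of any client library, it assumes at least one chord source is required, and it allows both chord sources together only when `continuation` is true, which is one reading of the `continuation` description.

```python
def build_input(prompt, text_chords=None, audio_chords=None, **overrides):
    """Assemble an input dict and check constraints from the parameter table."""
    # Defaults copied from the parameter table.
    payload = {
        "prompt": prompt,
        "duration": 8,
        "model_version": "stereo-chord-large",
        "chroma_coefficient": 1,
        "classifier_free_guidance": 3,
        "continuation": False,
        "multi_band_diffusion": False,
        "normalization_strategy": "loudness",
        "output_format": "wav",
        "temperature": 1,
        "time_sig": "4/4",
        "top_k": 250,
        "top_p": 0,
    }
    payload.update(overrides)

    # Assumption: at least one chord source must be supplied.
    if text_chords is None and audio_chords is None:
        raise ValueError("provide text_chords or audio_chords")
    # Assumption: both sources together are only meaningful for continuation.
    if text_chords is not None and audio_chords is not None:
        if not payload["continuation"]:
            raise ValueError("provide only one of text_chords or audio_chords")
    if text_chords is not None:
        payload["text_chords"] = text_chords
    if audio_chords is not None:
        payload["audio_chords"] = audio_chords

    # Range and option constraints listed in the table.
    if not 0.5 <= payload["chroma_coefficient"] <= 2.5:
        raise ValueError("chroma_coefficient must be between 0.5 and 2.5")
    if payload["normalization_strategy"] not in {"loudness", "clip", "peak", "rms"}:
        raise ValueError("unknown normalization_strategy")
    if payload["output_format"] not in {"wav", "mp3"}:
        raise ValueError("output_format must be 'wav' or 'mp3'")
    # The table marks MultiBand Diffusion as incompatible with stereo models.
    if payload["multi_band_diffusion"] and payload["model_version"].startswith("stereo"):
        raise ValueError("multi_band_diffusion is not compatible with stereo models")
    return payload
```

A call such as `build_input("mellow jazz trio", text_chords="C G A:min F", bpm=120)` returns a dictionary that can be handed to whichever client is used to run the model. Parameters without a default in the table (`bpm`, `seed`, `audio_end`) are only included when passed explicitly.
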
Version: fbdc5ef72002
Updated: 7/25/2026
3.4K runs