cuuupid/qwen2-vl-2b

SOTA open-source model for chatting with videos and the newest model in the Qwen family

Community model (estimated from hardware time)

Name	Type	Description	Default	Constraints
`video`*	string(uri)	Video to process	`—`	—
`height`	integer	Height for the video	`128`	min: 128, max: 2048
`max_duration`	number	Maximum duration of the video in seconds (above 360, may run out of VRAM).	`60`	min: 1, max: 768
`max_tokens`	integer	Maximum number of tokens to generate	`128`	min: 1, max: 8192
`prompt`	string	Prompt to use for the video	`"Describe the video."`	—
`repetition_penalty`	number	Repetition penalty for the model (1.1 is a good default).	`1.1`	min: 0.01, max: 1.5
`temperature`	number	Temperature for the model (0.7 is a good default).	`0.7`	min: 0.01, max: 1
`width`	integer	Width for the video	`128`	min: 128, max: 2048
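
The inputs above can be assembled and range-checked before a request is sent. The following is a minimal sketch, not the model author's code: `build_input` and `run_model` are hypothetical helper names, the bounds and defaults are copied from the table, and the call assumes the third-party `replicate` Python client is installed with `REPLICATE_API_TOKEN` set. Whether the model reference needs a `:<version id>` suffix is also an assumption to verify; this page shows only a shortened version ID, so the sketch leaves the reference as a parameter.

```python
# Bounds and defaults copied from the input table; helper names are hypothetical.
CONSTRAINTS = {
    "height": (128, 2048),
    "width": (128, 2048),
    "max_duration": (1, 768),
    "max_tokens": (1, 8192),
    "repetition_penalty": (0.01, 1.5),
    "temperature": (0.01, 1),
}
DEFAULTS = {
    "height": 128,
    "width": 128,
    "max_duration": 60,
    "max_tokens": 128,
    "prompt": "Describe the video.",
    "repetition_penalty": 1.1,
    "temperature": 0.7,
}


def build_input(video, **overrides):
    """Merge overrides into the defaults and reject out-of-range values."""
    if not video:
        raise ValueError("`video` is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "video": video}
    for name, (low, high) in CONSTRAINTS.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"`{name}` must be between {low} and {high}")
    return payload


def run_model(video, model_ref="cuuupid/qwen2-vl-2b", **overrides):
    """Send a validated request (assumes the `replicate` client is installed)."""
    import replicate  # imported lazily so validation works without the client

    # Assumption: model_ref may need a ":<full version id>" suffix.
    return replicate.run(model_ref, input=build_input(video, **overrides))
```

For example, `build_input("https://example.com/clip.mp4", max_tokens=256)` returns the full payload with the remaining defaults filled in, while `temperature=1.5` raises `ValueError` because the table caps it at 1. Validating locally avoids spending hardware time on a request the API would reject.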

Version: b3e77005f199
Updated: 7/25/2026
659 runs