meta/sam-2-video

SAM 2: Segment Anything v2 (for videos)

Name	Type	Description	Default	Constraints
`click_coordinates`*	string	Click coordinates as '[x,y],[x,y],...'. Determines the number of clicks.	—	—
`input_video`*	string(uri)	Input video file path	—	—
`annotation_type`	string	Annotation type: mask only, bounding box only, or both (ignored for binary and greenscreen mask types)	`"mask"`	mask, box, both
`click_frames`	string	Frame indices for clicks as '0,0,150,0'. Auto-extends if shorter than the coordinates list.	`"0"`	—
`click_labels`	string	Click types (1=foreground, 0=background) as '1,1,0,1'. Auto-extends if shorter than the coordinates list.	`"1"`	—
`click_object_ids`	string	Object labels for clicks as 'person,dog,cat'. Auto-generated if missing or incomplete.	`""`	—
`mask_type`	string	Mask type: binary (B&W), highlighted (colored overlay), or greenscreen	`"binary"`	binary, highlighted, greenscreen
`output_format`	string	Image format for the image sequence (ignored for video output)	`"webp"`	webp, jpg, png
`output_frame_interval`	integer	Output every Nth frame. 1=all frames, 2=every other frame, etc.	`1`	—
`output_quality`	integer	JPG/WebP compression quality (0-100, ignored for PNG and video)	`80`	min: 0, max: 100
`output_video`	boolean	True for video output, False for image sequence	`false`	—
`video_fps`	integer	Video output frame rate (ignored for image sequence)	`30`	min: 1, max: 60

* Required parameter.
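
The click parameters are parallel comma-separated strings, which are easy to get out of step when written by hand. The following is a minimal sketch of one way to assemble them from a list of clicks. The `Click` class and the `build_sam2_input` helper are hypothetical names introduced here for illustration and are not part of the model's API; only the dictionary keys come from the table above. The sketch always writes one frame and one label per click, so it does not rely on the auto-extend behavior, whose exact rule the table does not state.

```python
from dataclasses import dataclass


@dataclass
class Click:
    """One prompt click (hypothetical helper type, not part of the model API)."""
    x: int
    y: int
    frame: int = 0       # frame index the click applies to
    label: int = 1       # 1 = foreground, 0 = background
    object_id: str = ""  # e.g. "person"; left empty to let the model auto-generate


def build_sam2_input(input_video, clicks, **options):
    """Build an input dict using the parameter names from the table above.

    Extra keyword arguments (e.g. mask_type="highlighted") are passed through
    unchanged and are not validated here.
    """
    if not clicks:
        raise ValueError("click_coordinates is required: pass at least one click")

    payload = {
        "input_video": input_video,
        # '[x,y],[x,y],...' as described for click_coordinates
        "click_coordinates": ",".join(f"[{c.x},{c.y}]" for c in clicks),
        # One entry per click, so nothing depends on auto-extension
        "click_frames": ",".join(str(c.frame) for c in clicks),
        "click_labels": ",".join(str(c.label) for c in clicks),
    }

    # Only send object ids when every click has one; otherwise leave the
    # field out and fall back to the documented auto-generation.
    ids = [c.object_id for c in clicks]
    if all(ids):
        payload["click_object_ids"] = ",".join(ids)

    payload.update(options)
    return payload


clicks = [
    Click(100, 200, object_id="person"),
    Click(300, 150, frame=150, label=0, object_id="person"),
]
# The URL below is a placeholder, not a real video.
payload = build_sam2_input(
    "https://example.com/clip.mp4", clicks, mask_type="highlighted"
)
print(payload["click_coordinates"])
print(payload["click_frames"], payload["click_labels"])
```

Keeping the clicks as structured records and serializing them in one place means the coordinate, frame, and label strings always have the same number of entries.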

Version: 33432afdfc06 | Updated: 7/25/2026 | 57.1K runs