meta/sam-2-video
SAM 2: Segment Anything v2 (for videos)
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| click_coordinates (required) | string | Click coordinates as '[x,y],[x,y],...'. Determines number of clicks. | — | — |
| input_video (required) | string (uri) | Input video file path | — | — |
| annotation_type | string | Annotation type: mask only, bounding box only, or both (ignored for binary and greenscreen) | "mask" | mask, box, both |
| click_frames | string | Frame indices for clicks as '0,0,150,0'. Auto-extends if shorter than coordinates. | "0" | — |
| click_labels | string | Click types (1=foreground, 0=background) as '1,1,0,1'. Auto-extends if shorter than coordinates. | "1" | — |
| click_object_ids | string | Object labels for clicks as 'person,dog,cat'. Auto-generates if missing or incomplete. | "" | — |
| mask_type | string | Mask type: binary (B&W), highlighted (colored overlay), or greenscreen | "binary" | binary, highlighted, greenscreen |
| output_format | string | Image format for sequence (ignored for video) | "webp" | webp, jpg, png |
| output_frame_interval | integer | Output every Nth frame. 1=all frames, 2=every other, etc. | 1 | — |
| output_quality | integer | JPG/WebP compression quality (0-100, ignored for PNG and video) | 80 | min: 0, max: 100 |
| output_video | boolean | True for video output, False for image sequence | false | — |
| video_fps | integer | Video output frame rate (ignored for image sequence) | 30 | min: 1, max: 60 |
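
The four click parameters are parallel comma-separated strings, so it is easy to let them drift out of step. The sketch below shows one way to build them from a single list of clicks and to check the optional parameters against the constraints in the table. The names `Click`, `build_click_inputs`, and `build_input` are hypothetical helpers, not part of the model's API, and the example URL is a placeholder. The sketch always emits one frame, label, and object ID per click, so it does not rely on the auto-extend behavior, whose exact rules are not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Click:
    """One prompt click (hypothetical helper type, not part of the model's API)."""
    x: int
    y: int
    object_id: str   # e.g. "person"
    frame: int = 0   # frame index the click applies to
    label: int = 1   # 1 = foreground, 0 = background


# Allowed values and ranges copied from the parameter table.
ENUM_OPTIONS = {
    "annotation_type": {"mask", "box", "both"},
    "mask_type": {"binary", "highlighted", "greenscreen"},
    "output_format": {"webp", "jpg", "png"},
}
RANGE_OPTIONS = {"output_quality": (0, 100), "video_fps": (1, 60)}
OTHER_OPTIONS = {"output_frame_interval", "output_video"}


def build_click_inputs(clicks: List[Click]) -> Dict[str, str]:
    """Serialize clicks into the four comma-separated string parameters."""
    if not clicks:
        raise ValueError("at least one click is required")
    for c in clicks:
        if c.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {c.label}")
        if c.frame < 0:
            raise ValueError(f"frame index must be >= 0, got {c.frame}")
    return {
        "click_coordinates": ",".join(f"[{c.x},{c.y}]" for c in clicks),
        "click_frames": ",".join(str(c.frame) for c in clicks),
        "click_labels": ",".join(str(c.label) for c in clicks),
        "click_object_ids": ",".join(c.object_id for c in clicks),
    }


def build_input(input_video: str, clicks: List[Click], **options) -> Dict[str, object]:
    """Assemble a full input payload, rejecting values outside the table's constraints."""
    for name, value in options.items():
        if name in ENUM_OPTIONS:
            if value not in ENUM_OPTIONS[name]:
                raise ValueError(f"{name} must be one of {sorted(ENUM_OPTIONS[name])}")
        elif name in RANGE_OPTIONS:
            low, high = RANGE_OPTIONS[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name not in OTHER_OPTIONS:
            raise ValueError(f"unknown parameter: {name}")
    payload: Dict[str, object] = {"input_video": input_video}
    payload.update(build_click_inputs(clicks))
    payload.update(options)
    return payload


# Two foreground clicks on frame 0 and frame 150, plus one background click.
example = build_input(
    "https://example.com/clip.mp4",  # placeholder URL
    [
        Click(100, 200, "person"),
        Click(300, 150, "dog", frame=150),
        Click(50, 60, "person", label=0),
    ],
    mask_type="highlighted",
    annotation_type="both",
)
# example["click_coordinates"] is "[100,200],[300,150],[50,60]"
```

The resulting dictionary holds the parameter names from the table as keys, so it could be passed as the input payload to whichever client is used to run the model. Note that `annotation_type` only takes effect here because `mask_type` is "highlighted"; per the table it is ignored for binary and greenscreen masks.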
33432afdfc06 Updated: 2/26/2026 57.1K runs
cinemasetfree.com