lucataco/bulk-video-caption

Video Preprocessing tool for captioning multiple videos using GPT, Claude or Gemini

Capabilities: System Prompt

Cost: Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`video_zip_archive`*	string(uri)	ZIP archive containing videos to process	`—`	—
`anthropic_api_key`	string(password)	API key for Anthropic	`—`	—
`caption_prefix`	string	Optional prefix for video captions	`""`	—
`caption_suffix`	string	Optional suffix for video captions	`""`	—
`frames_to_extract`	integer	Number of frames to extract from each video for analysis	`2`	—
`google_generativeai_api_key`	string(password)	API key for Google Generative AI	`—`	—
`include_csv`	boolean	Whether to include CSV in output	`true`	—
`model`	string	AI model to use for captioning	`"gpt-4o"`	One of: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307, gemini-1.5-pro, gemini-1.5-flash
`openai_api_key`	string(password)	API key for OpenAI	`—`	—
`system_prompt`	string	System prompt for caption generation	`"Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation."`	—
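As a rough illustration of how the parameters in the table fit together, the sketch below assembles an input dictionary using the names and defaults listed above. The helpers `key_field_for` and `build_input` are hypothetical, and the rule that a model only needs the API key of its own provider (guessed from the model name prefix) is an assumption, not something the page states.

```python
from typing import Optional

# Model names copied from the Constraints column of the table above.
ALLOWED_MODELS = [
    "gpt-4o", "gpt-4o-mini", "gpt-4-turbo",
    "claude-3-5-sonnet-20240620", "claude-3-opus-20240229",
    "claude-3-sonnet-20240229", "claude-3-haiku-20240307",
    "gemini-1.5-pro", "gemini-1.5-flash",
]


def key_field_for(model: str) -> str:
    """Hypothetical helper: pick the API-key input from the model name prefix.

    Assumption: each model only needs the key of its own provider.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if model.startswith("gpt-"):
        return "openai_api_key"
    if model.startswith("claude-"):
        return "anthropic_api_key"
    return "google_generativeai_api_key"


def build_input(video_zip_url: str, api_key: str, model: str = "gpt-4o",
                frames_to_extract: int = 2, caption_prefix: str = "",
                caption_suffix: str = "", include_csv: bool = True,
                system_prompt: Optional[str] = None) -> dict:
    """Assemble an input dict using the parameter names and defaults from the table."""
    payload = {
        "video_zip_archive": video_zip_url,
        "model": model,
        "frames_to_extract": frames_to_extract,
        "caption_prefix": caption_prefix,
        "caption_suffix": caption_suffix,
        "include_csv": include_csv,
        key_field_for(model): api_key,
    }
    # Leave system_prompt out unless overridden, so the model's default applies.
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload
```

The resulting dictionary would presumably be passed as the `input` argument when running the model through a client library; the exact invocation depends on the client and is not shown on this page.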

`video_zip_archive` (required, string)

ZIP archive containing videos to process
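A minimal sketch of preparing such an archive with Python's standard `zipfile` module follows. The function name `zip_videos` and the set of file extensions are assumptions for illustration; the page does not say which video formats are accepted or whether nested folders are supported, so the sketch stores files flat at the archive root.

```python
import zipfile
from pathlib import Path

# Assumed set of video extensions; the page does not list accepted formats.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def zip_videos(source_dir: str, archive_path: str) -> list:
    """Write video files found directly in source_dir to a ZIP archive.

    Returns the list of file names that were archived.
    """
    names = []
    # ZIP_STORED skips compression, since video files are already compressed.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(Path(source_dir).iterdir()):
            if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
                zf.write(path, arcname=path.name)  # flat layout, no subfolders
                names.append(path.name)
    return names
```

The archive then needs to be reachable as a URI (for example, uploaded to file hosting) before it can be supplied as `video_zip_archive`.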

`anthropic_api_key` (string)

API key for Anthropic

`caption_prefix` (string)

Optional prefix for video captions

Default: ""

`caption_suffix` (string)

Optional suffix for video captions

Default: ""

`frames_to_extract` (integer)

Number of frames to extract from each video for analysis

Default: 2
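The page does not say how the extracted frames are chosen. One plausible strategy, shown here purely as an assumption, is to sample frames at evenly spaced positions so that the frames cover the whole clip. The helper `sample_frame_indices` is hypothetical and is not taken from the model's code.

```python
def sample_frame_indices(total_frames: int, frames_to_extract: int = 2) -> list:
    """Return evenly spaced frame indices (assumed strategy, not the tool's own).

    The clip is split into equal segments and the midpoint of each is taken,
    which avoids the often uninformative very first and very last frames.
    """
    if total_frames <= 0 or frames_to_extract <= 0:
        return []
    n = min(frames_to_extract, total_frames)  # cannot take more frames than exist
    return [int((i + 0.5) * total_frames / n) for i in range(n)]
```

With the default of 2, a 100-frame clip would be sampled at frames 25 and 75 under this scheme; a higher value gives the captioning model more context at the cost of a larger request.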

`google_generativeai_api_key` (string)

API key for Google Generative AI

`include_csv` (boolean)

Whether to include CSV in output

Default: true

`model` (string)

AI model to use for captioning

Default: "gpt-4o"

Allowed values: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307, gemini-1.5-pro, gemini-1.5-flash

`openai_api_key` (string)

API key for OpenAI

`system_prompt` (string)

System prompt for caption generation

Default:

"
            Analyze these frames from a video and write a detailed caption. 
            Describe the type of video (e.g., animation, live-action footage, etc.).
            Focus on consistent elements across frames and any notable motion or action.
            Describe the main subjects, setting, and overall mood of the video.
            Use clear, descriptive language suitable for text-to-video generation.
            "

Version: bd610b3c0ecd
Updated: 7/25/2026
179 runs