
chenxwh/cogvlm2-video

CogVLM2: Visual Language Models for Image and Video Understanding

Capabilities

Max Tokens, Top-P

Cost

Community model; cost is estimated from hardware time

Input Parameters

input_video required string

Input video

max_new_tokens integer

Maximum number of tokens to generate. As a rough guide, an English word is typically 1 to 2 tokens, depending on the tokenizer.

Default: 2048 min: 0

prompt string

Input prompt

Default: "Describe this video."

temperature number

Adjusts the randomness of outputs: values above 1 are highly random, and 0 is deterministic

Default: 0.1 min: 0
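
The effect of temperature can be illustrated with a minimal sketch. It assumes the common approach of dividing token scores by the temperature before a softmax; this page does not state how the model applies it, so the function name `softmax_with_temperature` and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Deterministic limit: all probability goes to the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0, 0.1, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

In this sketch, lower temperatures concentrate probability on the top-scoring token, which is why a value near the default of 0.1 tends to give nearly repeatable output.
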
top_p number

When decoding text, samples from the smallest set of most likely tokens whose cumulative probability reaches p; lower values ignore less likely tokens

Default: 0.1 min: 0, max: 1
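
A minimal sketch of top-p (nucleus) filtering may make the parameter more concrete. It assumes the standard formulation, where tokens are ranked by probability and kept until their cumulative probability reaches `top_p`; the helper `top_p_filter` and the example distribution are hypothetical and are not taken from the model's code.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p.

    Returns a dict mapping kept token indices to renormalized probabilities.
    """
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical distribution over four candidate tokens.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.1))  # only the most likely token survives
print(top_p_filter(probs, 0.9))  # the three most likely tokens survive
```

Under this sketch, the default of 0.1 usually leaves only the single most likely token, so raising `top_p` is what widens the pool of candidates.
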
Version: 9da7e9a554d3 Updated: 2/26/2026 672.6K runs
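
The input parameters listed above can be gathered into a single payload before calling the model. The sketch below only builds and validates a plain dictionary against the defaults and ranges stated on this page; the helper name `build_input` and the example URL are hypothetical, and how the payload is submitted depends on the client you use.

```python
def build_input(input_video, prompt="Describe this video.",
                max_new_tokens=2048, temperature=0.1, top_p=0.1):
    """Build an input payload using the documented defaults and limits."""
    if not isinstance(input_video, str) or not input_video:
        raise ValueError("input_video is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int)
            or max_new_tokens < 0):
        raise ValueError("max_new_tokens must be an integer >= 0")
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    return {
        "input_video": input_video,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


# Hypothetical video location, used only for illustration.
payload = build_input("https://example.com/clip.mp4",
                      prompt="What happens in this clip?",
                      max_new_tokens=256)
print(payload)
```

Validating locally catches out-of-range values, such as a `top_p` above 1, before any hardware time is spent on a request.
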