chenxwh/cogvlm2-video
Official
CogVLM2: Visual Language Models for Image and Video Understanding
Capabilities: Max Tokens, Top-P
Cost: Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| input_video (required) | string (uri) | Input video | — | — |
| max_new_tokens | integer | Maximum number of tokens to generate. A word is generally 2-3 tokens | 2048 | min: 0 |
| prompt | string | Input prompt | "Describe this video." | — |
| temperature | number | Adjusts randomness of outputs; greater than 1 is random and 0 is deterministic | 0.1 | min: 0 |
| top_p | number | When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens | 0.1 | min: 0, max: 1 |
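
The parameters above map directly onto an input dictionary. The following is a minimal sketch, not an official client: `build_input`, `describe_video`, and `MODEL_REF` are hypothetical names introduced for illustration. It assumes the `replicate` Python package is installed and `REPLICATE_API_TOKEN` is set in the environment. The output shape is not documented on this page, so the sketch handles both a plain string and an iterable of chunks. Depending on the client version, the model reference may need a full version ID appended as `owner/name:version`; only a shortened ID appears in this listing, so it is left out here.

```python
from typing import Any, Dict

# Model reference as shown on this page. Assumption: some client versions
# require ":<full version id>" appended; the full ID is not given here.
MODEL_REF = "chenxwh/cogvlm2-video"


def build_input(
    input_video: str,
    prompt: str = "Describe this video.",
    max_new_tokens: int = 2048,
    temperature: float = 0.1,
    top_p: float = 0.1,
) -> Dict[str, Any]:
    """Check values against the documented constraints and return the input dict."""
    if not input_video:
        raise ValueError("input_video is required (a URI string)")
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be >= 0")
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    return {
        "input_video": input_video,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def describe_video(input_video: str, **overrides: Any) -> str:
    """Run the model through the replicate client and return the text."""
    import replicate  # imported lazily so build_input works without the package

    output = replicate.run(MODEL_REF, input=build_input(input_video, **overrides))
    # Assumption: output is either a string or an iterable of text chunks.
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)
```

A call such as `describe_video("https://example.com/clip.mp4", prompt="What happens in this clip?")` would send the defaults from the table for every parameter not overridden; the URL is a placeholder.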
Version: 9da7e9a554d3
Updated: 2/26/2026
Runs: 672.6K