lucataco/videollama3-7b

VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding

Capabilities

Max Tokens, Top-P

Community model (estimated from hardware time)

Name	Type	Description	Default	Constraints
`prompt`*	string	Text prompt to guide the model's response	`—`	—
`video`*	string(uri)	Input video file	`—`	—
`fps`	number	Frames per second to sample from video	`1`	min: 0, max: 10
`max_frames`	integer	Maximum number of frames to process	`180`	min: 0, max: 256
`max_new_tokens`	integer	Maximum number of tokens to generate	`2048`	min: 0, max: 4096
`temperature`	number	Sampling temperature	`0.2`	min: 0, max: 1
`top_p`	number	Top-p sampling	`0.9`	min: 0, max: 1
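
The table above can be turned into a small client-side check before a request is sent. The sketch below is illustrative only: `build_input` and `run_model` are hypothetical helper names, not part of any published API. The ranges and defaults are copied from the table. The `run_model` function assumes the model is served through Replicate, that the `replicate` Python client is installed, and that an API token is configured; depending on the client version, the model reference may also need a full version suffix, which this page does not show.

```python
from typing import Any, Dict

# (min, max, default) for each optional input, copied from the input table.
LIMITS = {
    "fps": (0, 10, 1),
    "max_frames": (0, 256, 180),
    "max_new_tokens": (0, 4096, 2048),
    "temperature": (0, 1, 0.2),
    "top_p": (0, 1, 0.9),
}
INTEGER_FIELDS = {"max_frames", "max_new_tokens"}


def build_input(prompt: str, video: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: apply table defaults and reject out-of-range values."""
    if not prompt or not video:
        raise ValueError("prompt and video are required")
    payload: Dict[str, Any] = {"prompt": prompt, "video": video}
    for name, (low, high, default) in LIMITS.items():
        value = overrides.pop(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        if name in INTEGER_FIELDS and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    if overrides:
        raise ValueError(f"unknown inputs: {sorted(overrides)}")
    return payload


def run_model(payload: Dict[str, Any]) -> Any:
    """Assumption: hosted on Replicate; needs `pip install replicate` and a token."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run("lucataco/videollama3-7b", input=payload)
```

For example, `build_input("Describe this clip", "https://example.com/clip.mp4", fps=2)` returns a payload with `fps` set to 2 and every other optional field at its table default, while `temperature=1.5` raises `ValueError`. The video URL here is a placeholder.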

Version: 34a1f45f7068 | Updated: 7/25/2026 | 32.7K runs