Video models

Type @ in the agent input to pick a model. Models differ in the generation modes, durations, aspect ratios, and reference options they support. Video generations typically take 1–8 minutes: fast models finish in 1–4 minutes and suit quick iteration, while slower models take 5–8 minutes and produce higher-quality results.

Generation modes

Each model supports a subset of four generation modes. The mode is picked automatically from your selection on the board (see Tools):

T2V — Text to Video: no selection, prompt only.
I2V — Image to Video: one image used as the start frame.
F2V — Frames to Video: two images (start and end frames).
R2V — Reference to Video: one or more images and/or videos used as style references.
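
As a rough illustration of the rules above, the selection-to-mode mapping can be sketched in Python. This is not Kyoso's implementation: the function name `infer_mode`, its parameters, and the `as_reference` flag are hypothetical, introduced only because one or two images could otherwise mean either frames or style references. The handling of three or more non-reference images is also an assumption.

```python
def infer_mode(images: int = 0, videos: int = 0, as_reference: bool = False) -> str:
    """Map a board selection to a generation mode (hypothetical sketch).

    images / videos: how many of each are selected.
    as_reference: assumed flag meaning the media are style references
    rather than start/end frames.
    """
    if images < 0 or videos < 0:
        raise ValueError("counts must be non-negative")
    if images == 0 and videos == 0:
        return "T2V"  # no selection, prompt only
    if as_reference or videos > 0:
        return "R2V"  # images and/or videos used as style references
    if images == 1:
        return "I2V"  # one image used as the start frame
    if images == 2:
        return "F2V"  # two images: start and end frames
    # Assumption: the page does not say what happens here, so the sketch refuses.
    raise ValueError("three or more images are only handled as references")


print(infer_mode())                             # T2V
print(infer_mode(images=2))                     # F2V
print(infer_mode(images=1, as_reference=True))  # R2V
```

The selected model must also support the resulting mode; the tables under Models list which modes each one offers.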

Models

Fast models (1–4 min)

Great for quick iteration, drafts, and exploring ideas.

Model	Best for	Modes	Durations	Aspect ratios	Reference media	Audio
LTX 2 Fast	Open-source 4K with synced audio for fast iteration	T2V, I2V	4 / 8s	16:9, 9:16	Start frame	—
Grok Imagine	Real-time cinematic, movie-style physics	T2V, I2V	3–15s	7 ratios (widest range)	Start frame	—
Happy Horse	Quick text-to-video and image-to-video	T2V, I2V	4 / 8s	16:9, 9:16, 1:1	Start frame	—
Sora 2	Realistic, detailed, long-form video	T2V, I2V	4 / 8 / 12s	16:9, 9:16	Start frame	—
Veo 3.1 Fast	Fast, cinematic video generation	T2V, I2V, F2V	4 / 6 / 8s	16:9, 9:16	Start, end	Generates audio
Seedance 2.0	Versatile generation across every mode, including video references	T2V, I2V, F2V, R2V	4–15s	21:9, 16:9, 4:3, 1:1, 3:4, 9:16	Start, end, image refs, video refs	Generates audio

Slower models (5–8 min)

Take more time but produce higher-quality, more detailed output.

Model	Best for	Modes	Durations	Aspect ratios	Reference media	Audio
Kling O3	Real-time, action-focused 4K video	F2V	3–15s	16:9, 9:16, 1:1	Start, end, image refs	Keeps source audio
Kling O1	Vivid, action-focused synthesis	F2V	5 / 10s	16:9, 9:16, 1:1	Start, end, image refs	—
Kling V3 Standard	Physics-accurate 4K with camera control	T2V, F2V	3–15s	16:9, 9:16, 1:1	Start, image refs	Generates audio
Kling V3 Pro	Pro-grade 4K with character and object reference support; default for T2V and I2V	T2V, I2V, R2V	3–15s	16:9, 9:16, 1:1	Start, end, image refs	Generates audio (toggle)
Veo 3.1	Cinematic video from reference images with synced audio	R2V	8s	16:9, 9:16	Image refs	Generates audio

How to choose

Need fast results? Start with LTX 2 Fast, Grok Imagine, Happy Horse, or Sora 2. Like the other fast models, they typically finish in 1–4 minutes.
Need the highest quality? Kling V3 Pro, Kling V3 Standard, or Kling O3 take longer but produce more detailed output.
Need start and end frames? Use Veo 3.1 Fast, Seedance 2.0, Kling O3, Kling O1, or Kling V3 Pro.
Need synced audio? Use Veo 3.1, Veo 3.1 Fast, Kling V3 Pro, Kling V3 Standard, Seedance 2.0, or LTX 2 Fast.
Need to use a video as a style reference? Seedance 2.0 is currently the only model that accepts video references (up to 3).
Need to use images as style references? Use Seedance 2.0, Veo 3.1, Kling V3 Pro, Kling V3 Standard, Kling O3, or Kling O1. When you attach an image reference, Kyoso automatically switches Kling V3 Pro into R2V mode.
Need long clips? Sora 2 goes up to 12s; Seedance 2.0, Kling O3, Kling V3 Standard, Kling V3 Pro, and Grok Imagine go up to 15s.
Need vertical, square, and landscape support? Grok Imagine has the widest aspect ratio range; Seedance 2.0 covers every common ratio.
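
The same kind of lookup can be done programmatically. The sketch below is only an illustration, not a Kyoso API: it copies the Modes and Durations columns from the tables above into a hypothetical `MODELS` list and filters it. The names `VideoModel`, `MODELS`, and `pick` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    tier: str          # "fast" (1-4 min) or "slow" (5-8 min)
    modes: frozenset   # subset of {"T2V", "I2V", "F2V", "R2V"}
    max_seconds: int   # longest duration listed in the table


def _m(name, tier, modes, max_seconds):
    # Helper that turns "T2V, I2V" into a frozenset of mode codes.
    return VideoModel(name, tier, frozenset(modes.split(", ")), max_seconds)


# Values transcribed from the Modes and Durations columns above.
MODELS = [
    _m("LTX 2 Fast", "fast", "T2V, I2V", 8),
    _m("Grok Imagine", "fast", "T2V, I2V", 15),
    _m("Happy Horse", "fast", "T2V, I2V", 8),
    _m("Sora 2", "fast", "T2V, I2V", 12),
    _m("Veo 3.1 Fast", "fast", "T2V, I2V, F2V", 8),
    _m("Seedance 2.0", "fast", "T2V, I2V, F2V, R2V", 15),
    _m("Kling O3", "slow", "F2V", 15),
    _m("Kling O1", "slow", "F2V", 10),
    _m("Kling V3 Standard", "slow", "T2V, F2V", 15),
    _m("Kling V3 Pro", "slow", "T2V, I2V, R2V", 15),
    _m("Veo 3.1", "slow", "R2V", 8),
]


def pick(mode, min_seconds=0, tier=None):
    """Return names of models that support `mode`, reach `min_seconds`,
    and (optionally) belong to the given speed tier."""
    return [
        m.name
        for m in MODELS
        if mode in m.modes
        and m.max_seconds >= min_seconds
        and (tier is None or m.tier == tier)
    ]


print(pick("R2V"))  # ['Seedance 2.0', 'Kling V3 Pro', 'Veo 3.1']
print(pick("I2V", min_seconds=12, tier="fast"))
```

The filter reads only the Modes column, so its answers can differ from the prose recommendations, which also take the Reference media column into account.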