Video models
Compare every video generation model available in Kyoso.
Pick a model by typing @ in the agent input. Models differ in the generation modes, durations, aspect ratios, and reference options they support. Video generations typically take 1–8 minutes: fast models prioritize speed and suit quick iteration, while longer models spend more time to produce higher-quality results.
Generation modes
Each model supports a subset of the four generation modes below. The mode is picked automatically from your selection on the board (see Tools):
- T2V — Text to Video: no selection, prompt only.
- I2V — Image to Video: one image used as the start frame.
- F2V — Frames to Video: two images (start and end frames).
- R2V — Reference to Video: one or more images and/or videos used as style references.
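To make the selection rules concrete, here is a minimal Python sketch of how a board selection could map to a mode. It is not Kyoso's implementation: `SelectedItem`, its `role` field (whether an item is meant as a frame or as a style reference), and `infer_mode` are hypothetical, and letting any style reference force R2V is an assumption.

```python
from dataclasses import dataclass
from typing import Literal, Sequence


@dataclass(frozen=True)
class SelectedItem:
    """Hypothetical stand-in for one item selected on the board."""
    kind: Literal["image", "video"]
    # Assumption: each item is tagged as a start/end frame or a style reference.
    role: Literal["frame", "reference"]


def infer_mode(selection: Sequence[SelectedItem]) -> str:
    """Return the generation mode implied by a board selection (illustrative only)."""
    if not selection:
        return "T2V"  # no selection: prompt only
    if any(item.role == "reference" for item in selection):
        return "R2V"  # images and/or videos used as style references
    if any(item.kind != "image" for item in selection):
        raise ValueError("only images can serve as start or end frames")
    if len(selection) == 1:
        return "I2V"  # one image used as the start frame
    if len(selection) == 2:
        return "F2V"  # two images: start and end frames
    raise ValueError("at most two frames (start and end) are supported")
```

For example, `infer_mode([SelectedItem("image", "frame")])` returns `"I2V"`, and an empty selection returns `"T2V"`.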
Models
Fast models (1–4 min)
Great for quick iteration, drafts, and exploring ideas.
| Model | Best for | Modes | Durations | Aspect ratios | Reference media | Audio |
|---|---|---|---|---|---|---|
| LTX 2 Fast | Open-source 4K with synced audio for fast iteration | T2V, I2V | 4 / 8s | 16:9, 9:16 | Start frame | Generates audio |
| Grok Imagine | Real-time cinematic, movie-style physics | T2V, I2V | 3–15s | 7 ratios (widest range) | Start frame | — |
| Happy Horse | Quick text-to-video and image-to-video | T2V, I2V | 4 / 8s | 16:9, 9:16, 1:1 | Start frame | — |
| Sora 2 | Realistic, detailed, long-form video | T2V, I2V | 4 / 8 / 12s | 16:9, 9:16 | Start frame | — |
| Veo 3.1 Fast | Fast, cinematic video generation | T2V, I2V, F2V | 4 / 6 / 8s | 16:9, 9:16 | Start, end | Generates audio |
| Seedance 2.0 | Versatile generation across every mode, including video references | T2V, I2V, F2V, R2V | 4–15s | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | Start, end, image refs, video refs | Generates audio |
Longer models (5–8 min)
Take more time but produce higher-quality, more detailed output.
| Model | Best for | Modes | Durations | Aspect ratios | Reference media | Audio |
|---|---|---|---|---|---|---|
| Kling O3 | Real-time, action-focused 4K video | F2V | 3–15s | 16:9, 9:16, 1:1 | Start, end, image refs | Keeps source audio |
| Kling O1 | Vivid, action-focused synthesis | F2V | 5 / 10s | 16:9, 9:16, 1:1 | Start, end, image refs | — |
| Kling V3 Standard | Physics-accurate 4K with camera control | T2V, I2V | 3–15s | 16:9, 9:16, 1:1 | Start, image refs | Generates audio |
| Kling V3 Pro | Pro-grade 4K with character and object reference support; default for T2V and I2V | T2V, I2V, F2V, R2V | 3–15s | 16:9, 9:16, 1:1 | Start, end, image refs | Generates audio (toggle) |
| Veo 3.1 | Cinematic video from reference images with synced audio | R2V | 8s | 16:9, 9:16 | Image refs | Generates audio |
How to choose
- Need fast results? Start with LTX 2 Fast, Grok Imagine, Happy Horse, or Sora 2 — they deliver in under 2 minutes.
- Need the highest quality? Kling V3 Pro, Kling V3 Standard, or Kling O3 take longer but produce more detailed output.
- Need start and end frames? Use Veo 3.1 Fast, Seedance 2.0, Kling O3, Kling O1, or Kling V3 Pro.
- Need synced audio? Use Veo 3.1, Veo 3.1 Fast, Kling V3 Pro, Kling V3 Standard, Seedance 2.0, or LTX 2 Fast.
- Need to use a video as a style reference? Seedance 2.0 is currently the only model that accepts video references (up to 3).
- Need to use images as style references? Seedance 2.0, Veo 3.1, Kling V3 Pro, Kling V3 Standard, Kling O3, or Kling O1. When you attach an image reference, Kyoso automatically switches Kling V3 Pro into R2V mode.
- Need long clips? Sora 2 goes up to 12s; Seedance 2.0, Kling O3, Kling V3 Standard, Kling V3 Pro, and Grok Imagine go up to 15s.
- Need vertical, square, and landscape support? Grok Imagine has the widest aspect ratio range; Seedance 2.0 covers every common ratio.
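The questions in this list amount to filters over the two comparison tables. Below is a minimal Python sketch, assuming a hypothetical `Model` record and `pick` helper (not a Kyoso API). It encodes only the speed tier, maximum duration, and reference-media columns, since Grok Imagine's seven aspect ratios are not listed individually.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    fast: bool               # True if listed under "Fast models (1–4 min)"
    max_seconds: int         # upper end of the Durations column
    refs: frozenset[str]     # Reference media: "start", "end", "image", "video"


def _m(name: str, fast: bool, max_seconds: int, *refs: str) -> Model:
    return Model(name, fast, max_seconds, frozenset(refs))


# Values transcribed from the two comparison tables above, in table order.
MODELS = [
    _m("LTX 2 Fast", True, 8, "start"),
    _m("Grok Imagine", True, 15, "start"),
    _m("Happy Horse", True, 8, "start"),
    _m("Sora 2", True, 12, "start"),
    _m("Veo 3.1 Fast", True, 8, "start", "end"),
    _m("Seedance 2.0", True, 15, "start", "end", "image", "video"),
    _m("Kling O3", False, 15, "start", "end", "image"),
    _m("Kling O1", False, 10, "start", "end", "image"),
    _m("Kling V3 Standard", False, 15, "start", "image"),
    _m("Kling V3 Pro", False, 15, "start", "end", "image"),
    _m("Veo 3.1", False, 8, "image"),
]


def pick(*, fast: Optional[bool] = None, needs: Iterable[str] = (),
         min_seconds: int = 0) -> list[str]:
    """Names of models meeting every requirement, in table order."""
    wanted = frozenset(needs)
    return [
        m.name
        for m in MODELS
        if (fast is None or m.fast == fast)   # optional speed-tier filter
        and wanted <= m.refs                  # every required reference type is accepted
        and m.max_seconds >= min_seconds      # clip can be at least this long
    ]
```

For example, `pick(fast=True, needs={"end"})` returns `['Veo 3.1 Fast', 'Seedance 2.0']`, and `pick(needs={"video"})` returns `['Seedance 2.0']`.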