Video models
Compare every video generation model available in Kyoso.
Pick a model by typing @ in the agent input. Models differ in the durations, aspect ratios, and frame attachments they support. A video generation typically takes 30–120 seconds.
Models
| Model | Best for | Durations | Aspect ratios | Frames you can attach | Audio |
|---|---|---|---|---|---|
| Kling O3 | Real-time, action-focused 4K video | 3–15s | 16:9, 9:16, 1:1 | Start, end, video ref | Keeps source audio |
| Veo 3.1 Fast | Fast, cinematic video generation | 4 / 6 / 8s | 16:9, 9:16 | Start, end | Generates audio |
| Sora 2 | Realistic, detailed, long-form video | 4 / 8 / 12s | 16:9, 9:16 | Start | — |
| Kling V3 Standard | Physics-accurate 4K with camera control | 3–15s | 16:9, 9:16, 1:1 | Start | Generates audio |
| Kling O1 | Quick, vivid, action-focused synthesis | 5 / 10s | 16:9, 9:16, 1:1 | Start, end, video ref | — |
| Grok Imagine | Real-time cinematic, movie-style physics | 3–15s | 7 ratios (widest range) | Start, video ref | — |
| LTX 2 Fast | Open-source 4K with synced audio for fast iteration | 4 / 8s | 16:9, 9:16 | Start | Generates audio |
How to choose
- Need start and end frames? Use Kling O3, Veo 3.1 Fast, or Kling O1.
- Need synced audio? Use Veo 3.1 Fast, Kling V3 Standard, or LTX 2 Fast.
- Need a video as a style reference? Use Kling O3, Kling O1, or Grok Imagine.
- Need long clips? Sora 2 goes up to 12s; Kling O3, Kling V3 Standard, and Grok Imagine go up to 15s.
- Need vertical, square, and landscape? Kling O3, Kling V3 Standard, and Kling O1 support 16:9, 9:16, and 1:1. Grok Imagine has the widest range, with 7 aspect ratios.
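The rules above amount to filtering the table by a few requirements. Here is a minimal Python sketch, assuming you keep your own copy of the table. `VideoModel`, `MODELS`, and `choose` are hypothetical names for illustration, not part of Kyoso (in Kyoso you still pick the model with @). The sketch assumes a range such as 3–15s accepts any whole number of seconds, and it marks LTX 2 Fast as generating audio, following the "Need synced audio?" rule. It leaves out aspect ratios because the table doesn't list Grok Imagine's 7 ratios.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoModel:
    name: str
    durations: tuple[int, ...]  # allowed clip lengths in seconds
    frames: frozenset[str]      # subset of {"start", "end", "video_ref"}
    audio: Optional[str]        # "generates", "keeps_source", or None


# Assumption: a range like 3–15s allows every whole second in between.
RANGE_3_15 = tuple(range(3, 16))

MODELS = [
    VideoModel("Kling O3", RANGE_3_15, frozenset({"start", "end", "video_ref"}), "keeps_source"),
    VideoModel("Veo 3.1 Fast", (4, 6, 8), frozenset({"start", "end"}), "generates"),
    VideoModel("Sora 2", (4, 8, 12), frozenset({"start"}), None),
    VideoModel("Kling V3 Standard", RANGE_3_15, frozenset({"start"}), "generates"),
    VideoModel("Kling O1", (5, 10), frozenset({"start", "end", "video_ref"}), None),
    VideoModel("Grok Imagine", RANGE_3_15, frozenset({"start", "video_ref"}), None),
    VideoModel("LTX 2 Fast", (4, 8), frozenset({"start"}), "generates"),
]


def choose(
    needs_frames: frozenset[str] = frozenset(),
    needs_generated_audio: bool = False,
    seconds: Optional[int] = None,
) -> list[str]:
    """Return the names of models that meet every requirement, in table order."""
    matches = []
    for model in MODELS:
        # Every requested attachment must be supported by the model.
        if not needs_frames <= model.frames:
            continue
        # "Keeps source audio" does not count as generating new audio.
        if needs_generated_audio and model.audio != "generates":
            continue
        # The exact clip length must be one the model accepts.
        if seconds is not None and seconds not in model.durations:
            continue
        matches.append(model.name)
    return matches


if __name__ == "__main__":
    print(choose(needs_frames=frozenset({"start", "end"})))
    # ['Kling O3', 'Veo 3.1 Fast', 'Kling O1']
    print(choose(needs_generated_audio=True))
    # ['Veo 3.1 Fast', 'Kling V3 Standard', 'LTX 2 Fast']
    print(choose(seconds=15))
    # ['Kling O3', 'Kling V3 Standard', 'Grok Imagine']
```

Requirements combine as an AND. For example, `choose(needs_frames=frozenset({"video_ref"}), seconds=10)` keeps only the models that accept a video reference and a 10-second clip.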