Video models

Compare every video generation model available in Kyoso.

Pick a model in the agent input with @. Different models support different generation modes, durations, aspect ratios, and reference options. Video generations typically take 1–8 minutes. Faster models prioritize speed and are great for quick iteration, while slower models spend more time producing higher-quality results.

Generation modes

Each model supports a subset of four generation modes. The mode is picked automatically from your selection on the board (see Tools):

  • T2V — Text to Video: no selection, prompt only.
  • I2V — Image to Video: one image used as the start frame.
  • F2V — Frames to Video: two images (start and end frames).
  • R2V — Reference to Video: one or more images and/or videos used as style references.
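
The selection rules above can be sketched as a small function. This is an illustrative guess at the logic, not Kyoso's implementation: the `SelectedItem` type, its `kind` and `role` fields, and the `pick_mode` name are all hypothetical. This page does not say how Kyoso tells a start frame from a style reference, so the sketch assumes each selected item carries an explicit role.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectedItem:
    """Hypothetical stand-in for one item selected on the board."""
    kind: str            # "image" or "video"
    role: str = "frame"  # "frame" or "reference" (assumed distinction)


def pick_mode(selection: List[SelectedItem]) -> str:
    """Return T2V, I2V, F2V, or R2V for a board selection (illustrative only)."""
    if not selection:
        return "T2V"  # no selection, prompt only
    # Assumption: any video, or any item marked as a reference, means R2V.
    if any(item.kind == "video" or item.role == "reference" for item in selection):
        return "R2V"
    if len(selection) == 1:
        return "I2V"  # one image used as the start frame
    if len(selection) == 2:
        return "F2V"  # two images: start and end frames
    # Assumption: three or more plain images can only act as references.
    return "R2V"
```

For example, `pick_mode([])` returns `"T2V"`, and a selection of two frame images returns `"F2V"`. Whether the chosen model actually supports the resulting mode is a separate check against the tables below.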

Models

Fast models (1–4 min)

Great for quick iteration, drafts, and exploring ideas.

| Model | Best for | Modes | Durations | Aspect ratios | Reference media | Audio |
| --- | --- | --- | --- | --- | --- | --- |
| LTX 2 Fast | Open-source 4K with synced audio for fast iteration | T2V, I2V | 4 / 8s | 16:9, 9:16 | Start frame | |
| Grok Imagine | Real-time cinematic, movie-style physics | T2V, I2V | 3–15s | 7 ratios (widest range) | Start frame | |
| Happy Horse | Quick text-to-video and image-to-video | T2V, I2V | 4 / 8s | 16:9, 9:16, 1:1 | Start frame | |
| Sora 2 | Realistic, detailed, long-form video | T2V, I2V | 4 / 8 / 12s | 16:9, 9:16 | Start frame | |
| Veo 3.1 Fast | Fast, cinematic video generation | T2V, I2V, F2V | 4 / 6 / 8s | 16:9, 9:16 | Start, end | Generates audio |
| Seedance 2.0 | Versatile generation across every mode, including video references | T2V, I2V, F2V, R2V | 4–15s | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | Start, end, image refs, video refs | Generates audio |

Longer models (5–8 min)

Take more time but produce higher-quality, more detailed output.

| Model | Best for | Modes | Durations | Aspect ratios | Reference media | Audio |
| --- | --- | --- | --- | --- | --- | --- |
| Kling O3 | Real-time, action-focused 4K video | F2V | 3–15s | 16:9, 9:16, 1:1 | Start, end, image refs | Keeps source audio |
| Kling O1 | Vivid, action-focused synthesis | F2V | 5 / 10s | 16:9, 9:16, 1:1 | Start, end, image refs | |
| Kling V3 Standard | Physics-accurate 4K with camera control | T2V, F2V | 3–15s | 16:9, 9:16, 1:1 | Start, image refs | Generates audio |
| Kling V3 Pro | Pro-grade 4K with character and object reference support; default for T2V and I2V | T2V, I2V, R2V | 3–15s | 16:9, 9:16, 1:1 | Start, end, image refs | Generates audio (toggle) |
| Veo 3.1 | Cinematic video from reference images with synced audio | R2V | 8s | 16:9, 9:16 | Image refs | Generates audio |

How to choose

  • Need fast results? Start with LTX 2 Fast, Grok Imagine, Happy Horse, or Sora 2 — they deliver in under 2 minutes.
  • Need the highest quality? Kling V3 Pro, Kling V3 Standard, or Kling O3 take longer but produce more detailed output.
  • Need start and end frames? Use Veo 3.1 Fast, Seedance 2.0, Kling O3, Kling O1, or Kling V3 Pro.
  • Need synced audio? Use Veo 3.1, Veo 3.1 Fast, Kling V3 Pro, Kling V3 Standard, Seedance 2.0, or LTX 2 Fast.
  • Need to use a video as a style reference? Seedance 2.0 is currently the only model that accepts video references (up to 3).
  • Need to use images as style references? Seedance 2.0, Veo 3.1, Kling V3 Pro, Kling V3 Standard, Kling O3, or Kling O1. When you attach an image reference, Kyoso automatically switches Kling V3 Pro into R2V mode.
  • Need long clips? Sora 2 goes up to 12s; Seedance 2.0, Kling O3, Kling V3 Standard, Kling V3 Pro, and Grok Imagine go up to 15s.
  • Need vertical, square, and landscape support? Grok Imagine has the widest aspect ratio range; Seedance 2.0 covers every common ratio.
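
The same kind of lookup can be done programmatically. The sketch below is only an illustration: the `VideoModel` type, the `MODELS` list, and `find_models` are hypothetical names, not part of any Kyoso API. The values are transcribed from the Modes and Durations columns of the tables on this page, and the filter looks at those two columns and the speed tier only, so its results can differ from the bullets above, which also draw on the Reference media and Audio columns.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VideoModel:
    name: str
    tier: str              # "fast" (1-4 min) or "longer" (5-8 min)
    modes: FrozenSet[str]  # subset of {"T2V", "I2V", "F2V", "R2V"}
    max_seconds: int       # longest clip listed in the Durations column


# Transcribed from the tables on this page (hypothetical data structure).
MODELS = [
    VideoModel("LTX 2 Fast", "fast", frozenset({"T2V", "I2V"}), 8),
    VideoModel("Grok Imagine", "fast", frozenset({"T2V", "I2V"}), 15),
    VideoModel("Happy Horse", "fast", frozenset({"T2V", "I2V"}), 8),
    VideoModel("Sora 2", "fast", frozenset({"T2V", "I2V"}), 12),
    VideoModel("Veo 3.1 Fast", "fast", frozenset({"T2V", "I2V", "F2V"}), 8),
    VideoModel("Seedance 2.0", "fast", frozenset({"T2V", "I2V", "F2V", "R2V"}), 15),
    VideoModel("Kling O3", "longer", frozenset({"F2V"}), 15),
    VideoModel("Kling O1", "longer", frozenset({"F2V"}), 10),
    VideoModel("Kling V3 Standard", "longer", frozenset({"T2V", "F2V"}), 15),
    VideoModel("Kling V3 Pro", "longer", frozenset({"T2V", "I2V", "R2V"}), 15),
    VideoModel("Veo 3.1", "longer", frozenset({"R2V"}), 8),
]


def find_models(mode: str, min_seconds: int = 0, tier: Optional[str] = None) -> List[str]:
    """Names of models that list `mode`, reach `min_seconds`, and match `tier` if given."""
    return [
        m.name
        for m in MODELS
        if mode in m.modes
        and m.max_seconds >= min_seconds
        and (tier is None or m.tier == tier)
    ]
```

For instance, `find_models("I2V", min_seconds=12, tier="fast")` narrows the fast tier to the models that list I2V and offer clips of at least 12 seconds.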
