klingai
15 apps available on inference.sh.
run via API, SDK, or belt CLI.

image
8
image-v2
kling image v2 (kolors v2.0) - text-to-image with 2k resolution, multi-image reference, and restyle. restyle output matches input resolution.

image-o1
kling image o1 (kolors image-o1) - omni image generation with element control. text-to-image and image-to-image at 1k/2k. $0.028/image.

image-3o
kling image 3o (kolors image-3o) - most capable image model with native 4k, series-image generation, and element control. $0.028/image (4k $0.056).

image-v1
kling image v1 (kolors v1.0) - basic text-to-image and image-to-image generation. cheapest option at $0.0035/image.

image-v1-5
kling image v1.5 (kolors v1.5) - text-to-image with subject and face reference for character consistency. generate images preserving a person's appearance.

image-v2-1
kling image v2.1 (kolors v2.1) - text-to-image and multi-image reference generation. combine multiple images for complex compositions.

image-v3
kling image v3 (kolors v3.0) - latest image generation model with 1k/2k resolution support. highest quality text-to-image.

virtual-tryon
kling virtual try-on - ai clothing try-on from a person photo and clothing image. supports single items and upper+lower combos (v1.5). $0.07 per generation.
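
The per-generation prices quoted in the image listings above can be turned into a rough batch-cost estimate. The following is a minimal sketch, not part of any inference.sh API or SDK: the `PRICES` table, the tier labels `"default"` and `"4k"`, and the `estimate_cost` helper are hypothetical names introduced for illustration, and only apps whose listing states a price are included.

```python
from decimal import Decimal

# Per-generation prices in USD, copied from the listings above.
# The tier labels ("default", "4k") are illustrative, not official parameters.
# Apps with no stated price (image-v2, image-v1-5, image-v2-1, image-v3) are omitted.
PRICES = {
    ("image-o1", "default"): Decimal("0.028"),
    ("image-3o", "default"): Decimal("0.028"),
    ("image-3o", "4k"): Decimal("0.056"),
    ("image-v1", "default"): Decimal("0.0035"),
    ("virtual-tryon", "default"): Decimal("0.07"),
}


def estimate_cost(app: str, count: int, tier: str = "default") -> Decimal:
    """Return the estimated USD cost of `count` generations of `app`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    try:
        unit_price = PRICES[(app, tier)]
    except KeyError:
        raise KeyError(f"no listed price for {app!r} at tier {tier!r}") from None
    return unit_price * count


if __name__ == "__main__":
    # Compare the cost of 1,000 generations across three listed options.
    for app, tier in [("image-v1", "default"), ("image-o1", "default"), ("image-3o", "4k")]:
        print(app, tier, estimate_cost(app, 1000, tier))
```

`Decimal` is used instead of floats so that small unit prices such as $0.0035 multiply out exactly. Actual billing may differ from this estimate, so treat the result as a planning figure only.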
video
6
video-v3
kling v3.0 - latest and most capable video generation model. native 4k output, multi-shot generation, flexible 3-15s duration billed per second, element control, motion control, and synchronized audio.

video-v2-6
kling v2.6 video generation with native sound and voice control. supports text-to-video and image-to-video with start/end frames, synchronized audio generation, and voice-driven character animation.

video-o1
kling video o1 (omni) - unified video generation with text, image references, start/end frames, element references, and video references for editing and style transfer. the most capable kling model.

video-v2-5
kling v2.5 turbo - fast video generation from text and images. supports start/end frame interpolation in pro mode. optimized for speed while maintaining high quality at up to 1080p.

lip-sync
kling lip sync - drive mouth movements in videos using text or audio. ideal for dubbing, adding speech to silent videos, or replacing dialogue.

avatar
kling avatar - generate digital human broadcast-style talking head videos from a single face photo. provide text or audio for the avatar to speak.
explore more on inference.sh
klingai is one of many providers on the grid. discover 250+ apps across image, video, audio, and more.
