app library
the grid
250+ tools that run serverless on CPU or GPU.
call directly via API or let agents orchestrate them.
featured

p-video-avatar
Generate talking head videos from a portrait image with text or audio-driven speech

gemini-3-1-flash-image-preview
Gemini 3.1 Flash Image Preview (NanoBanana 2) via Vertex AI - Advanced image generation model powered by Google Cloud

veo-3-1-fast
Veo 3.1 Fast via Vertex AI - Generate videos from text prompts or images with optional audio

flux-1-kontext-dev
Edits existing images using text instructions, allowing for changes in style, characters, or objects, and reliably handles multiple edits while maintaining image coherence.
all apps

pruna/p-video-avatar
generate talking head videos from a portrait image with text or audio-driven speech

x/post-search
search recent posts on x.com. use conversation_id to get replies to a tweet, or any x search query. returns up to 100 posts with text, author, and engagement metrics.

infsh/deepseek-ocr-2
next-gen document ocr with improved math, tables, and reading order. converts images and pdfs to structured markdown.

heygen/create-avatar
create heygen avatars from video footage (digital twin), a photo (photo avatar), or a text prompt (ai-generated). returns a look id for use with avatar-video.

openrouter/qwen3-32b
qwen3 32b - powerful dense language model with reasoning and tool use capabilities via openrouter

openrouter/qwen3-8b
qwen3 8b - efficient dense language model with reasoning and tool use capabilities via openrouter

anthropic/claude-haiku-45
claude haiku 4.5 — fastest and most affordable claude. 200k context, 64k output, vision, extended thinking, tool use. direct api.

anthropic/claude-sonnet-45
claude sonnet 4.5 — previous generation sonnet. 200k context, 64k output, vision, extended thinking, tool use. direct api.

anthropic/claude-sonnet-46
claude sonnet 4.6 — best balance of speed and intelligence. 1m context, 64k output, vision, extended thinking, tool use. direct api.

anthropic/claude-opus-46
claude opus 4.6 — previous generation opus. 1m context, 128k output, vision, extended thinking, tool use. direct api.

anthropic/claude-opus-47
claude opus 4.7 — anthropic's most capable model. 1m context, 128k output, vision, extended thinking, tool use. direct api.

heygen/text-to-speech
generate natural speech audio from text using heygen's starfish tts engine. supports configurable voice, speed, ssml input, and multiple languages.

heygen/lipsync
re-sync video lip movements to new audio using heygen's lipsync technology. supports speed and precision modes with optional captioning.

heygen/video-translate
translate videos into 30+ languages with voice cloning and lip-sync using heygen. supports speed and precision modes with optional captioning.

heygen/video-agent
generate complete videos from natural language prompts using heygen's ai video agent. the agent handles avatar selection, scripting, and production automatically.

heygen/photo-video
animate portrait photos into talking videos using heygen. upload a face image and add speech with configurable voice, motion prompts, and expressiveness.

heygen/avatar-video
generate talking avatar videos using heygen's digital and photo avatars with avatar iv or v engines, configurable voice, resolution up to 4k, and expressiveness.

veed/subtitles
add professional burned-in subtitles to videos with 25+ style presets. supports 100+ languages with automatic transcription or custom srt files.

infsh/html-to-video
render html/css/js animations to video — supports gsap timelines, css animations, web animations api

klingai/image-v2
kling image v2 (kolors v2.0) - text-to-image with 2k resolution, multi-image reference, and restyle. restyle output matches input resolution.

klingai/image-o1
kling image o1 (kolors image-o1) - omni image generation with element control. text-to-image and image-to-image at 1k/2k. $0.028/image.

klingai/image-3o
kling image 3o (kolors image-3o) - most capable image model with native 4k, series-image generation, and element control. $0.028/image (4k $0.056).

klingai/image-v1
kling image v1 (kolors v1.0) - basic text-to-image and image-to-image generation. cheapest option at $0.0035/image.

klingai/image-v1-5
kling image v1.5 (kolors v1.5) - text-to-image with subject and face reference for character consistency. generate images preserving a person's appearance.

klingai/image-v2-1
kling image v2.1 (kolors v2.1) - text-to-image and multi-image reference generation. combine multiple images for complex compositions.

klingai/image-v3
kling image v3 (kolors v3.0) - latest image generation model with 1k/2k resolution support. highest quality text-to-image.

klingai/video-v3
kling v3.0 - latest and most capable video generation model. native 4k output, multi-shot generation, flexible 3-15s duration billed per second, element control, motion control, and synchronized audio.

klingai/video-v2-6
kling v2.6 video generation with native sound and voice control. supports text-to-video and image-to-video with start/end frames, synchronized audio generation, and voice-driven character animation.

klingai/video-o1
kling video o1 (omni) - unified video generation with text, image references, start/end frames, element references, and video references for editing and style transfer. the most capable kling model.

klingai/video-v2-5
kling v2.5 turbo - fast video generation from text and images. supports start/end frame interpolation in pro mode. optimized for speed while maintaining high quality at up to 1080p.

klingai/lip-sync
kling lip sync - drive mouth movements in videos using text or audio. ideal for dubbing, adding speech to silent videos, or replacing dialogue.

klingai/avatar
kling avatar - generate digital human broadcast-style talking head videos from a single face photo. provide text or audio for the avatar to speak.

klingai/video-to-audio
kling video-to-audio - add generated sound effects, ambient audio, or music to any video. works with kling-generated and user-uploaded videos (3-20s).

klingai/virtual-tryon
kling virtual try-on - ai clothing try-on from a person photo and clothing image. supports single items and upper+lower combos (v1.5). $0.07 per generation.

inworld/voice-cloning
clone a voice from 5-15 seconds of audio using inworld instant voice cloning. use the cloned voice id with any inworld tts model.

inworld/voice-design
design a custom voice from a text description using inworld ai. describe the voice you want and get up to 3 previews. publish the one you like to use with any inworld tts model.

inworld/text-to-speech-2
inworld tts-2 - high-quality multilingual text-to-speech with 100+ languages and natural-language steering

inworld/text-to-speech-1-5-max
inworld tts 1.5 max - low-latency text-to-speech with 15 languages (<200ms p50)

inworld/speech-to-text
inworld speech to text - multi-provider speech transcription with word timestamps

inworld/text-to-speech-1-5-mini
inworld tts 1.5 mini - ultra-low-latency text-to-speech with 15 languages (~120ms p50)

openrouter/claude-sonnet-46
sonnet 4.

openrouter/hy3-preview
hy3 preview is a high-efficiency mixture-of-experts model from tencent designed for agentic workflows and production use.

openrouter/gemini-3-flash-preview
gemini 3 flash preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance.

openrouter/kimi-k26
kimi k2.

openrouter/claude-opus-47
opus 4.

xai/grok-imagine-image-quality
generate and edit high-quality images using xai's grok imagine quality model. supports 1k and 2k output resolutions with text-to-image and image editing.

infsh/hyperframes-render
render heygen hyperframes compositions to video — supports clips, gsap timelines, track layering

bria/rmbg
remove the background from an image, producing a transparent cutout. this is the general-purpose background remover; for product-specific cutouts, use product-cutout instead. output can be passed to replace-background, blur-background, or any editing app.

bria/increase-resolution
upscale images 2x or 4x (max 8192x8192) while preserving original content

bria/expand
expand image canvas with ai-generated content matching the original scene

bria/erase
remove objects from images using mask-based inpainting while preserving quality

bria/generate
generate images from text prompts using bria fibo

bria/generate-lite
fast image generation from text prompts using bria fibo lite

bria/structured-prompt
generate structured prompt json from text or images using bria

bria/ads-generate
generate multiple ads in various sizes from templates and brand assets

bria/product-cutout
cut out product from image with transparent background

bria/gen-fill
generative fill — replace masked regions with ai-generated content guided by a text prompt

bria/product-packshot
generate professional 2000x2000 product packshot images

bria/replace-background
replace image background with ai-generated content from a text prompt or reference image

bria/video-rmbg
remove background from videos with optional color replacement

bria/product-shadow
add realistic shadows to product cutout images

bria/video-eraser
erase objects from video using a mask with inpainting

bria/video-replace-background
replace video background with an image or another video

bria/edit
edit an image using natural language text instructions

bria/video-increase-resolution
upscale video resolution up to 8k using ai super-resolution

bria/video-green-screen
apply green or blue screen effect to video foreground

bytedance/seedance-2-0
professional multimodal video generation from text, images, video, and audio references using bytedance's seedance 2.0 model via byteplus ark api. supports up to 1080p, text-to-video, image-to-video, and multimodal reference-to-video with synchronized audio.

bytedance/seedance-2-0-fast
fast multimodal video generation from text, images, video, and audio references using bytedance's seedance 2.0 fast model via byteplus ark api. supports text-to-video, image-to-video, and multimodal reference-to-video with synchronized audio.

alibaba/happyhorse-1-0-video-edit
happyhorse 1.0 video edit supports advanced video editing through natural language instructions with up to 5 reference images, preserving original motion dynamics via dashscope api

alibaba/happyhorse-1-0-t2v
happyhorse 1.0 text-to-video generates physically realistic videos with smooth motion from text prompts via dashscope api, supporting 720p/1080p resolution and up to 15 seconds duration

alibaba/happyhorse-1-0-i2v
happyhorse 1.0 image-to-video generates physically realistic videos with smooth motion from a single image and optional text description via dashscope api, supporting 720p/1080p resolution

alibaba/happyhorse-1-0-r2v
happyhorse 1.0 reference-to-video generates videos preserving subject characters from up to 9 reference images, with enhanced stability in subject and scene referencing via dashscope api

openai/gpt-image-2
generate and edit images using openai's gpt image 2 model. supports text-to-image, image editing with reference images, and mask-based inpainting.

falai/patina-extract-material
extracts a seamlessly tiling texture plus pbr material maps from a region of a source image described by a text prompt, via fal.ai patina.

falai/patina-text-to-material
generates seamlessly tiling pbr materials up to 8k from a text prompt (optional image-to-image and inpainting) via fal.ai patina.

falai/patina-image-to-material
predicts seamless high-resolution pbr material maps (basecolor, normal, roughness, metalness, height) from a single input image via fal.ai patina.

infsh/omnivoice
zero-shot text-to-speech with voice cloning for 600+ languages.

alibaba/wan-2-7-videoedit
wan 2.7 video edit performs instruction-based video editing and style transfer using multimodal inputs (text, images, video) via dashscope api with 720p/1080p output

alibaba/wan-2-7-r2v
wan 2.7 reference-to-video generates videos featuring characters from reference images and videos, supporting multi-character interaction, voice timbre cloning, and first-frame control

alibaba/wan-2-7-i2v
wan 2.7 image-to-video generates videos from images using multi-modal input (text, images, audio, video). supports first frame generation, first+last frame, and video continuation with 720p/1080p resolution

alibaba/wan-2-7-t2v
wan 2.7 text-to-video generates high-quality videos from text prompts using alibaba's latest video generation model via dashscope api, supporting 720p/1080p resolution and up to 15 seconds duration

infsh/image-resize
resize images by width, height, scale factor, or megapixel target

pruna/p-image-upscale
ai-powered image upscaling up to 128 megapixels with detail and realism enhancement

alibaba/wan-2-7-image-pro
wan 2.7 image pro is alibaba's professional image generation model supporting text-to-image, image editing, and multi-reference generation with up to 4k high-definition output

alibaba/wan-2-7-image
wan 2.7 image is alibaba's fast image generation model supporting text-to-image, image editing, and multi-reference image generation with up to 2k resolution

google/veo-3-1-lite
veo 3.1 lite via gemini api - lightweight video generation with text and image input, audio support

phota/train
train a phota identity profile from 30-50 face images, poll status, list and delete profiles

x/post-thread
create threaded posts on x.com. provide 2-25 tweets that are posted sequentially as a reply chain. each tweet supports text (280 char limit) and optional media (up to 4 images or 1 video/gif). images over 5mb are auto-resized.

phota/edit
edit images with text prompts while preserving identity of known subjects

phota/generate
generate images from text prompts with identity-preserved subjects via [[profile_id]] syntax

phota/enhance
automatically enhance photo quality — lighting, composition, color, and sharpness

xai/grok-extend-video
extend existing videos using xai's grok imagine video model. takes an existing video and generates additional frames to continue it with prompt guidance.

xai/grok-reference-video
generate videos using reference images for style and content guidance with xai's grok imagine video model. provide reference images to influence the visual style of generated videos.

openrouter/kimi-k25
kimi k2.5 is the latest model from moonshot ai, building on k2 with enhanced reasoning and capabilities.

openrouter/minimax-m-27
minimax-m2.7 is the latest large language model from minimax, building on m2.5 with enhanced reasoning, image understanding, and file processing capabilities.

xai/grok-tts
convert text into natural speech using xai's text to speech api. supports multiple voices, expressive speech tags, and mp3/wav/pcm output formats.

elevenlabs/forced-alignment
elevenlabs forced alignment - align text to audio with word timestamps

elevenlabs/text-to-dialogue
elevenlabs text to dialogue - generate immersive multi-voice dialogue

elevenlabs/dubbing
elevenlabs dubbing - automatically dub audio/video to other languages

elevenlabs/music
elevenlabs music - generate studio-quality music from text prompts
not enough? create new apps fast. templates + coding agents make it insanely extensible.
create your own apps
start from templates. add code, packages, docs. deploy in minutes.
schemas become tool parameters automatically. your app shows up in the grid and can be used by agents and workflows.
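
As a rough illustration of what "schemas become tool parameters" could mean, the sketch below converts a typed input schema into a JSON-Schema-style parameter description of the kind agents commonly consume. It is not the platform's actual SDK: `ResizeInput`, `schema_to_tool_parameters`, and the output layout are all assumptions made for this example.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints


# Hypothetical input schema for an image-resize style app.
# The field names are illustrative, not taken from any real app.
@dataclass
class ResizeInput:
    image_url: str
    width: int
    scale: float = 1.0


# Mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_to_tool_parameters(schema_cls) -> dict:
    """Derive a JSON-Schema-style parameter object from a dataclass."""
    hints = get_type_hints(schema_cls)
    properties = {}
    required = []
    for f in fields(schema_cls):
        prop = {"type": _JSON_TYPES[hints[f.name]]}
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        if f.default is not MISSING:
            prop["default"] = f.default
        if not has_default:
            # Fields without a default must be supplied by the caller.
            required.append(f.name)
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "required": required}


params = schema_to_tool_parameters(ResizeInput)
print(params)
```

The point of the design is that the author writes the schema once, and the tool description is generated from it rather than maintained by hand.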
create workflows
build a graph of apps. deploy as a single callable app.

drag and drop to build the graph. map io to connect steps. deploy as an app.
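
The idea of a graph of apps with mapped io, exposed as one callable, can be sketched in a few lines. This is a local toy model under stated assumptions, not the platform's workflow engine: the `Workflow` class, the wiring format, and the two stand-in apps are hypothetical, and real apps would be remote calls rather than local functions.

```python
from graphlib import TopologicalSorter


# Stand-in "apps": plain functions that take and return dicts.
def generate(inputs):
    return {"image": f"image({inputs['prompt']})"}


def upscale(inputs):
    return {"image": f"upscaled({inputs['image']})"}


class Workflow:
    """A graph of steps; calling the instance runs every step in dependency order."""

    def __init__(self):
        self.steps = {}
        # step name -> {input name: (source step or "input", output name)}
        self.wiring = {}

    def add_step(self, name, app, wiring):
        self.steps[name] = app
        self.wiring[name] = wiring

    def __call__(self, **workflow_inputs):
        # Each step depends on the steps it reads outputs from.
        graph = {
            name: {src for src, _ in wires.values() if src != "input"}
            for name, wires in self.wiring.items()
        }
        results = {"input": workflow_inputs}
        last = None
        for name in TopologicalSorter(graph).static_order():
            step_inputs = {
                key: results[src][out]
                for key, (src, out) in self.wiring[name].items()
            }
            results[name] = self.steps[name](step_inputs)
            last = name
        # Return the output of the final step in execution order.
        return results[last]


wf = Workflow()
wf.add_step("generate", generate, {"prompt": ("input", "prompt")})
wf.add_step("upscale", upscale, {"image": ("generate", "image")})
result = wf(prompt="a cat")
print(result)
```

Because the workflow object is itself callable with named inputs, it can be treated like any other single app, which mirrors the "deploy as an app" step described above.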