inference.sh

app library

the grid

250+ tools that run serverless on CPU or GPU. call them directly via API or let agents orchestrate them.

250+ apps
1 API for everything

all apps

gpt-image-2

openai/gpt-image-2

generate and edit images using openai's gpt image 2 model. supports text-to-image, image editing with reference images, and mask-based inpainting.

patina-extract-material

falai/patina-extract-material

extracts a seamlessly tiling texture plus pbr material maps from a region of a source image described by a text prompt, via fal.ai patina.

patina-text-to-material

falai/patina-text-to-material

generates seamlessly tiling pbr materials up to 8k from a text prompt (optional image-to-image and inpainting) via fal.ai patina.

patina-image-to-material

falai/patina-image-to-material

predicts seamless high-resolution pbr material maps (basecolor, normal, roughness, metalness, height) from a single input image via fal.ai patina.

seedance-2-t2v

falai/seedance-2-t2v

generate videos with synchronized audio from text prompts using bytedance's seedance 2.0. supports quality and fast modes.

seedance-2-r2v

falai/seedance-2-r2v

generate videos from reference images, videos, and audio using bytedance's seedance 2.0. refer to each input in the prompt by a tag such as @image1, @video1, or @audio1. supports quality and fast modes.
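before submitting a job, it helps to check that every tag in the prompt points at an input that was actually attached. a minimal python sketch, assuming tags are 1-based and follow the @kind + number pattern shown above; `missing_references` and its `counts` argument are hypothetical helpers, not part of the app's input schema.

```python
import re

# Matches tags like @image1, @video2, @audio1 (kind + 1-based index).
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)\b")


def missing_references(prompt: str, counts: dict[str, int]) -> list[str]:
    """Return tags in `prompt` that have no matching attached input.

    `counts` maps a kind ("image", "video", "audio") to how many inputs of
    that kind were supplied. 1-based numbering is an assumption.
    """
    missing = []
    for kind, index in REF_PATTERN.findall(prompt):
        n = int(index)
        if n < 1 or n > counts.get(kind, 0):
            missing.append(f"@{kind}{index}")
    return missing


prompt = "@image1 walks through the scene from @video1 while @audio2 plays"
print(missing_references(prompt, {"image": 1, "video": 1, "audio": 1}))  # ['@audio2']
```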

seedance-2-i2v

falai/seedance-2-i2v

generate videos with synchronized audio from images using bytedance's seedance 2.0. supports start/end frame control and quality/fast modes.

wan-2-7-videoedit

alibaba/wan-2-7-videoedit

wan 2.7 video edit performs instruction-based video editing and style transfer using multimodal inputs (text, images, video) via dashscope api with 720p/1080p output

wan-2-7-r2v

alibaba/wan-2-7-r2v

wan 2.7 reference-to-video generates videos featuring characters from reference images and videos, supporting multi-character interaction, voice timbre cloning, and first-frame control

wan-2-7-i2v

alibaba/wan-2-7-i2v

wan 2.7 image-to-video generates videos from images using multi-modal input (text, images, audio, video). supports first-frame, first-and-last-frame, and video-continuation generation at 720p/1080p resolution

wan-2-7-t2v

alibaba/wan-2-7-t2v

wan 2.7 text-to-video generates high-quality videos from text prompts using alibaba's latest video generation model via dashscope api, supporting 720p/1080p resolution and up to 15 seconds duration

image-resize

infsh/image-resize

resize images by width, height, scale factor, or megapixel target
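the four resize modes come down to a few lines of arithmetic. a python sketch, assuming the aspect ratio is preserved and one megapixel means 1,000,000 pixels; `target_size` is a hypothetical helper, not the app's code, and if several modes are passed it uses the first one it checks. for a megapixel target, both sides are scaled by the square root of target pixels over current pixels, so the output area lands near the target.

```python
import math


def target_size(w: int, h: int, *, width=None, height=None, scale=None, megapixels=None):
    """Return the output (width, height) for one resize mode, keeping aspect ratio."""
    if width is not None:
        return width, max(1, round(h * width / w))
    if height is not None:
        return max(1, round(w * height / h)), height
    if scale is not None:
        return max(1, round(w * scale)), max(1, round(h * scale))
    if megapixels is not None:
        # Scale both sides by sqrt(target_pixels / current_pixels) so area ~= target.
        factor = math.sqrt(megapixels * 1_000_000 / (w * h))
        return max(1, round(w * factor)), max(1, round(h * factor))
    raise ValueError("specify width, height, scale, or megapixels")


print(target_size(4000, 3000, megapixels=3))  # (2000, 1500)
```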

p-image-upscale

pruna/p-image-upscale

ai-powered image upscaling with detail and realism enhancement

wan-2-7-image-pro

alibaba/wan-2-7-image-pro

wan 2.7 image pro is alibaba's professional image generation model supporting text-to-image, image editing, and multi-reference generation with up to 4k high-definition output

wan-2-7-image

alibaba/wan-2-7-image

wan 2.7 image is alibaba's fast image generation model supporting text-to-image, image editing, and multi-reference image generation with up to 2k resolution

veo-3-1-lite

google/veo-3-1-lite

veo 3.1 lite via gemini api - lightweight video generation with text and image input and audio support

train

phota/train

train a phota identity profile from 30-50 face images, poll status, list and delete profiles
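training runs asynchronously, so a client uploads the images and then polls until the profile is ready. a python sketch of that loop, assuming a `get_status` callable that returns strings such as "completed" or "failed"; those status values and both helper names are hypothetical, and the real status call and its fields may differ. `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def valid_image_count(paths: list[str]) -> bool:
    """The app asks for 30-50 face images per identity profile."""
    return 30 <= len(paths) <= 50


def wait_for_training(
    get_status: Callable[[], str],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until it reports a terminal state or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"training still {status!r} after {timeout}s")
        sleep(interval)


statuses = iter(["queued", "training", "completed"])
print(wait_for_training(lambda: next(statuses), sleep=lambda _: None))  # completed
```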

post-thread

x/post-thread

create threaded posts on x.com. provide 2-25 tweets that are posted sequentially as a reply chain. each tweet supports text (280 char limit) and optional media (up to 4 images or 1 video/gif). images over 5mb are auto-resized.
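the thread limits above can be checked client-side before posting. a python sketch with a hypothetical `Tweet` shape (not the app's input schema); note that `len()` counts code points, while x's own counter weights urls and some scripts differently, so this is only an approximation of the 280-character rule. the auto-resize of images over 5mb is left to the app.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tweet:
    text: str
    images: list[str] = field(default_factory=list)
    video: Optional[str] = None  # a single video or gif


def thread_errors(tweets: list[Tweet]) -> list[str]:
    """Return human-readable problems; an empty list means the thread looks valid."""
    errors = []
    if not 2 <= len(tweets) <= 25:
        errors.append(f"thread needs 2-25 tweets, got {len(tweets)}")
    for i, t in enumerate(tweets, start=1):
        # Approximation: x weights some characters, len() does not.
        if len(t.text) > 280:
            errors.append(f"tweet {i}: text is {len(t.text)} chars (limit 280)")
        if len(t.images) > 4:
            errors.append(f"tweet {i}: at most 4 images")
        if t.video is not None and t.images:
            errors.append(f"tweet {i}: use up to 4 images or 1 video/gif, not both")
    return errors
```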

edit

phota/edit

edit images with text prompts while preserving identity of known subjects

generate

phota/generate

generate images from text prompts with identity-preserved subjects via [[profile_id]] syntax
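since subjects are referenced inline with the [[profile_id]] syntax, a caller can extract the ids from a prompt to confirm the matching profiles have been trained first. a python sketch; the helper name is hypothetical and the set of characters allowed inside the brackets is an assumption.

```python
import re

# Identity placeholders written as [[profile_id]] inside a prompt.
PROFILE_REF = re.compile(r"\[\[([^\[\]]+)\]\]")


def profile_ids(prompt: str) -> list[str]:
    """Return the distinct profile ids referenced in `prompt`, in first-seen order."""
    seen: list[str] = []
    for pid in PROFILE_REF.findall(prompt):
        pid = pid.strip()
        if pid not in seen:
            seen.append(pid)
    return seen


print(profile_ids("[[alice]] and [[bob]] hiking, [[alice]] waving"))  # ['alice', 'bob']
```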

enhance

phota/enhance

automatically enhance photo quality — lighting, composition, color, and sharpness

grok-extend-video

xai/grok-extend-video

extend existing videos using xai's grok imagine video model. takes an existing video and generates additional frames to continue it with prompt guidance.

grok-reference-video

xai/grok-reference-video

generate videos using reference images for style and content guidance with xai's grok imagine video model. provide reference images to influence the visual style of generated videos.

grok-tts

xai/grok-tts

convert text into natural speech using xai's text to speech api. supports multiple voices, expressive speech tags, and mp3/wav/pcm output formats.
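unlike mp3 and wav, raw pcm output has no header, so a player needs the sample rate, channel count, and sample width supplied separately. a python sketch that wraps pcm bytes in a wav container with the standard-library `wave` module; the defaults (24 khz, mono, 16-bit) are assumptions, so use whatever format the tts response actually declares.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    The default format values are assumptions, not documented app output.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


one_second_of_silence = b"\x00\x00" * 24_000
wav_bytes = pcm_to_wav(one_second_of_silence)
```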

forced-alignment

elevenlabs/forced-alignment

elevenlabs forced alignment - align text to audio with word timestamps
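word-level timestamps are often turned into a caption file, for example one that infsh/caption-videos can consume. a python sketch that groups aligned words into srt cues; it assumes each word looks like {"text": ..., "start": ..., "end": ...} with times in seconds, which is a hypothetical shape, and the real alignment output may use different field names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group aligned words into numbered SRT cues of at most `max_words` words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)  # blank line between cues


words = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
    {"text": "again", "start": 1.0, "end": 1.3},
]
print(words_to_srt(words, max_words=2))
```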

text-to-dialogue

elevenlabs/text-to-dialogue

elevenlabs text to dialogue - generate immersive multi-voice dialogue

dubbing

elevenlabs/dubbing

elevenlabs dubbing - automatically dub audio/video to other languages

music

elevenlabs/music

elevenlabs music - generate studio-quality music from text prompts

sound-effects

elevenlabs/sound-effects

elevenlabs sound effects - generate custom sound effects from text

voice-isolator

elevenlabs/voice-isolator

elevenlabs voice isolator - remove background noise from audio

voice-changer

elevenlabs/voice-changer

elevenlabs voice changer - transform voice in audio to a different voice

stt

elevenlabs/stt

elevenlabs speech to text (scribe) - high-accuracy transcription with diarization

tts

elevenlabs/tts

elevenlabs text to speech - high-quality multilingual voice synthesis

wan-i2v

pruna/wan-i2v

transform static images into animated videos with text prompts

wan-t2v

pruna/wan-t2v

generate videos directly from text descriptions in 480p or 720p

qwen-image-edit-plus

pruna/qwen-image-edit-plus

edit images using text instructions with multi-image support and pose transfer

flux-klein-4b

pruna/flux-klein-4b

lightweight 4b-parameter model with excellent speed-to-quality ratio

z-image-turbo-lora

pruna/z-image-turbo-lora

fast generation with lora support for unique styles and personalized outputs

z-image-turbo

pruna/z-image-turbo

ultra-fast turbo image generation with minimal latency

qwen-image-fast

pruna/qwen-image-fast

fast qwen-based image generation with creativity control

qwen-image

pruna/qwen-image

advanced text-to-image generation with optional lora weights and prompt enhancement

wan-image-small

pruna/wan-image-small

fast, efficient text-to-image optimized for rapid prototyping and batch generation

flux-dev-lora

pruna/flux-dev-lora

text-to-image and image-to-image generation with custom lora weights from huggingface

flux-dev

pruna/flux-dev

advanced text-to-image generation with multiple aspect ratios, speed optimizations, and high-quality outputs

p-video

pruna/p-video

fast text-to-video and image-to-video in 720p/1080p with audio support

p-image-edit-lora

pruna/p-image-edit-lora

fast image editing with custom lora styles for unique transformations

p-image-lora

pruna/p-image-lora

pruna's flagship fast text-to-image with custom lora style support

p-image

pruna/p-image

pruna's flagship fast text-to-image with multiple aspect ratios and prompt enhancement

p-image-edit

pruna/p-image-edit

fast image editing with text instructions and multi-image support

shell

infsh/shell

execute shell commands in a sandboxed environment. run grep, sed, ls, find, and other cli tools with configurable working directory and timeout.
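the description doesn't say how the app's sandbox is built, so this local python sketch only illustrates the two knobs it exposes, working directory and timeout, using the standard-library `subprocess.run`. it is not a sandbox; `run_command` is a hypothetical helper. the command runs without a shell, so arguments are passed as a list rather than one command string.

```python
import subprocess
import sys


def run_command(args: list[str], cwd: str = ".", timeout: float = 30.0) -> tuple[int, str, str]:
    """Run `args` in `cwd`; kill it after `timeout` seconds. Returns (code, stdout, stderr)."""
    try:
        result = subprocess.run(args, cwd=cwd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising; report it as code -1.
        return -1, "", f"timed out after {timeout}s"
    return result.returncode, result.stdout, result.stderr


# Portable demo: run the current Python interpreter instead of grep/ls.
print(run_command([sys.executable, "-c", "print('ok')"]))
```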

seedream-5-lite

bytedance/seedream-5-lite

generate high-quality 2k-3k images from text prompts with single or multi-image input. supports text-to-image, image-to-image, and multi-reference image blending using bytedance's seedream 5 lite model via byteplus ark api.

qwen-image-2-pro

alibaba/qwen-image-2-pro

qwen-image-2.0 pro offers enhanced text rendering, fine-grained realism, photorealistic scenes, and stronger semantic adherence for professional image generation and editing

qwen-image-2

alibaba/qwen-image-2

qwen-image-2.0 is alibaba's multimodal image generation model that integrates image generation and editing with enhanced text-rendering, realistic textures, and photorealistic scenes

gemini-3-1-flash-image-preview

google/gemini-3-1-flash-image-preview

gemini 3.1 flash image preview (nanobanana 2) via vertex ai - advanced image generation model powered by google cloud

gpt-oss-safeguard-20b

openrouter/gpt-oss-safeguard-20b

gpt-oss safeguard 20b

remotion-render

infsh/remotion-render

render videos from react/remotion component code — pass tsx, get mp4

minimax-m-25

openrouter/minimax-m-25

minimax-m2.5 is a sota large language model designed for real-world productivity. trained across a diverse range of complex, real-world digital work environments, m2.5 builds on the coding expertise of m2.1 and extends into general office work: it is fluent in generating and operating word, excel, and powerpoint files, switches context between diverse software environments, and works across different agent and human teams.

kokoro-tts

falai/kokoro-tts

kokoro tts - lightweight text-to-speech with multiple languages and voices

grok-imagine-image-pro

xai/grok-imagine-image-pro

generate and edit images using xai's grok imagine pro model. supports text-to-image and image editing with multiple aspect ratios.

claude-opus-46

openrouter/claude-opus-46

opus 4.6 is anthropic’s strongest model for coding and long-running professional tasks. it is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time.

dia-tts

falai/dia-tts

dia tts - generate realistic dialogue with emotion control, natural nonverbals, and voice cloning

agent-browser

infsh/agent-browser

browser automation for ai agents. navigate, interact with @e refs, take screenshots, record video with cursor indicator, execute javascript. supports proxy configuration.

dm-send

x/dm-send

send a direct message on x.com. requires the recipient's user id (not username). text-only messages; media attachments are not supported.

user-follow

x/user-follow

follow a user on x.com by user id. succeeds silently if already following.

user-get

x/user-get

get a user profile from x.com by id or username. returns bio, follower/following counts, tweet count, verified status, and profile image url.

post-retweet

x/post-retweet

retweet a post on x.com by post id. succeeds silently if already retweeted.

post-like

x/post-like

like a post on x.com by post id. succeeds silently if already liked.

post-delete

x/post-delete

delete a post from x.com by post id. can only delete posts authored by the authenticated account.

post-get

x/post-get

get a post by id from x.com. returns text, author id, creation date, and engagement metrics (likes, retweets, replies, quotes).

post-create

x/post-create

create posts on x.com with text (280 char limit) and optional media. supports up to 4 images or 1 video/gif. can reply to or quote other posts. images over 5mb are auto-resized.

grok-imagine-video

xai/grok-imagine-video

generate and edit videos using xai's grok imagine video model. supports text-to-video, image-to-video, and video editing with configurable duration and resolution.

grok-imagine-image

xai/grok-imagine-image

generate and edit images using xai's grok imagine model. supports text-to-image and image editing with multiple aspect ratios.

veo-3-1

google/veo-3-1

veo 3.1 via vertex ai - advanced video generation with frame interpolation, reference images, and audio generation

veo-3-fast

google/veo-3-fast

veo 3 fast via vertex ai - fast video generation with audio from text prompts and images

veo-3

google/veo-3

veo 3 via vertex ai - generate videos with audio from text prompts and images

veo-2

google/veo-2

veo 2 via vertex ai - generate high-quality realistic videos from text prompts

veo-3-1-fast

google/veo-3-1-fast

veo 3.1 fast via vertex ai - generate videos from text prompts or images with optional audio

flux-dev-lora

falai/flux-dev-lora

text-to-image and image-to-image generation with flux.1 [dev] lora support. custom style adaptation and fine-tuned model variations from black forest labs.

flux-2-klein-lora

falai/flux-2-klein-lora

text-to-image and image-to-image generation with flux.2 [klein] lora support. available in 4b and 9b parameter sizes. custom style adaptation and fine-tuned model variations from black forest labs.

omnihuman-1-5

bytedance/omnihuman-1-5

multi-character audio-driven avatar video generation. takes a portrait image + audio and generates a video where the person speaks/sings in sync. supports specifying which character to drive.

omnihuman-1-0

bytedance/omnihuman-1-0

audio-driven avatar video generation. takes a portrait image + audio and generates a video where the person speaks/sings in sync with the audio.

seedream-3-0-t2i

bytedance/seedream-3-0-t2i

generate cinematic-quality images from text prompts with accurate text rendering using bytedance's seedream 3.0 t2i model via byteplus ark api.

seedream-4-0

bytedance/seedream-4-0

generate high-quality 2k-4k images from text prompts with optional image-to-image generation using bytedance's seedream 4.0 model via byteplus ark api.

seedream-4-5

bytedance/seedream-4-5

generate high-quality 2k-4k images from text prompts with optional image-to-image generation using bytedance's seedream 4.5 model via byteplus ark api.

seedance-1-0-pro

bytedance/seedance-1-0-pro

generate high-quality videos up to 1080p from text prompts with optional first-frame image control using bytedance's seedance 1.0 pro model.

seedance-1-0-pro-fast

bytedance/seedance-1-0-pro-fast

fast high-quality video generation up to 1080p from text prompts with optional first-frame image control using bytedance's seedance 1.0 pro fast model.

seedance-1-5-pro

bytedance/seedance-1-5-pro

generate high-quality videos from text prompts with optional first-frame image control using bytedance's seedance 1.5 pro model via byteplus ark api.

imagine-art-1-5-pro-preview

falai/imagine-art-1-5-pro-preview

advanced text-to-image model creating ultra-high-fidelity 4k visuals with lifelike realism and refined aesthetics.

gemini-2-5-flash-image

google/gemini-2-5-flash-image

gemini 2.5 flash image (nanobanana) via vertex ai - advanced image generation model powered by google cloud

gemini-3-pro-image-preview

google/gemini-3-pro-image-preview

gemini 3 pro image preview (nanobanana pro) via vertex ai - advanced image generation model powered by google cloud

search-assistant

tavily/search-assistant

a search assistant that browses the internet to deliver comprehensive results, including ai-generated answers, images, and detailed sources.

kimi-k2-thinking

openrouter/kimi-k2-thinking

a powerful open-source thinking agent that excels at complex, multi-step problem-solving and consistently uses tools effectively over extended operations.

caption-videos

infsh/caption-videos

add captions to videos using an existing caption file, such as one generated by a speech-to-text service.

extract

tavily/extract

extracts clean, readable content, including text and images, from specified webpages, supporting batch processing for multiple urls.

array-switch

infsh/array-switch

allows you to choose between two different inputs based on a condition applied to an array of data.
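in a workflow, a switch like this typically routes to a fallback when an earlier step returns nothing useful. a python sketch of the idea; the condition names below are hypothetical and the real app's options may differ, so read this as an illustration of the routing pattern, not the app's behavior.

```python
from typing import Any, Callable, Sequence

# Hypothetical named conditions evaluated against the whole array.
CONDITIONS: dict[str, Callable[[Sequence[Any]], bool]] = {
    "is_empty": lambda xs: len(xs) == 0,
    "all_truthy": lambda xs: all(xs),
    "any_truthy": lambda xs: any(xs),
}


def array_switch(items: Sequence[Any], condition: str, if_true: Any, if_false: Any) -> Any:
    """Return `if_true` when the named condition holds for `items`, else `if_false`."""
    return if_true if CONDITIONS[condition](items) else if_false


# e.g. fall back to a default image when a search step found nothing
print(array_switch([], "is_empty", "default.png", "first_result.png"))  # default.png
```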

extract-last-frame

infsh/extract-last-frame

save a specific frame from the end of a video as a static image file.

dia-tts

infsh/dia-tts

dia tts - generate realistic dialogue with emotion control, natural nonverbals, and voice cloning

media-merger

infsh/media-merger

merges multiple videos and images together using customized transitions.

array-element-switch

infsh/array-element-switch

selects one of two possible inputs based on a comparison check within an array.

mask-image

infsh/mask-image

combines two images—a main image and a semi-transparent mask—to selectively hide or reveal parts of the main image, creating a partially transparent result.
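one common way to get that partially transparent result is to scale each pixel's alpha by the corresponding mask value. a pure-python sketch on rgba tuples, assuming an 8-bit grayscale mask where 255 keeps a pixel and 0 hides it; that convention and `apply_mask` are assumptions, not the app's documented behavior.

```python
Pixel = tuple[int, int, int, int]  # RGBA, each channel 0-255


def apply_mask(image: list[Pixel], mask: list[int]) -> list[Pixel]:
    """Scale each pixel's alpha by its mask value (0 hides, 255 keeps as-is)."""
    if len(image) != len(mask):
        raise ValueError("image and mask must have the same number of pixels")
    out = []
    for (r, g, b, a), m in zip(image, mask):
        out.append((r, g, b, round(a * m / 255)))
    return out


pixels = [(255, 0, 0, 255), (0, 255, 0, 255), (0, 0, 255, 128)]
print(apply_mask(pixels, [255, 0, 128]))  # [(255, 0, 0, 255), (0, 255, 0, 0), (0, 0, 255, 64)]
```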

not enough? create new apps fast. templates + coding agents make it insanely extensible.

create your own apps

start from templates. add code, packages, docs. deploy in minutes.

$ infsh app init
my-app/
  inference.py
  requirements.txt
$ infsh app deploy

schemas become tool parameters automatically. your app shows up in the grid and can be used by agents and workflows.
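to illustrate what "schemas become tool parameters" can look like, here is a python sketch that turns a typed input class into a json-schema-style parameter object of the kind agent tool-calling formats expect. it uses a standard-library dataclass; the platform's actual schema mechanism is not described here, so `AppInput` and `tool_parameters` are hypothetical.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema types.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_parameters(cls) -> dict:
    """Build a JSON-Schema-style parameter object from a dataclass's fields."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in fields(cls):
        prop = {"type": JSON_TYPES[hints[f.name]]}
        if "description" in f.metadata:
            prop["description"] = f.metadata["description"]
        if f.default is not MISSING:
            prop["default"] = f.default
        elif f.default_factory is MISSING:
            required.append(f.name)  # no default at all -> caller must supply it
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "required": required}


@dataclass
class AppInput:  # hypothetical input schema for an app's inference.py
    prompt: str = field(metadata={"description": "text prompt"})
    width: int = 1024
    upscale: bool = False


print(tool_parameters(AppInput))
```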

create workflows

build a graph of apps. deploy as a single callable app.

workflow builder

drag and drop to build the graph. map each step's outputs to the next step's inputs to connect them. deploy as an app.
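conceptually, a workflow like this is a directed acyclic graph: each step's inputs are mapped either from the workflow's own input or from an earlier step's output, and steps run in dependency order. a python sketch using the standard-library `graphlib.TopologicalSorter`; the "step.key" and "$input.key" mapping strings are a hypothetical convention, and plain functions stand in for apps.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# A step = (app function, mapping of its input names to sources).
# "step_name.output_key" reads another step's output; "$input.key" reads
# the workflow's own input. Both conventions are hypothetical.
Step = tuple[Callable[..., dict], dict[str, str]]


def run_workflow(steps: dict[str, Step], workflow_input: dict[str, Any]) -> dict[str, dict]:
    """Run steps in dependency order, wiring outputs to inputs per each mapping."""
    deps = {
        name: {src.split(".")[0] for src in mapping.values() if not src.startswith("$input.")}
        for name, (_, mapping) in steps.items()
    }
    results: dict[str, dict] = {}
    for name in TopologicalSorter(deps).static_order():
        app, mapping = steps[name]
        kwargs = {}
        for param, src in mapping.items():
            if src.startswith("$input."):
                kwargs[param] = workflow_input[src.removeprefix("$input.")]
            else:
                step, key = src.split(".", 1)
                kwargs[param] = results[step][key]
        results[name] = app(**kwargs)
    return results


def make_prompt(topic):  # stand-in for an llm app
    return {"text": f"a watercolor of {topic}"}


def render(prompt):  # stand-in for an image app
    return {"image": f"<image for '{prompt}'>"}


out = run_workflow(
    {"write": (make_prompt, {"topic": "$input.topic"}), "draw": (render, {"prompt": "write.text"})},
    {"topic": "a lighthouse"},
)
print(out["draw"]["image"])  # <image for 'a watercolor of a lighthouse'>
```

deploying the whole graph "as a single callable app" then amounts to exposing `run_workflow` with a fixed `steps` dict behind one input schema.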

view all apps - explore what's available
create your own app - read the docs & start building
