inference.sh

app library

the grid

250+ tools that run serverless on CPU or GPU. call directly via API or let agents orchestrate them.

250+ apps
1 API for everything

all apps

p-video-avatar

pruna/p-video-avatar

generate talking head videos from a portrait image with text or audio-driven speech

video-upscale

topaz/video-upscale

video upscaling and enhancement — proteus family models for precision upscaling, deinterlacing, face recovery, and cgi enhancement

search

arxiv/search

search arxiv papers by query with field prefixes, boolean operators, category filtering, and sorting

search

biorxiv/search

search biorxiv and medrxiv preprints by date range with optional category filtering

search

chemrxiv/search

search chemrxiv preprints via crossref api

paper

arxiv/paper

get a specific arxiv paper by its id with full metadata including title, authors, abstract, and links

paper

chemrxiv/paper

get a chemrxiv paper by doi via crossref api

paper

biorxiv/paper

get a specific biorxiv or medrxiv paper by its doi

astra

topaz/astra

creative video upscaling — ai-guided upscaling with prompt and creativity controls

starlight

topaz/starlight

generative video upscaling — precision, hq, mini, sharp, and fast models

denoise

topaz/denoise

video denoising — nyx family models for noise, compression, and artifact removal

video-utilities

topaz/video-utilities

video utilities — motion deblur, colorization, and sdr to hdr conversion

frame-interpolation

topaz/frame-interpolation

video frame interpolation — slowmo and fps boost with apollo, chronos, and aion models

proteus

topaz/proteus

proteus video upscaling and enhancement — precision upscaling, deinterlacing, face recovery, and cgi enhancement

contents

you/contents

contents api — fetch clean markdown or html from any url, batch up to 10 pages per request

finance-research

you/finance-research

finance research api — agentic financial research with filings, macro data, and institutional-grade sources

research

you/research

deep research api — multi-step web research with source-backed citations and configurable effort levels

search

you/search

web search api — ground your apps in reliable, web-scale knowledge with contextual snippets

claude-sonnet-5

anthropic/claude-sonnet-5

claude sonnet 5 — frontier sonnet with near-opus performance. 1m context, vision, extended thinking, tool use. direct api.

claude-opus-4-8

anthropic/claude-opus-4-8

claude opus 4.8 — anthropic's most capable opus model. 1m context, 128k output, vision, extended thinking, tool use. direct api.

gemini-3-1-flash-lite-image

google/gemini-3-1-flash-lite-image

gemini 3.1 flash lite image (nanobanana 2 lite) via vertex ai — ultra-low latency image generation

gemini-omni-flash

google/gemini-omni-flash

gemini omni flash — text-to-video with synchronized audio, grounded in real-world knowledge

mai-image-2-5

microsoft/mai-image-2-5

mai image 2.5 — microsoft's photorealistic image generation and editing model with fine-grained pixel-level control.

glm-5-2

openrouter/glm-5-2

glm 5.2 - zhipu's latest flagship language model with 1m context via openrouter

remix

reve/remix

reve remix — create images from text and 1-6 reference images combined.

edit

reve/edit

reve edit — edit images with natural language instructions. top 3 on lmarena leaderboard.

create

reve/create

reve create — generate images from text with best-in-class prompt adherence and text rendering.

p-video-replace

pruna/p-video-replace

replace characters in videos using reference images. preserves motion, timing, camera, and scene.

claude-mythos-5

anthropic/claude-mythos-5

claude mythos 5 — project glasswing. successor to claude mythos preview. 1m context, 128k output, adaptive thinking, vision, tool use. direct api.

claude-fable-5

anthropic/claude-fable-5

claude fable 5 — anthropic's most capable widely released model. 1m context, 128k output, adaptive thinking, vision, tool use. direct api.

voice-remix

elevenlabs/voice-remix

elevenlabs voice remix - modify voice characteristics like accent, gender, style, pacing

voice-clone

elevenlabs/voice-clone

elevenlabs voice clone - instantly clone a voice from audio samples

voice-design

elevenlabs/voice-design

elevenlabs voice design - create custom ai voices from text descriptions

post-search

x/post-search

search recent posts on x.com. use conversation_id to get replies to a tweet, or any x search query. returns up to 100 posts with text, author, and engagement metrics.

deepseek-ocr-2

infsh/deepseek-ocr-2

next-gen document ocr with improved math, tables, and reading order. converts images and pdfs to structured markdown.

create-avatar

heygen/create-avatar

create heygen avatars from video footage (digital twin), a photo (photo avatar), or a text prompt (ai-generated). returns a look id for use with avatar-video.

qwen3-32b

openrouter/qwen3-32b

qwen3 32b - powerful dense language model with reasoning and tool use capabilities via openrouter

qwen3-8b

openrouter/qwen3-8b

qwen3 8b - efficient dense language model with reasoning and tool use capabilities via openrouter

claude-haiku-45

anthropic/claude-haiku-45

claude haiku 4.5 — fastest and most affordable claude. 200k context, 64k output, vision, extended thinking, tool use. direct api.

claude-sonnet-45

anthropic/claude-sonnet-45

claude sonnet 4.5 — previous generation sonnet. 200k context, 64k output, vision, extended thinking, tool use. direct api.

claude-sonnet-46

anthropic/claude-sonnet-46

claude sonnet 4.6 — best balance of speed and intelligence. 1m context, 64k output, vision, extended thinking, tool use. direct api.

claude-opus-46

anthropic/claude-opus-46

claude opus 4.6 — previous generation opus. 1m context, 128k output, vision, extended thinking, tool use. direct api.

claude-opus-47

anthropic/claude-opus-47

claude opus 4.7 — anthropic's most capable model. 1m context, 128k output, vision, extended thinking, tool use. direct api.

text-to-speech

heygen/text-to-speech

generate natural speech audio from text using heygen's starfish tts engine. supports configurable voice, speed, ssml input, and multiple languages.

lipsync

heygen/lipsync

re-sync video lip movements to new audio using heygen's lipsync technology. supports speed and precision modes with optional captioning.

video-translate

heygen/video-translate

translate videos into 30+ languages with voice cloning and lip-sync using heygen. supports speed and precision modes with optional captioning.

video-agent

heygen/video-agent

generate complete videos from natural language prompts using heygen's ai video agent. the agent handles avatar selection, scripting, and production automatically.

photo-video

heygen/photo-video

animate portrait photos into talking videos using heygen. upload a face image and add speech with configurable voice, motion prompts, and expressiveness.

avatar-video

heygen/avatar-video

generate talking avatar videos using heygen's digital and photo avatars with avatar iv or v engines, configurable voice, resolution up to 4k, and expressiveness.

subtitles

veed/subtitles

add professional burned-in subtitles to videos with 25+ style presets. supports 100+ languages with automatic transcription or custom srt files.

html-to-video

infsh/html-to-video

render html/css/js animations to video — supports gsap timelines, css animations, web animations api

image-v2

klingai/image-v2

kling image v2 (kolors v2.0) - text-to-image with 2k resolution, multi-image reference, and restyle. restyle output matches input resolution.

image-o1

klingai/image-o1

kling image o1 (kolors image-o1) - omni image generation with element control. text-to-image and image-to-image at 1k/2k. $0.028/image.

image-3o

klingai/image-3o

kling image 3o (kolors image-3o) - most capable image model with native 4k, series-image generation, and element control. $0.028/image (4k $0.056).

image-v1

klingai/image-v1

kling image v1 (kolors v1.0) - basic text-to-image and image-to-image generation. cheapest option at $0.0035/image.

image-v1-5

klingai/image-v1-5

kling image v1.5 (kolors v1.5) - text-to-image with subject and face reference for character consistency. generate images preserving a person's appearance.

image-v2-1

klingai/image-v2-1

kling image v2.1 (kolors v2.1) - text-to-image and multi-image reference generation. combine multiple images for complex compositions.

image-v3

klingai/image-v3

kling image v3 (kolors v3.0) - latest image generation model with 1k/2k resolution support. highest quality text-to-image.

video-v3

klingai/video-v3

kling v3.0 - latest and most capable video generation model. native 4k output, multi-shot generation, flexible 3-15s duration billed per second, element control, motion control, and synchronized audio.

video-v2-6

klingai/video-v2-6

kling v2.6 video generation with native sound and voice control. supports text-to-video and image-to-video with start/end frames, synchronized audio generation, and voice-driven character animation.

video-o1

klingai/video-o1

kling video o1 (omni) - unified video generation with text, image references, start/end frames, element references, and video references for editing and style transfer. the most capable kling model.

video-v2-5

klingai/video-v2-5

kling v2.5 turbo - fast video generation from text and images. supports start/end frame interpolation in pro mode. optimized for speed while maintaining high quality at up to 1080p.

lip-sync

klingai/lip-sync

kling lip sync - drive mouth movements in videos using text or audio. ideal for dubbing, adding speech to silent videos, or replacing dialogue.

avatar

klingai/avatar

kling avatar - generate digital human broadcast-style talking head videos from a single face photo. provide text or audio for the avatar to speak.

video-to-audio

klingai/video-to-audio

kling video-to-audio - add generated sound effects, ambient audio, or music to any video. works with kling-generated and user-uploaded videos (3-20s).

virtual-tryon

klingai/virtual-tryon

kling virtual try-on - ai clothing try-on from a person photo and clothing image. supports single items and upper+lower combos (v1.5). $0.07 per generation.

voice-cloning

inworld/voice-cloning

clone a voice from 5-15 seconds of audio using inworld instant voice cloning. use the cloned voice id with any inworld tts model.

voice-design

inworld/voice-design

design a custom voice from a text description using inworld ai. describe the voice you want and get up to 3 previews. publish the one you like to use with any inworld tts model.

text-to-speech-2

inworld/text-to-speech-2

inworld tts-2 - high-quality multilingual text-to-speech with 100+ languages and natural-language steering

text-to-speech-1-5-max

inworld/text-to-speech-1-5-max

inworld tts 1.5 max - low-latency text-to-speech with 15 languages (<200ms p50)

speech-to-text

inworld/speech-to-text

inworld speech to text - multi-provider speech transcription with word timestamps

text-to-speech-1-5-mini

inworld/text-to-speech-1-5-mini

inworld tts 1.5 mini - ultra-low-latency text-to-speech with 15 languages (~120ms p50)

claude-sonnet-46

openrouter/claude-sonnet-46

sonnet 4.

hy3-preview

openrouter/hy3-preview

hy3 preview is a high-efficiency mixture-of-experts model from tencent designed for agentic workflows and production use.

gemini-3-flash-preview

openrouter/gemini-3-flash-preview

gemini 3 flash preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance.

kimi-k26

openrouter/kimi-k26

kimi k2.

claude-opus-47

openrouter/claude-opus-47

opus 4.

grok-imagine-image-quality

xai/grok-imagine-image-quality

generate and edit high-quality images using xai's grok imagine quality model. supports 1k and 2k output resolutions with text-to-image and image editing.

hyperframes-render

infsh/hyperframes-render

render heygen hyperframes compositions to video — supports clips, gsap timelines, track layering

rmbg

bria/rmbg

remove the background from an image, producing a transparent cutout. this is the general-purpose background removal app; for product-specific cutouts, use product-cutout instead. output can be passed to replace-background, blur-background, or any editing app.

increase-resolution

bria/increase-resolution

upscale images 2x or 4x (max 8192x8192) while preserving original content

expand

bria/expand

expand image canvas with ai-generated content matching the original scene

erase

bria/erase

remove objects from images using mask-based inpainting while preserving quality

generate

bria/generate

generate images from text prompts using bria fibo

generate-lite

bria/generate-lite

fast image generation from text prompts using bria fibo lite

structured-prompt

bria/structured-prompt

generate structured prompt json from text or images using bria

ads-generate

bria/ads-generate

generate multiple ads in various sizes from templates and brand assets

product-cutout

bria/product-cutout

cut out product from image with transparent background

gen-fill

bria/gen-fill

generative fill — replace masked regions with ai-generated content guided by a text prompt

product-packshot

bria/product-packshot

generate professional 2000x2000 product packshot images

replace-background

bria/replace-background

replace image background with ai-generated content from a text prompt or reference image

video-rmbg

bria/video-rmbg

remove background from videos with optional color replacement

product-shadow

bria/product-shadow

add realistic shadows to product cutout images

video-eraser

bria/video-eraser

erase objects from video using a mask with inpainting

video-replace-background

bria/video-replace-background

replace video background with an image or another video

edit

bria/edit

edit an image using natural language text instructions

video-increase-resolution

bria/video-increase-resolution

upscale video resolution up to 8k using ai super-resolution

video-green-screen

bria/video-green-screen

apply green or blue screen effect to video foreground

seedance-2-0

bytedance/seedance-2-0

professional multimodal video generation from text, images, video, and audio references using bytedance's seedance 2.0 model via byteplus ark api. supports up to 4k (10-bit color), text-to-video, image-to-video, and multimodal reference-to-video with synchronized audio.

seedance-2-0-fast

bytedance/seedance-2-0-fast

fast multimodal video generation from text, images, video, and audio references using bytedance's seedance 2.0 fast model via byteplus ark api. supports text-to-video, image-to-video, and multimodal reference-to-video with synchronized audio.

not enough? create new apps fast. templates + coding agents make it insanely extensible.

create your own apps

start from templates. add code, packages, docs. deploy in minutes.

$ infsh app init
my-app/
  inference.py
  requirements.txt
$ infsh app deploy

schemas become tool parameters automatically. your app shows up in the grid and can be used by agents and workflows.
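
to illustrate the idea of a schema becoming tool parameters, here is a minimal sketch using only the python standard library. it is not the inference.sh sdk: `AppInput`, `JSON_TYPES`, and `tool_parameters` are hypothetical names, and the json-schema-style output format is an assumption about what an agent-facing parameter spec might look like.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints


# Hypothetical input schema for an app (assumption, not the real SDK).
@dataclass
class AppInput:
    prompt: str
    steps: int = 20
    upscale: bool = False


# Map Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_parameters(schema) -> dict:
    """Derive a JSON-Schema-style parameter spec from a dataclass."""
    hints = get_type_hints(schema)
    properties, required = {}, []
    for f in fields(schema):
        prop = {"type": JSON_TYPES[hints[f.name]]}
        if f.default is MISSING and f.default_factory is MISSING:
            # No default at all: the caller must supply this field.
            required.append(f.name)
        elif f.default is not MISSING:
            prop["default"] = f.default
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "required": required}


params = tool_parameters(AppInput)
```

in this sketch, fields without a default end up in `required`, and fields with a default carry it into the spec, so an agent could call the app with only `prompt`.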

create workflows

build a graph of apps. deploy as a single callable app.

workflow builder

drag and drop to build the graph. map io to connect steps. deploy as an app.
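
the graph-and-io-mapping idea can be sketched in plain python. this is an illustration under assumptions, not how the workflow builder is implemented: the step functions, the `WORKFLOW` layout, the `"$input"` marker, and `run_workflow` are all hypothetical, and the steps are local stand-ins rather than calls to deployed apps.

```python
from graphlib import TopologicalSorter


# Hypothetical step functions standing in for deployed apps.
def search(inputs):
    return {"text": f"results for {inputs['query']}"}


def summarize(inputs):
    return {"summary": inputs["text"].upper()}


# step name -> (callable, {input_name: (source_step, output_name)})
# "$input" refers to the workflow's own input (an assumed convention).
WORKFLOW = {
    "search": (search, {"query": ("$input", "query")}),
    "summarize": (summarize, {"text": ("search", "text")}),
}


def run_workflow(workflow, workflow_input):
    """Run steps in dependency order, wiring outputs to inputs via the io map."""
    deps = {
        name: {src for src, _ in mapping.values() if src != "$input"}
        for name, (_, mapping) in workflow.items()
    }
    outputs = {"$input": workflow_input}
    # static_order() yields each step only after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        fn, mapping = workflow[name]
        step_inputs = {k: outputs[src][out] for k, (src, out) in mapping.items()}
        outputs[name] = fn(step_inputs)
    return outputs


result = run_workflow(WORKFLOW, {"query": "cats"})
```

because `run_workflow` takes one input and returns the collected outputs, the whole graph behaves like a single callable, which is the shape the page describes for a deployed workflow.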

view all apps: explore what's available
create your own app: read the docs & start building
