inference.sh

app library

the grid

150+ tools that run serverless on CPU or GPU. call directly via API or let agents orchestrate them.

150+ apps
1 API for everything

all apps

imagine-art-1-5-pro-preview

falai/imagine-art-1-5-pro-preview

advanced text-to-image model creating ultra-high-fidelity 4k visuals with lifelike realism and refined aesthetics.

ltx-video-2

infsh/ltx-video-2

ltx 2.0 audio-video foundation model. generates videos with synced audio. supports t2v, i2v, long video generation with multi-prompt sliding windows, and lora adapters.

youtube-downloader

infsh/youtube-downloader

download youtube videos and audio with customizable quality, format, and codec options. supports audio-only extraction, video+audio, or video-only downloads.

gemini-2-5-flash-image

google/gemini-2-5-flash-image

gemini 2.5 flash image (a.k.a. nano banana) via vertex ai - advanced image generation model powered by google cloud

gemini-3-pro-image-preview

google/gemini-3-pro-image-preview

gemini 3 pro image preview (a.k.a. nano banana pro) via vertex ai - advanced image generation model powered by google cloud

gcp-gemini-3-pro-image-preview

infsh/gcp-gemini-3-pro-image-preview

gemini 3 pro image preview via vertex ai - advanced image generation model powered by google cloud

post-tweet

infsh/post-tweet

post tweets to x.com (twitter)

flux-2-dev-turbo

infsh/flux-2-dev-turbo

flux.2 dev turbo is an open-weight, guidance- and step-distilled text-to-image model developed by black forest labs for non-commercial applications

seedance-1-5-pro-t2v

falai/seedance-1-5-pro-t2v

generate videos with audio from text prompts using bytedance's seedance 1.5 pro. high-quality text-to-video generation with audio synthesis.

seedance-1-5-pro-i2v

falai/seedance-1-5-pro-i2v

generate videos with audio from images using bytedance's seedance 1.5 pro. supports start and end frame control for seamless video transitions.

videoseal

infsh/videoseal

embed or detect imperceptible watermarks in images and videos.

wan-2-2-animate-v2v

infsh/wan-2-2-animate-v2v

creates animated videos by transferring movement from one video to a static character image, or replaces a character in an existing video with a new image while preserving the original motion and environment.

tavily-search-assistant

infsh/tavily-search-assistant

a search assistant that browses the internet to deliver comprehensive results, including ai-generated answers, images, and detailed sources.

wan2gp-i2v

infsh/wan2gp-i2v

turns static images into dynamic, realistic videos, often including tools for advanced control over motion and video style.

kimi-k2-thinking

openrouter/kimi-k2-thinking

a powerful open-source thinking agent that excels at complex, multi-step problem-solving and consistently uses tools effectively over extended operations.

nano-banana-pro-edit

falai/nano-banana-pro-edit

advanced image editing and generation capabilities for professional-quality results.

caption-videos

infsh/caption-videos

add captions to videos using an existing caption file, such as those generated by a speech-to-text service.

hunyuanvideo-foley

infsh/hunyuanvideo-foley

synthesizes realistic sound effects and audio tracks based on your video content and written descriptions.

lightning-wan-2-2-i2v-a14b

infsh/lightning-wan-2-2-i2v-a14b

generates videos from a single image quickly, utilizing the wan 2.2 architecture and lightning lora technology for high-speed results.

vibevoice

infsh/vibevoice

generates high-quality podcasts using natural speech and voices for multiple speakers.

qwen3-30b-a3b

infsh/qwen3-30b-a3b

a powerful language application that excels at multilingual communication and complex task execution, designed for fast performance.

hidream-i1-full

infsh/hidream-i1-full

generates high-quality images with state-of-the-art results.

wan2-1-i2v

infsh/wan2-1-i2v

generates high-quality, dynamic videos from a single static image.

flux-1-kontext-dev

infsh/flux-1-kontext-dev

edits existing images using text instructions, allowing for changes in style, characters, or objects, and reliably handles multiple edits while maintaining image coherence.

sdxl

infsh/sdxl

generates and modifies images from text prompts, producing high-resolution, photorealistic results with superior detail and accuracy compared to previous versions.

flux-1-fill

infsh/flux-1-fill

fills designated areas in existing images based on a descriptive text input.

hidream-e1-full

infsh/hidream-e1-full

edit images by simply telling it what changes you want, such as altering colors, backgrounds, or accessories with precision.

real-esrgan

infsh/real-esrgan

enhances low-quality, degraded images and videos by upscaling resolution, reducing noise, and restoring fine details.

tencent-song-generation

infsh/tencent-song-generation

generates high-quality songs, including both music and vocals, and can also produce pure music or vocal-only tracks.

nano-banana-pro

falai/nano-banana-pro

advanced tool for generating and editing images.

rife-video-interpolation

infsh/rife-video-interpolation

increases video smoothness by using ai to generate and insert extra frames, effectively raising the video's frame rate.

qwen-image-edit-lightning-plus

infsh/qwen-image-edit-lightning-plus

edit images using text prompts or by incorporating elements from multiple source images, making it easy to add subjects or put several items into one scene.

tavily-extract

infsh/tavily-extract

extracts clean, readable content, including text and images, from specified webpages, supporting batch processing for multiple urls.

hidream-i1

infsh/hidream-i1

an open-source tool for generating high-quality images in seconds.

array-switch

infsh/array-switch

allows you to choose between two different inputs based on a condition applied to an array of data.

extract-last-frame

infsh/extract-last-frame

save a specific frame from the end of a video as a static image file.

fast-whisper-large-v3

infsh/fast-whisper-large-v3

quickly converts audio files into written text or translates them to a different language.

thera

infsh/thera

upscales images to any size without blurriness or jagged edges, maintaining high detail through its unique neural heat field technology.

wan2-2-t2v-5b

infsh/wan2-2-t2v-5b

generates video content from text prompts.

wan2-2-t2v-5b-old

infsh/wan2-2-t2v-5b-old

generate stable and fluid high-resolution videos (720p, 24fps) from text prompts or by transforming an existing image into a video, supporting cinema-grade aesthetic controls.

dia-tts

infsh/dia-tts

generates realistic, conversational speech from text for lifelike dialogue.

media-merger

infsh/media-merger

merges multiple videos and images together using customized transitions.

rotate-glb

infsh/rotate-glb

renders 3d models (glb/gltf) with controls for rotation, camera position, and lighting. it supports wireframe rendering and can output high-quality images at customized resolutions.

hunyuan-image-to-3d

infsh/hunyuan-image-to-3d

generates detailed, high-resolution 3d assets quickly from simple text descriptions or reference images.

gemma-3-12b-it

infsh/gemma-3-12b-it

developed by google, this open-source tool processes both text and images to answer questions, summarize content, perform reasoning tasks, and understand images.

flux-1-dev-upscaler

infsh/flux-1-dev-upscaler

increases the resolution of images and videos, enhancing clarity and detail by up to 4x.

array-element-switch

infsh/array-element-switch

selects one of two possible inputs based on a comparison check within an array.

xlam-2-32b-fc-r-i1

infsh/xlam-2-32b-fc-r-i1

a system capable of advanced multi-step reasoning and a strong understanding of language and context to create actionable plans.

latentsync-1-6

infsh/latentsync-1-6

generates precise and realistic lip synchronization for video and audio content, ensuring high-resolution and smooth temporal consistency.

wan2-2-i2v-a14b

infsh/wan2-2-i2v-a14b

generates high-resolution videos from single images, offering one of the fastest ways to create 720p, 24fps video clips from a still picture.

diffrythm

infsh/diffrythm

generates complete songs quickly and simply using advanced latent diffusion technology.

mask-image

infsh/mask-image

combines two images—a main image and a semi-transparent mask—to selectively hide or reveal parts of the main image, creating a partially transparent result.

falconsai-nsfw-detection

infsh/falconsai-nsfw-detection

detects nsfw content in images and videos using the falconsai/nsfw_image_detection model. for videos, samples frames at configurable intervals.

phi-4-14b

infsh/phi-4-14b

a powerful tool developed by microsoft and trained on high-quality data to excel at complex tasks like advanced math, coding, and general problem-solving, offering detailed reasoning alongside solutions.

audio-x

infsh/audio-x

generates audio from any input using a unified framework.

video-audio-merger

infsh/video-audio-merger

merge video and audio files easily, with the flexibility to keep the original audio from the video.

z-image

infsh/z-image

a fast, high-quality image generation model.

hunyuan-image-to-3d-2

infsh/hunyuan-image-to-3d-2

create high-quality, photorealistic 3d models and textures from simple text prompts or 2d images.

text-to-file

infsh/text-to-file

creates a new document using the text and file name you specify.

devstral-small-2505

infsh/devstral-small-2505

an agent for software engineering tasks, created by mistral ai and all hands ai.

python-executor

infsh/python-executor

runs python code in a safe environment.

sdxl-controlnet

infsh/sdxl-controlnet

generates images with high quality and precise control over the composition and structure.

search-assistant

infsh/search-assistant

helps users create and refine search queries, retrieve relevant results from various sources, and generate overviews or summaries of the information found.

exa-answer

infsh/exa-answer

provides direct, factual answers to your questions by analyzing and summarizing relevant information from web search results.

instant-character

infsh/instant-character

generates images featuring a character with a consistent visual style across different scenes or poses.

qwen-image-edit-plus

infsh/qwen-image-edit-plus

performs high-quality editing across multiple images.

mmaudio

infsh/mmaudio

transforms silent videos into immersive experiences by generating high-quality, perfectly synchronized sound effects and environmental audio based on the visual content.

fabric-1-0

falai/fabric-1-0

creates videos where an image appears to talk using advanced lip-sync technology.

cogview4-6b

infsh/cogview4-6b

generates high-quality images from text, capable of producing detailed visuals up to 2048x2048 resolution.

ltx-video

infsh/ltx-video

create high-quality, realistic, and customizable videos quickly, with capabilities for producing detailed, high-resolution content.

kokoro-tts

infsh/kokoro-tts

converts text into spoken audio.

boolean-switch

infsh/boolean-switch

selects one of two possible inputs based on whether a condition is true or false.

bytedance-uso

infsh/bytedance-uso

a unified image editor that allows users to generate images by combining any subject with any style efficiently, preserving identity and consistency.

wan2-1-t2v-effects

infsh/wan2-1-t2v-effects

generates high-quality videos from text descriptions with dynamic motion and supports both chinese and english language prompts at 720p resolution.

wan2-2-t2i-a14b

infsh/wan2-2-t2i-a14b

generates images from text descriptions.

pixverse-lipsync

falai/pixverse-lipsync

generates highly realistic lipsync animations from any audio input.

mistral-small-3-2-24b-it-2506

infsh/mistral-small-3-2-24b-it-2506

follows precise instructions, excels at function/tool calling, and can process both text and images for tasks like document understanding and content generation.

wan2-2-i2v-5b

infsh/wan2-2-i2v-5b

generates high-definition video from images or text prompts quickly, running efficiently even on standard graphics cards.

any-model

openrouter/any-model

a unified api that provides access to hundreds of ai providers and models through a single endpoint, automatically selecting the most cost-effective option and handling backups if needed.

intellect-3

openrouter/intellect-3

intellect 3

gemma-3-27b-it

infsh/gemma-3-27b-it

handles complex tasks like question answering, summarizing, and reasoning across both text and image inputs, with support for multiple languages.

flux-1-srpo-dev

infsh/flux-1-srpo-dev

creates high-quality images from text descriptions, specializing in photorealistic results.

flux-1-krea-dev

infsh/flux-1-krea-dev

a tool for generating images from text descriptions, available for non-commercial use.

wan2-2-i2i-a14b

infsh/wan2-2-i2i-a14b

creates videos from images and enhances video quality using a built-in upscaler.

exa-extract

infsh/exa-extract

retrieves and analyzes content from web pages using sophisticated technology to provide accurate insights.

claude-opus-45

openrouter/claude-opus-45

an extremely capable assistant designed for advanced coding tasks, known for its superior planning and ability to solve complex programming challenges.

gemini-3-pro-preview

openrouter/gemini-3-pro-preview

gemini 3 pro preview

nano-banana

falai/nano-banana

allows you to edit images by combining features and elements from multiple source images.

ovi-ti2v

infsh/ovi-ti2v

creates videos with synchronized audio from text descriptions or images, supporting both text-to-video and image-to-video generation with built-in speech.

qwen-image

infsh/qwen-image

generates and edits images from text descriptions, excelling at rendering complex text within the image.

lightx2v-wan-2-2-i2v-a14b

infsh/lightx2v-wan-2-2-i2v-a14b

converts a static image into a video quickly by using advanced optimization techniques.

qwen-image-edit

infsh/qwen-image-edit

advanced image editing that excels at rendering and manipulating text within images, allowing for precise changes to appearance and meaning.

higgs-audio

infsh/higgs-audio

generates speech from text with advanced, expressive audio quality.

gemma-3n-e4b-it

infsh/gemma-3n-e4b-it

a fast and versatile tool that can analyze and respond to information from text, images, and audio, designed to run efficiently on small or limited devices.

claude-sonnet-45

openrouter/claude-sonnet-45

excels at coding, building complex agents, and solving difficult problems, offering high intelligence for various tasks.

flux-2-dev

falai/flux-2-dev

flux.2 dev is an open-weight, guidance-distilled text-to-image model developed by black forest labs for non-commercial applications

omni-zero

infsh/omni-zero

creates stylized portraits instantly without needing specific training data.

gemini-2-5-flash-image

falai/gemini-2-5-flash-image

gemini 2.5 flash image (nano banana) is a model that allows you to edit an image with multiple provided source images.

lightning-wan-2-2-i2v-a14b-offload

infsh/lightning-wan-2-2-i2v-a14b-offload

image to video with wan2.2 using the lightning lora (non-fused) for fast generation

numerical-switch

infsh/numerical-switch

selects one of two inputs based on a condition involving numerical comparison.

not enough? create new apps fast. templates + coding agents make it insanely extensible.

create your own apps

start from templates. add code, packages, docs. deploy in minutes.

$ infsh app init
my-app/
  inference.py
  requirements.txt
$ infsh app deploy

schemas become tool parameters automatically. your app shows up in the grid and can be used by agents and workflows.
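
to make the "schemas become tool parameters" idea concrete, here is a minimal sketch using only python's standard library. it is not the inference.sh sdk: the `AppInput` class and the `schema_to_parameters` helper are hypothetical names invented for illustration, and the real parameter format may differ.

```python
from dataclasses import MISSING, dataclass, fields


# hypothetical input schema for an image app (not a real inference.sh type)
@dataclass
class AppInput:
    prompt: str
    width: int = 1024
    height: int = 1024


def schema_to_parameters(schema):
    """Derive a simple parameter description from a dataclass schema."""
    params = {}
    for f in fields(schema):
        # a field is required when it has neither a default nor a default factory
        required = f.default is MISSING and f.default_factory is MISSING
        entry = {
            "type": getattr(f.type, "__name__", str(f.type)),
            "required": required,
        }
        if f.default is not MISSING:
            entry["default"] = f.default
        params[f.name] = entry
    return params


if __name__ == "__main__":
    for name, spec in schema_to_parameters(AppInput).items():
        print(name, spec)
```

the point of the sketch is that one typed definition can drive both validation and the parameter list an agent sees, so the two never drift apart.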

create workflows

build a graph of apps. deploy as a single callable app.

workflow builder

drag and drop to build the graph. map io to connect steps. deploy as an app.
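
a workflow of this kind can be pictured as a small dependency graph whose steps run in order, with each step's inputs mapped from earlier outputs. the sketch below is an assumption-laden illustration in plain python, not the workflow builder's actual format: `run_workflow`, the `wiring` layout, and the two stand-in steps are all hypothetical.

```python
from graphlib import TopologicalSorter  # python 3.9+


def run_workflow(steps, wiring, inputs):
    """Run steps in dependency order.

    steps:  step name -> callable(**kwargs) returning a dict of outputs
    wiring: step name -> {param: (source, output_key)}, where source is
            another step name or "input" for the workflow's own inputs
    """
    # each step depends on every step it reads an output from
    deps = {
        name: {src for src, _ in wiring.get(name, {}).values() if src != "input"}
        for name in steps
    }
    results = {"input": inputs}
    for name in TopologicalSorter(deps).static_order():
        kwargs = {
            param: results[src][key]
            for param, (src, key) in wiring.get(name, {}).items()
        }
        results[name] = steps[name](**kwargs)
    return results


# stand-in steps; in a real workflow these would be apps from the grid
def make_caption(text):
    return {"caption": text.upper()}


def to_file(content, name):
    return {"file": f"{name}: {content}"}


steps = {"caption": make_caption, "save": to_file}
wiring = {
    "caption": {"text": ("input", "text")},
    "save": {"content": ("caption", "caption"), "name": ("input", "name")},
}

if __name__ == "__main__":
    out = run_workflow(steps, wiring, {"text": "hello", "name": "out.txt"})
    print(out["save"]["file"])
```

wrapping `run_workflow` behind a single function with fixed `steps` and `wiring` is the code-level analogue of deploying the graph as one callable app.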

view all apps: explore what's available
create your own app: read the docs & start building
