tools

one api for everything.

ai models, video rendering, email, search, project management. 250+ tools. built-in, connected, or composed.

three sources of tools

built-in: 250+ apps
connected: MCP servers
composed: flows → tools

one api: same interface for everything
image: FLUX · SDXL · Recraft · Midjourney
video: Seedance · Veo · Remotion
text: Claude · Gemini · GPT
audio: Whisper · TTS · Bark
communication: Gmail · Slack · Twitter
dev tools: Linear · GitHub · Notion
search: Tavily · Exa
code: Sandbox · Browser
javascript
import Inference from 'inference'

const client = new Inference()

const result = await client.run('flux-schnell', {
  prompt: 'a cat on mars'
})
output: 🐱 (generated in 3.2s)

sdks: javascript · python · go · cli

one api

same interface for every tool, built-in or connected: input in, output out. no per-provider SDKs.
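a sketch of that uniform shape, assuming only the run(slug, input) call from the example above. the stub client here just echoes its arguments; a real client would send them to the platform:

```javascript
// stub client for illustration -- a real client POSTs { slug, input }
// to the platform and returns the tool's output
const client = {
  async run(slug, input) {
    return { tool: slug, output: input };
  }
};

async function demo() {
  // generating an image and sending an email use the identical shape
  const image = await client.run('flux-schnell', { prompt: 'a cat on mars' });
  const email = await client.run('gmail-send', { to: 'team@example.com', subject: 'hi' });
  return [image.tool, email.tool];
}
```

the point is the call shape, not the stub: swapping which tool runs changes only the slug and the input object, never the interface.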

byok

bring your own keys. route model runs through fal, google, or your own GPUs. you're not locked in.

durable

every tool call retries on failure, persists state, and tracks execution. built in, not bolted on.

how we compare

replicate · fal · inference.sh
features compared: AI models · non-AI tools · MCP connections · flows → tools · BYOK · durable execution
belt
run any tool from your terminal: belt app run flux-schnell. one command, same api.

frequently asked questions

what is inference.sh?

a platform with 250+ tools, including AI models, dev tools, and integrations, callable through one API. connect more via MCP servers, or compose tools into new tools with flows.

how is inference.sh different from Replicate?

Replicate has AI models. inference.sh has AI models plus video rendering, email, search, project management, and MCP servers. all composable. plus BYOK: bring your own keys to route through Fal, Google, or your own GPUs.

what is MCP?

Model Context Protocol, an open standard for connecting tools to AI agents. browse MCP servers on inference.sh, or use inference.sh as an MCP server from Claude Code or Cursor.

do I need to build agents to use tools?

no. call any tool with a single HTTP request. agents and skills are separate products on the same platform. use what you need.
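as a hedged sketch of what that single HTTP request could look like: the endpoint path, auth header, and payload field names below are assumptions for illustration, not the documented API.

```javascript
// hypothetical URL and payload shape -- consult the real API reference before use
const req = new Request('https://api.inference.sh/v1/run', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    tool: 'flux-schnell',
    input: { prompt: 'a cat on mars' }
  })
});
// `await fetch(req)` would then return the tool's output
```

no SDK, agent, or flow is required: any HTTP client that can send a POST with a JSON body can call a tool.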

what is BYOK?

bring your own keys. route model runs through Fal, Google, or your own GPUs. you're not locked in to any single compute provider.

ready to ship?

start with the hosted platform. deploy your own when you're ready.
