one api for everything.

250+ serverless tools. ai models, dev tools, integrations. call anything, compose everything.

terminal
$ curl https://api.inference.sh/v1/run \
  -H "Authorization: Bearer $KEY" \
  -d '{"app":"flux-schnell","input":{"prompt":"a cat on mars"}}'
→ image generated in 3.2s
flux-dev (image)
seedance-2-t2v (video)
claude-opus-45 (chat)
remotion-render (video)
gpt-image-2 (image)
veo-3-1 (video)
elevenlabs/tts (audio)
tavily/search (text)
exa/search (text)
x/post-create (social)
seedream-4-5 (image)
grok-imagine-image (image)
omnihuman-1-5 (video)
kokoro-tts (audio)
shell (utilities)
agent-browser (other)
gemini-3-pro-image (image)
wan-2-7-t2v (video)
rodin-3d-generator (3d)
python-executor (text)
dia-tts (audio)
reve (image)
phota/generate (image)
minimax-m-25 (chat)
belt
the CLI for inference.sh. run tools, manage skills, connect MCP servers, all from your terminal.
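a sketch of what a belt session might look like. the subcommand names below are illustrative assumptions, not documented commands; check belt --help for the real interface.

# hypothetical session: subcommands are assumptions, not documented
$ belt run flux-schnell --input '{"prompt":"a cat on mars"}'
$ belt skills list
$ belt mcp connect my-server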

why we built this

we've heard it all before

real quotes from developers hitting the same walls.

the moat

"The agent framework is not the moat. Prompt engineering is not the moat. The base LLM is not the moat. The specialized tools that encode domain knowledge—are the moat."

observability

"I spent 6 hours debugging a workflow that had zero error logs. When something breaks at 2 AM, I don't want to trace through 47 nodes. I want to see exactly what payload caused the issue."

durable execution

"I felt like a 'button person' in my IDE. The agent works in quanta—cut off by time every 2 minutes. Long tasks require pipeline thinking, not chat sessions."

real-time streaming

"Our multi-step agent produced great results but took 45+ seconds. Users thought it crashed. If they see the internal monologue, they wait. If they see a spinner, they leave."

pay-per-execution

"Spent 10 hours deploying agents on EC2... $13/mo per agent. Switched to serverless: 10 cents. Why is this so hard?"

decision context

"Systems record that a ticket was escalated, but not why it happened. Without that reasoning, agents treat every edge case as a brand new problem."

frequently asked questions

what is inference.sh?

one api for everything. 250+ serverless tools: AI models, dev tools, and integrations. run them directly, compose them into workflows, or use them to power agents. also includes a skill registry, react components, and a team workspace.

how is inference.sh different from Replicate?

Replicate has AI models. inference.sh has AI models plus video rendering, email, search, project management, and MCP servers, all composable through one API. plus BYOK (bring your own keys) to route through Fal, Google, or your own GPUs.

can I use inference.sh without building agents?

yes. call any tool with a single HTTP request, or compose tools into workflows, as in the sketch below. no agents required. agents, skills, and teams are separate products on the same platform.
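for example, chaining two tools by hand is just two HTTP calls: generate an image, then feed its URL to a video tool. a minimal sketch, assuming the response carries an output URL and that seedance-2-t2v accepts an image input; the field names are assumptions, not documented schema.

# step 1: generate an image (.output.url is an assumed response shape)
$ IMG=$(curl -s https://api.inference.sh/v1/run \
    -H "Authorization: Bearer $KEY" \
    -d '{"app":"flux-schnell","input":{"prompt":"a cat on mars"}}' | jq -r '.output.url')
# step 2: feed the image into a video tool ("image" input field is assumed)
$ curl https://api.inference.sh/v1/run \
    -H "Authorization: Bearer $KEY" \
    -d "{\"app\":\"seedance-2-t2v\",\"input\":{\"image\":\"$IMG\"}}"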

what is BYOK?

bring your own keys. route model runs through Fal, Google, or your own GPUs. you're not locked into any single provider.
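for illustration, routing could be a hint on the same run call. the "provider" field below is an assumption for the sketch, not documented schema; see the docs for the real BYOK configuration.

# "provider" is an assumed field, shown only to illustrate the idea
$ curl https://api.inference.sh/v1/run \
  -H "Authorization: Bearer $KEY" \
  -d '{"app":"flux-schnell","provider":"fal","input":{"prompt":"a cat on mars"}}'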

ready to ship?

start with the hosted platform. deploy your own when you're ready.
