one api for everything.
ai models, video rendering, email, search, project management. 250+ tools. call anything, compose everything.
npm i -g @inference/belt
also on inference.sh
tools
250+ built-in, connected, and composed. AI models, video rendering, email, search. one api.
skills
the registry for AI agent skills. versioned, secure, evolving. works in every runtime.
agents
ship agents that run in production. durable execution, human-in-the-loop, 250+ tools.
ui
AI-native react components. chat, generative UI, tool approvals. 30+ widgets.
teams
coming soon. your team's AI workspace: shared memory, automations, self-hostable.
why we built this
we've heard it all before
real quotes from developers hitting the same walls.
"The agent framework is not the moat. Prompt engineering is not the moat. The base LLM is not the moat. The specialized tools that encode domain knowledge—are the moat."
"I spent 6 hours debugging a workflow that had zero error logs. When something breaks at 2 AM, I don't want to trace through 47 nodes. I want to see exactly what payload caused the issue."
"I felt like a 'button person' in my IDE. The agent works in quanta—cut off by time every 2 minutes. Long tasks require pipeline thinking, not chat sessions."
"Our multi-step agent produced great results but took 45+ seconds. Users thought it crashed. If they see the internal monologue, they wait. If they see a spinner, they leave."
"Spent 10 hours deploying agents on EC2... $13/mo per agent. Switched to serverless: 10 cents. Why is this so hard?"
"Systems record that a ticket was escalated, but not why it happened. Without that reasoning, agents treat every edge case as a brand new problem."
frequently asked questions
what is inference.sh?
inference.sh is a platform with 250+ tools, including AI models, dev tools, and integrations, callable through one API. it also includes a skill registry for AI agents, a production agent runtime, AI-native React components, and a team workspace.
how is inference.sh different from Replicate?
Replicate has AI models. inference.sh has AI models plus video rendering, email, search, project management, and MCP servers, all composable through one API. plus BYOK (bring your own keys) to route through Fal, Google, or your own GPUs.
do I need to build agents to use inference.sh?
no. call any tool with a single HTTP request. agents, skills, and teams are separate products on the same platform. use what you need.
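a minimal sketch of what "one tool, one HTTP request" could look like. the endpoint URL, tool name, and payload fields below are illustrative assumptions, not the documented inference.sh API; check the actual API reference before using them.

```python
import json
import urllib.request

# hypothetical payload: "tool" selects one of the 250+ tools,
# "input" carries that tool's arguments (field names are assumptions)
payload = {"tool": "search.web", "input": {"query": "agent runtimes"}}

req = urllib.request.Request(
    "https://api.inference.sh/v1/run",  # hypothetical endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer <YOUR_API_KEY>",  # placeholder
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(req) would send the call; every other tool
# would be invoked the same way, just with a different "tool" name
print(req.get_method(), req.full_url)
```

the point of the sketch: no SDK or agent framework is required to use a tool, only an authenticated POST with a JSON body.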
what is BYOK?
bring your own keys. route model runs through Fal, Google, or your own GPUs. you're not locked in to any single provider.