
MCP on inference.sh

Model Context Protocol (MCP) has become the standard way for AI tools to talk to external services. Instead of each tool building its own integration layer, MCP provides a common protocol - a shared language that clients and servers use to exchange tool definitions, execute calls, and return results. If you have used Claude Code, Cursor, Cline, or Windsurf, you have already used MCP whether you realized it or not.

This guide covers how inference.sh works with MCP from both directions. First, inference.sh acts as an MCP server - any compatible client can connect and get access to over 250 tools. Second, inference.sh connects to other MCP servers - services like Linear, Notion, and Slack become available through a single interface. Both directions matter, and together they make MCP practical for teams rather than just individual developers.

What MCP Actually Does

MCP defines a protocol for tool discovery and execution between a client and a server. The client asks "what tools do you have?" and the server responds with structured definitions - tool names, parameter schemas, descriptions. The client can then call any of those tools by sending a request with the tool name and arguments. The server executes the call and returns the result.
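Concretely, MCP runs over JSON-RPC 2.0, with `tools/list` for discovery and `tools/call` for execution. A rough sketch of that exchange (the tool name and schema here are illustrative, not a real inference.sh tool definition):

```python
import json

# Client asks "what tools do you have?" (MCP method: tools/list)
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server responds with structured tool definitions (illustrative schema)
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "generate_image",
                "description": "Generate an image from a text prompt",
                "inputSchema": {
                    "type": "object",
                    "properties": {"prompt": {"type": "string"}},
                    "required": ["prompt"],
                },
            }
        ]
    },
}

# Client calls a tool by name with arguments (MCP method: tools/call)
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "generate_image", "arguments": {"prompt": "a red fox"}},
}

# Messages are serialized as JSON over the transport (stdio or HTTP)
wire = json.dumps(call_request)
assert json.loads(wire)["params"]["name"] == "generate_image"
```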

This is straightforward, but the implications are significant. Any client that speaks MCP can use any server that speaks MCP. A developer using Claude Code gets access to the same tools as a developer using Cursor. A new tool added to a server becomes immediately available to every connected client. The protocol creates a layer of interoperability that did not exist before.

The catch is setup. Each MCP connection requires configuration - server URLs, authentication credentials, transport settings. For a single developer working on one project, this is manageable. For a team working across multiple projects with multiple MCP servers, the configuration overhead adds up fast. Every developer configures every connection independently. Credentials get stored in local config files. When a server changes its configuration, everyone updates manually.

This is the problem that inference.sh addresses on both sides of the MCP equation.

inference.sh as an MCP Server

When you connect an MCP client to inference.sh, you get access to over 250 tools through a single connection. Image generation, video creation, language models, web search, code execution, 3D generation, audio synthesis - all available as MCP tools with standard parameter schemas.

Any MCP-compatible client can connect. Claude Code, Cursor, Cline, Windsurf - if it speaks MCP, it works. You configure one connection to inference.sh instead of configuring dozens of connections to individual services.

The tools available through inference.sh as an MCP server fall into three categories:

Built-in apps are the 250+ tools that inference.sh provides directly. These cover a wide range of AI capabilities - image and video generation, language models, search engines, audio tools, and more. Each app exposes a clean tool interface with typed parameters and structured output.

Connected MCP servers are third-party services that inference.sh connects to on your behalf. When you connect to Linear through inference.sh, those Linear tools become available through your inference.sh MCP connection. More on this in the next section.

Composed flows are custom tool chains you build by combining other tools. A flow that searches the web, summarizes results, and posts to Slack appears as a single tool to any MCP client.

All three sources merge into one tool catalog. An MCP client connected to inference.sh sees all of them as standard MCP tools. The client does not need to know or care where each tool lives internally.
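The merge itself can be pictured as nothing more than folding three maps into one catalog keyed by tool name. A minimal sketch, with made-up tool names standing in for the real catalog:

```python
# Three sources of tools, keyed by tool name (names are illustrative).
builtin_apps = {"generate_image": "builtin", "web_search": "builtin"}
connected_mcp = {"linear.get_issue": "mcp:linear", "slack.send_message": "mcp:slack"}
composed_flows = {"search_and_post": "flow"}

# One flat catalog; a connected MCP client sees only standard tools
# and never learns which source each one came from.
catalog = {**builtin_apps, **connected_mcp, **composed_flows}

assert len(catalog) == 5
assert catalog["search_and_post"] == "flow"
```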

This matters for teams. Instead of each developer maintaining a collection of MCP server configurations in their local environment, the team connects to inference.sh once. Tool availability, authentication, and configuration are managed centrally. A new team member gets access to every tool by configuring a single MCP connection.

Connecting to MCP Servers from inference.sh

The other direction is just as useful. inference.sh can connect to external MCP servers, making their tools available to your agents and workflows.

The belt CLI makes this straightforward. Start by listing available servers:

    belt mcp list

This shows the MCP servers that inference.sh can connect to - services like Linear, Notion, Slack, and others. Each server exposes its own set of tools through the MCP protocol.

Connecting to a server is one command:

    belt mcp connect linear

Once connected, you can see what tools that server provides:

    belt mcp tools linear

For Linear, this returns around 35 tools - get_issue, save_issue, list_comments, save_document, list_projects, and more. Each tool has a typed schema describing its parameters and return values.
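What "typed schema" means in practice: each tool definition carries a JSON Schema for its inputs, which a client can check arguments against before calling. A sketch (this is an illustrative shape, not the actual Linear MCP schema):

```python
# Illustrative tool definition for get_issue (not the real Linear schema).
get_issue_tool = {
    "name": "get_issue",
    "description": "Fetch a single Linear issue by its identifier",
    "inputSchema": {
        "type": "object",
        "properties": {"issue_id": {"type": "string"}},
        "required": ["issue_id"],
    },
}

def validate_args(tool: dict, args: dict) -> bool:
    """Minimal check that all required parameters are present."""
    required = tool["inputSchema"].get("required", [])
    return all(key in args for key in required)

assert validate_args(get_issue_tool, {"issue_id": "INF-123"})
assert not validate_args(get_issue_tool, {})
```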

You can also search across all available MCP servers to find specific capabilities:

    belt mcp search "project management"

Executing MCP Tools

Running a tool on a connected server uses belt mcp run:

    belt mcp run linear get_issue --issue-id "INF-123"

This calls the get_issue tool on the Linear MCP server, passing the issue ID as a parameter. The result comes back as structured data - the issue title, description, status, assignee, labels, and other fields.

You can list issues with filters:

    belt mcp run linear list_issues --team-id "INF" --status "In Progress"

Or create and update issues:

    belt mcp run linear save_issue --title "Fix login timeout" --team-id "INF" --priority 2

The same pattern works for every connected MCP server. The tools differ, but the interface is consistent.
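That consistency means a single wrapper can build the invocation for any server and tool. A sketch, assuming the `--kebab-case value` flag convention shown in the commands above:

```python
def mcp_run_cmd(server: str, tool: str, **kwargs) -> list[str]:
    """Build a belt invocation for any server/tool pair (sketch only;
    assumes belt's --kebab-case flag convention seen above)."""
    cmd = ["belt", "mcp", "run", server, tool]
    for key, value in kwargs.items():
        cmd += [f"--{key.replace('_', '-')}", str(value)]
    return cmd

assert mcp_run_cmd("linear", "get_issue", issue_id="INF-123") == [
    "belt", "mcp", "run", "linear", "get_issue", "--issue-id", "INF-123"
]
```

The server name and tool name vary; the shape of the call never does.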

OAuth Handled for You

Many MCP servers require OAuth authentication - Linear, Notion, Slack, and others need user authorization before their APIs can be accessed. Setting up OAuth flows is tedious. You need client IDs, client secrets, redirect URIs, token storage, and refresh logic.

inference.sh handles this automatically. When you connect to an MCP server that requires OAuth, inference.sh manages the authorization flow. You authenticate once. inference.sh stores and refreshes tokens. Every tool call on that server uses valid credentials without you thinking about token management.

This is particularly valuable for teams. OAuth credentials are managed at the platform level, not scattered across individual developer machines. When a token needs refreshing, it happens automatically. When a team member joins, they authorize once and get access to every connected server.

Agents Using MCP Tools

The tools available through connected MCP servers are not limited to CLI usage. Agents running on inference.sh can call MCP tools just like they call built-in tools. This opens up significant automation possibilities.

An agent that monitors GitHub pull requests can create corresponding Linear issues when PRs are opened. An agent that processes customer feedback can update Notion databases. An agent that handles incident response can post to Slack channels and create tracking issues simultaneously.

From the agent's perspective, there is no difference between calling a built-in tool and calling an MCP tool. Both appear as tools with typed parameters and structured results. The agent does not need special logic for MCP - it just calls tools.
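One way to picture this: the agent holds a single dispatch table, and whether an entry is backed by a built-in app or a remote MCP call is invisible at the call site. A sketch with stubbed tools (the function bodies are placeholders, not real integrations):

```python
# Stand-ins for a built-in app and an MCP-backed tool (both hypothetical).
def builtin_search(query: str) -> str:
    return f"results for {query!r}"

def mcp_linear_get_issue(issue_id: str) -> dict:
    return {"id": issue_id, "title": "..."}

TOOLS = {"web_search": builtin_search, "get_issue": mcp_linear_get_issue}

def call_tool(name: str, **args):
    """The agent resolves a name and calls it; the tool's source
    (built-in vs. MCP) never appears in the agent's logic."""
    return TOOLS[name](**args)

assert call_tool("web_search", query="mcp") == "results for 'mcp'"
assert call_tool("get_issue", issue_id="INF-123")["id"] == "INF-123"
```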

This means you can build workflows that span multiple services without writing integration code. The MCP protocol handles the communication. inference.sh handles the authentication. Your agent handles the logic.

Example: Cross-Service Automation

Consider a workflow that runs when a task is completed. The agent needs to:

  1. Get the task details from Linear
  2. Find related documentation in Notion
  3. Post a summary to a Slack channel
  4. Update the Linear issue with a link to the Slack message

Without MCP, this requires three separate API integrations, three sets of credentials, and custom code for each service. With MCP through inference.sh, the agent calls four tools in sequence. The tools are get_issue on Linear, search on Notion, send_message on Slack, and save_comment on Linear.

The complexity is in the business logic, not the integration plumbing.
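The four steps above can be sketched as a short function over a generic tool dispatcher. The tool names follow the article; the dispatcher and all return values here are stubs, not actual API responses:

```python
def run_completion_workflow(issue_id: str, call) -> dict:
    """The business logic: four tool calls, no integration code."""
    issue = call("linear", "get_issue", issue_id=issue_id)        # step 1
    docs = call("notion", "search", query=issue["title"])         # step 2
    msg = call("slack", "send_message", channel="#done",          # step 3
               text=f"{issue['title']}: {len(docs)} related docs")
    call("linear", "save_comment", issue_id=issue_id,             # step 4
         body=f"Summary posted: {msg['permalink']}")
    return msg

# Fake dispatcher standing in for belt / the MCP layer (stub data).
def fake_call(server: str, tool: str, **args) -> object:
    return {
        ("linear", "get_issue"): {"title": "Fix login timeout"},
        ("notion", "search"): ["doc-1", "doc-2"],
        ("slack", "send_message"): {"permalink": "https://slack/..."},
        ("linear", "save_comment"): {"ok": True},
    }[(server, tool)]

result = run_completion_workflow("INF-123", fake_call)
assert result["permalink"].startswith("https://")
```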

Three Sources of Tools

inference.sh gives agents and developers access to tools from three distinct sources. Understanding these sources helps you think about what is available and where to look for specific capabilities.

Built-in Apps

Over 250 apps are available directly on inference.sh. These are first-party integrations that the platform maintains. They cover AI model access (language models, image generation, video creation, audio synthesis), utility functions (web search, code execution, file processing), and specialized tools (3D generation, avatar creation, music generation).

You can browse the full catalog on the inference.sh tools page.

Built-in apps are always available. They do not require additional setup or authentication beyond your inference.sh account.

Connected MCP Servers

Third-party services connected through the MCP protocol. These extend the tool catalog with capabilities from external platforms - project management from Linear, documentation from Notion, communication from Slack, and others.

MCP servers require a one-time connection step and potentially OAuth authorization. After that, their tools are available alongside built-in apps.

The list of available MCP servers grows as more services adopt the protocol. Because MCP is a standard, any conforming server can be connected.

Composed Flows

Custom tools built by combining other tools into sequences or parallel execution patterns. A flow appears as a single tool but internally calls multiple tools across multiple sources.

Flows let you create domain-specific tools that match your team's workflows. A "deploy and notify" flow might run a deployment tool, verify the result, and post to Slack - all as a single tool call.
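Mechanically, a flow like "deploy and notify" is just tool calls composed into one callable that a client sees as a single tool. A toy sketch with stand-in steps (nothing here reflects the actual flow engine):

```python
def make_flow(*steps):
    """Chain tool calls: each step receives the previous step's output,
    and the whole chain is exposed as one tool."""
    def flow(payload: dict) -> dict:
        for step in steps:
            payload = step(payload)
        return payload
    return flow

# Stand-in tools for the "deploy and notify" example.
deploy = lambda p: {**p, "deployed": True}
verify = lambda p: {**p, "healthy": p["deployed"]}
notify = lambda p: {**p, "notified": p["healthy"]}

deploy_and_notify = make_flow(deploy, verify, notify)  # one tool call
result = deploy_and_notify({"service": "api"})
assert result["notified"] is True
```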

Why Central Management Matters

The default approach to MCP is local configuration. Each developer adds MCP server connections to their editor or CLI config. Each connection requires credentials. Each project might need different servers.

This works for individual developers. It breaks down for teams.

Consistency becomes a problem. Different developers have different MCP servers configured with different versions. Tool behavior varies across the team. Debugging is harder because environments differ.

Onboarding becomes a problem. New team members need to configure every MCP connection, obtain credentials, and verify everything works. This can take hours for a complex setup.

Security becomes a problem. OAuth tokens and API keys live on individual machines. Revoking access when someone leaves the team means tracking down every credential on every machine.

Maintenance becomes a problem. When an MCP server updates its protocol or changes its authentication, every developer updates independently.

Central management through inference.sh solves these problems. MCP connections are configured once at the platform level. Credentials are stored securely and managed centrally. New team members get access immediately. Updates happen in one place.

This is not a theoretical benefit. Teams running more than a few MCP servers feel the pain of distributed configuration quickly. Central management turns MCP from a developer convenience into team infrastructure.

Getting Started

Setting up MCP on inference.sh takes a few minutes.

First, check what MCP servers are available:

    belt mcp list

Connect to the servers your team uses:

    belt mcp connect linear
    belt mcp connect slack-api
    belt mcp connect notion-mcp-beta

Verify the tools available on each connected server:

    belt mcp tools linear
    belt mcp tools slack-api

Run a quick test to confirm everything works:

    belt mcp run linear list_teams

From here, you can use MCP tools through the CLI, through agents, or through any MCP client connected to inference.sh as a server.

For MCP client configuration, point your client (Claude Code, Cursor, Cline, Windsurf) to inference.sh as an MCP server. The specifics depend on your client - each has its own configuration format for adding MCP servers - but the connection target is always inference.sh.
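As a rough illustration of what such a client config entry tends to look like, here is a sketch that emits a JSON fragment. The field names, the URL, and the auth header are placeholders, not the actual inference.sh endpoint or any specific client's format; consult your client's documentation for the real shape:

```python
import json

# Hypothetical JSON-based client config entry (placeholder values only).
config = {
    "mcpServers": {
        "inference-sh": {
            "url": "https://example.invalid/mcp",          # placeholder URL
            "headers": {"Authorization": "Bearer YOUR_API_KEY"},
        }
    }
}
print(json.dumps(config, indent=2))
```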

FAQ

Can I use inference.sh with MCP clients other than Claude Code?

Yes. Any client that implements the MCP protocol can connect to inference.sh as a server. This includes Claude Code, Cursor, Cline, Windsurf, and any other MCP-compatible tool. The protocol is standardized, so client choice does not affect what tools are available or how they behave.

What happens when an MCP server requires authentication?

inference.sh handles OAuth and other authentication flows for you. When you run belt mcp connect for a server that needs authorization, inference.sh walks you through the auth flow once. After that, tokens are stored and refreshed automatically. You do not need to manage credentials manually or worry about token expiration.

Can my agents use MCP tools alongside built-in inference.sh tools?

Yes. Agents running on inference.sh see all tools - built-in apps, connected MCP servers, and composed flows - as a unified tool catalog. An agent can call an image generation tool (built-in), create a Linear issue (MCP), and post to Slack (MCP) in the same execution without any special configuration. The tools come from different sources, but the interface is identical.
