inference.sh

realtime streaming

Stream Agent Execution to Your Frontend

Subscribe to chat sessions and receive live updates as your agent thinks, calls tools, and generates responses.

why streaming matters

AI agents take time. They think, they call tools, they process data. Without streaming, users stare at a spinner wondering if anything is happening.

With inference.sh, every update streams to your frontend in real-time. Users see the agent working—what tools it's calling, what it's thinking, what it's producing.

the agent-sdk

Our React SDK handles streaming automatically. Wrap your chat UI in a provider and use hooks to access state and actions.

import { AgentChatProvider, useAgentChat, useAgentActions } from '@inference/agent-sdk';

function MyChat() {
  return (
    <AgentChatProvider agentConfig={{ core_app_ref: 'infsh/claude-sonnet-4@abc123' }}>
      <ChatUI />
    </AgentChatProvider>
  );
}

function ChatUI() {
  const { messages, status, isGenerating } = useAgentChat();
  const { sendMessage, stopGeneration } = useAgentActions();

  // messages update in real-time as the agent responds
  // status shows: 'idle' | 'connecting' | 'streaming' | 'error'
  return <div>{/* render messages; wire up sendMessage / stopGeneration */}</div>;
}
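Wiring a submit handler to sendMessage can be as small as the sketch below. The SDK's exact sendMessage signature isn't shown on this page, so this assumes it takes the message text:

```typescript
// Hypothetical submit handler — sendMessage's real signature may differ.
function onSubmit(
  sendMessage: (text: string) => Promise<void> | void,
  input: string,
): void {
  const text = input.trim();
  if (text.length === 0) return; // ignore empty submissions
  sendMessage(text);
}
```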

pre-built chat component

Don't want to build your own UI? Use the pre-composed AgentChat component.

import { AgentChat } from '@inference/agent-sdk';

<AgentChat
  agentConfig={{
    core_app_ref: 'infsh/claude-opus-45@abc123',
    name: 'My Assistant',
  }}
/>

what streams

Every part of the agent's execution is streamed:

  • messages. User messages and assistant responses as they're generated.
  • tool invocations. See which tools the agent is calling and their results.
  • status updates. Know when the agent is thinking, waiting for approval, or done.
  • llm tokens. Stream text as it's generated, not just when it's complete.
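On the client, those event kinds can be modeled as a discriminated union. The type and field names below are illustrative assumptions, not the SDK's published schema:

```typescript
// Illustrative event shapes (assumed — not the SDK's actual types).
type AgentEvent =
  | { kind: 'message'; role: 'user' | 'assistant'; text: string }
  | { kind: 'tool'; name: string; result?: string }
  | { kind: 'status'; value: 'idle' | 'connecting' | 'streaming' | 'error' }
  | { kind: 'token'; delta: string };

// Fold incoming events into the assistant's running reply:
// token deltas append, a complete assistant message replaces the draft.
function applyEvent(reply: string, event: AgentEvent): string {
  switch (event.kind) {
    case 'token':
      return reply + event.delta;
    case 'message':
      return event.role === 'assistant' ? event.text : reply;
    default:
      return reply; // tool and status events don't change the text
  }
}
```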

client tools

Define tools that execute in the browser. The SDK automatically handles invocations and submits results back to the agent.

import { tool, string } from '@inference/agent-sdk';

const scanUI = tool('scan_ui', 'Scan the current page')
  .input({ selector: string('CSS selector to scan') })
  .handler(async ({ selector }) => {
    const element = document.querySelector(selector);
    return JSON.stringify({ found: !!element });
  });

<AgentChat agentConfig={{ tools: [scanUI] }} />

under the hood

Streaming uses Server-Sent Events (SSE). A single connection per chat session receives typed events for both chat state and message updates. Auto-reconnect handles network interruptions.

For non-React environments, you can use the StreamManager class directly or connect to the SSE endpoint with any HTTP client.
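Any client that can read a text/event-stream body will work, because SSE framing is simple: events are blocks separated by blank lines, and each data line starts with "data:". A minimal parser sketch (the endpoint URL and JSON payloads are up to your setup):

```typescript
// Minimal SSE frame parser. Blank lines separate events; we keep only
// `data:` fields, joining multi-line data with newlines per the SSE spec.
function parseSSE(chunk: string): string[] {
  return chunk
    .split('\n\n')
    .map((block) =>
      block
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).trim())
        .join('\n'),
    )
    .filter((data) => data.length > 0);
}
```

In the browser, the native EventSource API handles this framing (and reconnection) for you; a hand-rolled parser like the above is mainly useful with fetch or non-browser HTTP clients.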

get started

Install the SDK and start building streaming agent interfaces.

read the sdk docs →

what you get

the runtime layer

you could build this. but do you want to?

01

durable execution

event-driven, not long-running. if a tool fails, it doesn't crash your agent loop. state persists across invocations.

02

tool orchestration

150+ apps as tools via one API. structured execution with approvals when needed. full visibility into what ran.

03

observability

real-time streaming and logs for every action. see exactly what your agent is doing.

04

pay-per-execution

no idle costs while tools run or waiting for results. you're not paying to keep a process alive.

plug any model, swap providers without changing code

openai
anthropic
google
meta
mistral
deepseek
+ 500 more

ready to ship?

start with the hosted platform. deploy your own when you're ready.
