
documentation
everything you need to build production AI agents
all pages
What is inference.sh?
inference.sh is an AI workspace where agents can actually do things.
Workspace Tour
A quick look at what you'll find in the workspace.
Your First Agent
Create an agent in 5 minutes.
Concepts Overview
Everything in inference.sh connects to help you get work done with AI.
Apps
Apps are tools that do one thing well.
Tasks
A task is what happens when you run an app.
Flows
Flows chain multiple apps together.
Agents
Agents are AI assistants that use tools.
Workers
Workers are computers that run your tasks.
Agents Overview
Create AI assistants that plan, use tools, and get things done.
Creating an Agent
Step-by-step guide to creating an agent.
System Prompts
The system prompt defines your agent's personality and behavior.
Adding Tools
Tools extend what your agent can do. There are three types of tools.
Sub-Agents
Agents can delegate to other agents. Sub-agents are just another type of tool.
Chatting
How to interact with your agent.
Webhooks
Connect your agent to external services via webhooks.
Apps Overview
Tools that do one thing well.
Browsing the Grid
Find apps to run or add to agents.
Running an App
How to run an app from the workspace.
Variants
Different configurations of the same app.
Flows Overview
Visual workflows that chain apps together.
Creating a Flow
Build a multi-step workflow visually.
Connecting Nodes
Wire data between apps in your flow.
Deploying as App
Turn your flow into a reusable app.
Extending Apps
Build and deploy your own tools.
CLI Setup
Install the command-line tool for creating apps.
Creating an App
Scaffold a new app with the CLI.
App Code
The inference.py file is your app's logic.
Configuration
The inf.yml file defines app settings and resource requirements.
Deploying
Push your app to inference.sh.
API & SDK Overview
Programmatic access for developers.
Authentication
How to authenticate with the API.
Running Apps
Call apps programmatically.
Streaming
Watch task progress in real-time.
Files
Upload and download files with the API.
Why Private Workers?
Reasons to run on your own infrastructure.
Installing the Engine
Set up private workers on your hardware.
Configuration
Configure workers on your engine.
Using Private Workers
Run tasks on your own hardware.
Image Generation
Generate images with AI models.
Audio Transcription
Convert speech to text with Whisper.
Content Pipeline
Build a flow that creates social media content from a prompt.
Data Processing
Process and analyze data with AI.
Multi-Agent System
Build agents that work together.
Introduction
inference.sh is an AI workspace where agents can actually do things. Put agents in charge of the busywork; they plan steps, pick the right tools or flows, and even generate small UI widgets when th...
Secrets Overview
Secrets and integrations for connecting external services.
Environment Variables
Store encrypted API keys and credentials.
Integrations Overview
Managed connections to external services.
Google Service Account
Connect Google Sheets, Docs, and Drive using a service account.
Google OAuth
Connect your personal Google account for Gmail, Calendar, and Drive.
X.com Integration
Connect X.com (formerly Twitter) to your agents.
Slack Integration
Connect Slack to your agents for messaging, channels, and events.
Discord Integration
Connect Discord to your agents for messaging, events, and bot interactions.
X.com Integration
Build apps that interact with X.com (Twitter).
Python SDK
Python SDK for inference.sh agents and apps.
JavaScript SDK
JavaScript/TypeScript SDK for inference.sh agents and apps.
Tool Builder
Fluent API for defining agent tools.
Google Cloud Platform
Connect your own GCP project to use Vertex AI, BigQuery, Cloud Storage, and other GCP services with your own quotas and billing.
What is a Runtime?
inference.sh is an agent runtime: the infrastructure layer that executes your agent code with built-in solutions for the hard operational problems.
Durable Execution
durable execution means your agent's state checkpoints after each step. if a connection drops, a process restarts, or a tool times out, execution resumes from the last checkpoint instead of starting o...
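The checkpoint-and-resume idea can be sketched in plain Python. This is a minimal illustration, not the inference.sh implementation: `run_with_checkpoints`, the JSON checkpoint file, and the step functions are all hypothetical names chosen for the example.

```python
import json
import os


def run_with_checkpoints(steps, checkpoint_path):
    """Run callables in order, saving progress after each one (hypothetical helper)."""
    state = {"next_step": 0, "results": []}
    # Resume from the last checkpoint if a previous run left one behind.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for index in range(state["next_step"], len(steps)):
        # If the step raises, the state on disk still points at this step.
        state["results"].append(steps[index]())
        state["next_step"] = index + 1
        # Write to a temporary file, then swap it in atomically.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["results"]
```

If a step fails partway through, calling `run_with_checkpoints` again with the same path skips the steps that already completed and retries only the one that failed. Step results must be JSON-serializable in this sketch.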
Observability
every agent running on inference.sh is automatically traced. no configuration, no instrumentation code, no separate products.
Human-in-the-Loop
human-in-the-loop lets you add approval gates to agent actions with one flag. the agent pauses, shows what it wants to do, and waits for confirmation before executing.
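An approval gate of this kind can be sketched as a wrapper around a tool call. The code below is a generic illustration under stated assumptions: `run_tool_with_gate`, the `require_approval` flag name, and the `approve` callback are hypothetical and are not taken from the inference.sh SDK.

```python
def run_tool_with_gate(tool, args, require_approval=False, approve=None):
    """Call `tool(**args)`, pausing for confirmation when the gate is enabled.

    `approve` is a hypothetical callback that receives the tool name and its
    arguments and returns True to allow the call.
    """
    if require_approval:
        # Without an approver, a gated action is never executed.
        approved = approve is not None and approve(tool.__name__, args)
        if not approved:
            return {"status": "rejected", "result": None}
    return {"status": "executed", "result": tool(**args)}


def send_email(to, subject):
    """Stand-in for a tool with side effects."""
    return f"sent '{subject}' to {to}"
```

A caller might pass `approve=lambda name, args: input(f"Run {name} with {args}? [y/N] ") == "y"` to ask a person at the terminal. The tool body runs only after the callback returns True.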
Tool Orchestration
inference.sh provides 150+ pre-built integrations as tools for your agents. OAuth flows, token refresh, and credential management are handled by the platform.
Setup Parameters
Setup parameters configure an app's initialization steps. Modifying these parameters triggers a full restart (re-running setup()).
Multi-Function App
Apps can define multiple entry points to support different modes of operation within the same container context. This reduces cold starts and resource duplication.
Graceful Cancellation
Apps typically process requests until completion. However, users may cancel long-running tasks. Implementing graceful cancellation ensures resources are released immediately and partial results are ha...
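One common way to support cancellation is cooperative: the worker checks a flag between units of work and returns what it has so far. The sketch below uses `threading.Event` from the Python standard library. It does not show the inference.sh cancellation hook; `process_items` and `on_cleanup` are hypothetical names.

```python
import threading


def process_items(items, cancel_event, on_cleanup=None):
    """Double each item, stopping early if `cancel_event` is set."""
    results = []
    try:
        for item in items:
            # Check for cancellation before starting each unit of work.
            if cancel_event.is_set():
                return {"cancelled": True, "results": results}
            results.append(item * 2)
        return {"cancelled": False, "results": results}
    finally:
        # Runs on normal completion, on cancellation, and on errors.
        if on_cleanup is not None:
            on_cleanup()
```

Because cleanup sits in a `finally` block, resources are released on every exit path, and the partial `results` list is still returned to the caller when the task is cancelled.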
Secrets
Secrets allow your app to securely access API keys and sensitive values. Users provide their own secrets which are encrypted and injected at runtime.
Integrations
Integrations allow your app to access external services (Google Sheets, Drive, etc.) on behalf of users through OAuth.
Output Metadata
OutputMeta enables usage-based pricing by reporting what your app processes and generates.
Best Practices
Performance optimization and coding patterns for inference.sh apps.
Troubleshooting
Common issues and solutions for inference.sh apps.
Widgets Overview
Learn how to design widgets for your agent experience.
Widget Schema
Complete type definitions for the widget system.
Widget Actions
Trigger actions on the backend from user interactions in your widgets.
Card
A bounded container for widgets. The Card is the primary container for all widget content.
Text
Displays plain text with optional styling variants.
Badge
A small label for status or metadata.
Row
Arranges children horizontally with configurable gap.
Col
Arranges children vertically with configurable gap.
Button
A flexible action button that triggers widget actions.
Input
Text input field for collecting user input.
Select
Dropdown single-select input for choosing from predefined options.
Checkbox
Binary selection control for boolean values.
Image
Displays an image with optional alt text.
Markdown
Renders markdown-formatted text with full styling support.
Box
Flexible container for layout and surface styling with full control.
Spacer
Flexible space to separate content within a layout.
Divider
Separate content with a thin line.
Form
Layout container optimized for form controls and submission.
Title
Section headings with scalable sizes and weights.
Caption
Supplemental text for descriptions, hints, or metadata.
Label
Accessible label for a form field.
Textarea
Multi-line text input control for longer form content.
RadioGroup
Choose a single option from a set of mutually exclusive options.
DatePicker
Select a date from a calendar popover.
Icon
Visual glyphs for actions and status indicators.
Chart
Render simple bar, line, and area charts from tabular data.