
Engines

Manage private inference engines that run workers on your hardware. Use these endpoints to list engines, inspect status, and perform safe restarts without dropping in-flight tasks.

Requires API key scopes engines:read (list, get) and engines:write (drain, update, extend, stop, restart).

Workers · Installing the engine · Configuration


List engines

GET /engines or POST /engines/list

Cursor-paginated list of engines for your team.

bash
curl -X POST https://api.inference.sh/engines/list \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2" \
  -H "Content-Type: application/json" \
  -d '{"limit": 20}'
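Because the list is cursor-paginated, collecting every engine means repeating the request until no cursor comes back. The following is a minimal sketch of that loop, not the documented contract: the response fields `items` and `next_cursor` and the request field `cursor` are assumed names that this page does not confirm, so check them against a real response before relying on them. The page fetcher is injected so the loop can be exercised without network access.

```typescript
// ASSUMPTION: `items` and `next_cursor` are hypothetical response field names.
interface EnginePage<T> {
  items: T[];
  next_cursor?: string | null;
}

// ASSUMPTION: `cursor` is a hypothetical request field name; `limit` is from the curl example.
type FetchPage<T> = (body: { limit: number; cursor?: string }) => Promise<EnginePage<T>>;

// Follows the cursor page by page and returns all collected items.
async function listAllEngines<T>(fetchPage: FetchPage<T>, limit = 20): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor ? { limit, cursor } : { limit });
    all.push(...page.items);
    cursor = page.next_cursor ?? undefined;
  } while (cursor);
  return all;
}

// One possible page fetcher, using the endpoint and headers from the curl example.
const fetchEnginePage: FetchPage<unknown> = async (body) => {
  const res = await fetch("https://api.inference.sh/engines/list", {
    method: "POST",
    headers: {
      Authorization: "Bearer inf_your_key",
      "X-API-Version": "2",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`list engines failed: HTTP ${res.status}`);
  return (await res.json()) as EnginePage<unknown>;
};
```

Calling `listAllEngines(fetchEnginePage)` would then walk every page for the team that owns the API key.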

Get engine

GET /engines/{id}

Returns an engine DTO including status (running, pending, draining, stopping, stopped).

bash
curl "https://api.inference.sh/engines/eng_abc123" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"

Drain engine

POST /engines/{id}/drain

Gracefully prepare an engine for shutdown:

  1. Marks the engine as draining in the API.
  2. Sends a drain signal over the engine WebSocket.
  3. The engine stops accepting new tasks and waits for in-flight work to finish.
  4. When all workers are idle, the engine shuts down.

The engine must be connected (online WebSocket). If it is offline, the API returns an error.

Use drain before maintenance or when you want a clean shutdown without cutting off running tasks.

bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/drain" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"

While draining, new tasks routed to that engine are rejected with reason engine_draining.
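Before maintenance, a script typically needs to know when the drain has finished. The sketch below requests a drain and then polls the engine until it reports `stopped`. It rests on two assumptions this page does not state: that a drained engine ends in the `stopped` status, and that the get call returns an object with a `status` field. The `EngineOps` interface is a hypothetical stand-in for whatever client is used, for example `client.engines`.

```typescript
// Status values as listed under "Get engine".
type EngineStatus = "running" | "pending" | "draining" | "stopping" | "stopped";

// Hypothetical minimal client surface; the return shapes are assumptions.
interface EngineOps {
  drain(id: string): Promise<unknown>;
  get(id: string): Promise<{ status: EngineStatus }>;
}

// Requests a drain, then polls until the engine reports "stopped" or the timeout elapses.
async function drainAndWait(
  ops: EngineOps,
  id: string,
  opts: { intervalMs: number; timeoutMs: number } = { intervalMs: 5000, timeoutMs: 600000 },
): Promise<EngineStatus> {
  await ops.drain(id);
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    const { status } = await ops.get(id);
    // ASSUMPTION: a fully drained engine ends up in the "stopped" status.
    if (status === "stopped") return status;
    if (Date.now() >= deadline) {
      throw new Error(`engine ${id} still ${status} after ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

The timeout matters because in-flight tasks can run for an arbitrary time; choose a value that matches the longest task you expect.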


Update engine binary

POST /engines/{id}/update

Triggers a safe engine update on a connected private engine:

  1. The API sends an update signal over the engine WebSocket.
  2. The engine checks https://dist.inference.sh/engine/manifest.json for a newer binary (same OS/arch).
  3. When a newer version exists, it downloads the binary, drains in-flight tasks, and restarts the process (re-exec) after workers are idle.

This is the API equivalent of the workspace Update action on Engines or running inference-engine update on the host. Tasks already running complete before the restart.

bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/update" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"

The engine must be online. On hosts you manage directly, you can also update with the engine CLI — see Installing the engine.


Extend cloud engine duration

POST /engines/{id}/extend

Adds runtime to a cloud-provisioned engine (Shadeform instance) before its auto-delete threshold. Requires engines:write. Your team balance is debited at hourly_price × additional_hours.

Request

Field | Type | Required | Description
additional_hours | number | Yes | Hours to add (must be > 0)

The engine must have an active instance with auto_delete.date_threshold and a positive hourly_price synced from the provider.

bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/extend" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2" \
  -H "Content-Type: application/json" \
  -d '{"additional_hours": 3}'

Returns the updated engine DTO with a new instance auto_delete threshold. If the instance cannot be extended (for example, because the threshold or price is missing), the API responds with HTTP 400 and code extend_failed.
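Since an extension debits the team balance, it can help to validate the request and estimate the charge before sending it. The sketch below applies the two rules stated above (additional_hours must be > 0, and the debit is hourly_price × additional_hours). The function names are hypothetical, and the result is only a client-side estimate in whatever unit hourly_price uses; the amount actually debited is determined by the API.

```typescript
interface ExtendRequest {
  additional_hours: number;
}

// Builds the request body, enforcing the documented "must be > 0" rule.
function buildExtendRequest(additionalHours: number): ExtendRequest {
  if (!isFinite(additionalHours) || additionalHours <= 0) {
    throw new RangeError("additional_hours must be a finite number > 0");
  }
  return { additional_hours: additionalHours };
}

// Estimates the debit as hourly_price × additional_hours.
function estimateExtendCost(hourlyPrice: number, additionalHours: number): number {
  if (!isFinite(hourlyPrice) || hourlyPrice <= 0) {
    throw new RangeError("hourly_price must be a finite number > 0");
  }
  return hourlyPrice * buildExtendRequest(additionalHours).additional_hours;
}

// Example: an engine priced at 1.5 per hour, extended by 3 hours.
const estimatedDebit = estimateExtendCost(1.5, 3); // 4.5
```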

Self-hosted engines on your own hardware do not use this endpoint — use Update engine binary or restart the service on the host.


Stop engine

POST /engines/{id}/stop

Stops a managed engine instance. If the engine is connected, it receives a graceful stop signal, finishes in-flight tasks, then terminates (including cloud instance teardown when applicable). If the engine is offline, the API finalizes the stop immediately.


Restart engine

POST /engines/{id}/restart

Restarts the underlying cloud instance (Shadeform) for engines provisioned through inference.sh. For self-hosted engines on your own hardware, use Update engine binary (POST /engines/{id}/update) or restart the inference-engine service on the host.
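This page gives no request example for stop or restart. The sketch below builds the request for any of the body-less lifecycle actions, following the same URL pattern and headers as the drain and update examples. That stop and restart take no request body is an assumption based on that pattern, and `engineActionRequest` is a hypothetical helper, not part of the SDK.

```typescript
// Lifecycle actions that follow the POST /engines/{id}/{action} pattern.
type EngineAction = "drain" | "update" | "stop" | "restart";

interface EngineActionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Builds the URL and request options; sending the request is left to the caller.
function engineActionRequest(
  engineId: string,
  action: EngineAction,
  apiKey: string,
): EngineActionRequest {
  return {
    url: `https://api.inference.sh/engines/${encodeURIComponent(engineId)}/${action}`,
    init: {
      method: "POST",
      // ASSUMPTION: no request body is needed, as in the drain and update examples.
      headers: { Authorization: `Bearer ${apiKey}`, "X-API-Version": "2" },
    },
  };
}

const stopRequest = engineActionRequest("eng_abc123", "stop", "inf_your_key");
// To send it: const res = await fetch(stopRequest.url, stopRequest.init);
```

Keeping request construction separate from sending makes the URL and headers easy to inspect or log before a destructive call such as stop.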


JavaScript SDK

typescript
import { inference } from '@inferencesh/sdk';

const client = inference({ apiKey: 'inf_your_key' });

await client.engines.drain('eng_abc123');
await client.engines.updateBinary('eng_abc123'); // POST /engines/{id}/update
await client.engines.stop('eng_abc123');
await client.engines.restart('eng_abc123');
// POST /engines/{id}/extend: pass { additional_hours } in the request body (REST or HttpClient)

client.engines also exposes list, get, create, update, delete, and stream.


Guide | Topic
Workers | Cloud vs private workers
Installing the engine | Host setup and manual upgrades
Using private workers | Routing tasks with infra: "private"
Instance types | GPU catalog for cloud-provisioned engines
