Manage private inference engines that run workers on your hardware. Use these endpoints to list engines, inspect status, and perform safe restarts without dropping in-flight tasks.
Requires API key scopes engines:read (list, get) and engines:write (drain, update, extend, stop, restart).
→ Workers · Installing the engine · Configuration
List engines
GET /engines or POST /engines/list
Cursor-paginated list of engines for your team.
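Walking a cursor-paginated list usually means repeating the request until no further cursor comes back. The sketch below shows that loop only. This section does not document the response field names, so `fetchPage` is a hypothetical caller-supplied adapter that returns a normalized `{ items, nextCursor }` object; `listAllEngines` and `maxPages` are likewise illustrative names, not part of the API or SDK.

```javascript
// Collect every engine by following cursors until none is returned.
// ASSUMPTION: `fetchPage(cursor)` is supplied by the caller. It receives the
// current cursor (undefined on the first call) and resolves to
// { items: [...], nextCursor: string | undefined }.
async function listAllEngines(fetchPage, maxPages = 100) {
  const engines = [];
  let cursor;
  for (let page = 0; page < maxPages; page++) {
    const { items, nextCursor } = await fetchPage(cursor);
    engines.push(...items);
    if (!nextCursor) return engines; // last page reached
    cursor = nextCursor;
  }
  // Guard against an adapter that never stops returning a cursor.
  throw new Error(`gave up after ${maxPages} pages`);
}
```

In practice `fetchPage` would wrap the list request and map whatever cursor field the response actually carries onto `nextCursor`.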
```bash
curl -X POST https://api.inference.sh/engines/list \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2" \
  -H "Content-Type: application/json" \
  -d '{"limit": 20}'
```

Get engine
GET /engines/{id}
Returns an engine DTO including status (running, pending, draining, stopping, stopped).
```bash
curl "https://api.inference.sh/engines/eng_abc123" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

Drain engine
POST /engines/{id}/drain
Gracefully prepare an engine for shutdown:
- Marks the engine as draining in the API.
- Sends a drain signal over the engine WebSocket.
- The engine stops accepting new tasks and waits for in-flight work to finish.
- When all workers are idle, the engine shuts down.
The engine must be connected (online WebSocket). If it is offline, the API returns an error.
Use drain before maintenance or when you want a clean shutdown without cutting off running tasks.
```bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/drain" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

While draining, new tasks routed to that engine are rejected with reason engine_draining.
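A maintenance script typically issues the drain request and then polls the engine until it has shut down. The sketch below covers the polling half. It assumes the engine eventually reports the `stopped` status once draining finishes, which this section implies but does not state outright. `waitForStatus` and its options are illustrative names, and `getEngine` is a hypothetical caller-supplied function that resolves to the engine DTO, for example by wrapping GET /engines/{id}.

```javascript
// Poll an engine until it reports `target`, or fail once `timeoutMs` elapses.
// ASSUMPTION: `getEngine(id)` is supplied by the caller and resolves to an
// object with a `status` field, as described for the engine DTO.
async function waitForStatus(getEngine, id, target = 'stopped',
                             { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const engine = await getEngine(id);
    if (engine.status === target) return engine;
    if (Date.now() >= deadline) {
      throw new Error(`engine ${id} still ${engine.status} after ${timeoutMs} ms`);
    }
    // Sleep before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting `getEngine` keeps the loop independent of the HTTP client, so the same helper works with raw REST calls or an SDK wrapper.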
Update engine binary
POST /engines/{id}/update
Triggers a safe engine update on a connected private engine:
- The API sends an update signal over the engine WebSocket.
- The engine checks https://dist.inference.sh/engine/manifest.json for a newer binary (same OS/arch).
- When a newer version exists, it downloads the binary, drains in-flight tasks, and restarts the process (re-exec) after workers are idle.
This is the API equivalent of the workspace Update action on Engines or running inference-engine update on the host. Tasks already running complete before the restart.
```bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/update" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

The engine must be online. On hosts you manage directly, you can also update with the engine CLI; see Installing the engine.
Extend cloud engine duration
POST /engines/{id}/extend
Adds runtime to a cloud-provisioned engine (Shadeform instance) before its auto-delete threshold. Requires engines:write. Your team balance is debited at hourly_price × additional_hours.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| additional_hours | number | Yes | Hours to add (must be > 0) |
The engine must have an active instance with auto_delete.date_threshold and a positive hourly_price synced from the provider.
```bash
curl -X POST "https://api.inference.sh/engines/eng_abc123/extend" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2" \
  -H "Content-Type: application/json" \
  -d '{"additional_hours": 3}'
```

Returns the updated engine DTO with a new instance auto_delete threshold. Errors use 400 with code extend_failed when the instance cannot be extended (for example, missing threshold or price).
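Because the debit is hourly_price × additional_hours, the cost can be estimated on the client before calling the endpoint. The helper below is a minimal sketch of that arithmetic with the same positivity checks described above. `estimateExtendCost` is an illustrative name, not an SDK function, and the result is in whatever unit hourly_price uses.

```javascript
// Estimate the team-balance debit for an extend request:
// hourly_price × additional_hours. Both inputs must be positive.
function estimateExtendCost(hourlyPrice, additionalHours) {
  if (!(hourlyPrice > 0)) {
    throw new RangeError('hourly_price must be positive');
  }
  if (!(additionalHours > 0)) {
    throw new RangeError('additional_hours must be > 0');
  }
  return hourlyPrice * additionalHours;
}

// Example: a 2.5/hour instance extended by 3 hours.
console.log(estimateExtendCost(2.5, 3)); // 7.5
```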
Self-hosted engines on your own hardware do not use this endpoint — use Update engine binary or restart the service on the host.
Stop engine
POST /engines/{id}/stop
Stops a managed engine instance. If the engine is connected, it receives a graceful stop signal, finishes in-flight tasks, then terminates (including cloud instance teardown when applicable). If the engine is offline, the API finalizes the stop immediately.
Restart engine
POST /engines/{id}/restart
Restarts the underlying cloud instance (Shadeform) for engines provisioned through inference.sh. For self-hosted engines on your own hardware, use update or restart the inference-engine service on the host.
JavaScript SDK
```javascript
import { inference } from '@inferencesh/sdk';

const client = inference({ apiKey: 'inf_your_key' });

await client.engines.drain('eng_abc123');
await client.engines.updateBinary('eng_abc123'); // POST /engines/{id}/update
await client.engines.stop('eng_abc123');
await client.engines.restart('eng_abc123');
// POST /engines/{id}/extend: pass { additional_hours } in the request body (REST or HttpClient)
```

client.engines also exposes list, get, create, update, delete, and stream.
Related
| Guide | Topic |
|---|---|
| Workers | Cloud vs private workers |
| Installing the engine | Host setup and manual upgrades |
| Using private workers | Routing tasks with infra: "private" |
| Instance types | GPU catalog for cloud-provisioned engines |