
Managing Engines

Drain, update, and restart engines without interrupting in-flight tasks.


When to use drain vs update

| Action | Use when |
| --- | --- |
| Drain | You want the engine to stop accepting new work and shut down after current tasks finish (maintenance window, decommissioning a machine). |
| Update | You want the engine binary upgraded to the latest release. The engine drains first, swaps the binary, then restarts. |
| Restart (cloud instances) | You need the underlying VM restarted via the cloud provider (remote engines only). |

Both drain and update require the engine to be connected to the API over WebSocket. If the engine is offline, the API returns an error.


In the workspace

Open Engines, select an engine, and use:

  • Drain — marks the engine as draining, rejects new tasks with engine_draining, and shuts down once all workers are idle.
  • Update — triggers a remote binary update (drain → swap binary → restart).

While draining, status shows draining. Queued tasks that target this engine may wait until another worker is available.


On the server (local CLI)

For engines you manage directly on Linux, use the inference-engine CLI:

```bash
# Update to the latest release
inference-engine update

# Install a specific version
inference-engine update --version v1.10.15

# Print current version
inference-engine version
```

The engine also checks for updates on startup (rate-limited to once every five minutes). Pass --disable-auto-update to skip the startup check for a single run.
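
The startup check described above can be pictured with a small decision function. This is only an illustrative sketch: `shouldCheckForUpdate` and `UPDATE_CHECK_INTERVAL_MS` are hypothetical names, and the engine's actual implementation is not shown in this guide.

```typescript
// Hypothetical helper, not part of the inference-engine CLI or SDK.
// Five minutes, matching the rate limit described in the text.
const UPDATE_CHECK_INTERVAL_MS = 5 * 60 * 1000;

function shouldCheckForUpdate(
  lastCheckMs: number | null, // time of the previous check, or null if none is recorded
  nowMs: number, // current time in milliseconds
  disableAutoUpdate: boolean, // models the --disable-auto-update flag
): boolean {
  // The flag skips the startup check for this run.
  if (disableAutoUpdate) return false;
  // No previous check recorded: nothing to rate-limit against.
  if (lastCheckMs === null) return true;
  // Otherwise allow at most one check per interval.
  return nowMs - lastCheckMs >= UPDATE_CHECK_INTERVAL_MS;
}
```

Under these assumptions, an engine restarted one minute after its last check would skip the check, while one restarted five or more minutes later would run it.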

After a config change in engine.yml, restart the service:

```bash
sudo systemctl restart inference-engine
```

A config-only restart does not run the remote Update flow; use inference-engine update or the workspace Update button when you need a new engine binary.


Via API and SDK

Remote drain and update are available over the REST API and the JavaScript SDK (client.engines).

```bash
# Drain (engine must be online)
curl -X POST "https://api.inference.sh/engines/{engine_id}/drain" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"

# Update binary (drain + restart with latest binary)
curl -X POST "https://api.inference.sh/engines/{engine_id}/update" \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```
```typescript
await client.engines.drain(engineId);
await client.engines.updateBinary(engineId);
```

Requires API key scopes engines:read (list/get) and engines:write (drain, update, stop).
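
If you call the REST endpoints without the SDK, the two curl requests above can be mirrored in TypeScript. The following is a minimal sketch: `buildEngineRequest`, `sendEngineAction`, and `FetchLike` are hypothetical names introduced for illustration, and the response body format is not documented here, so only the HTTP status is checked.

```typescript
// Sketch only: these helpers are illustrative and not part of the SDK.
type EngineAction = "drain" | "update";

interface EngineRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Minimal shape of a fetch-style function, so the sender can be tested with a fake.
type FetchLike = (
  url: string,
  init: EngineRequest["init"],
) => Promise<{ ok: boolean; status: number }>;

// Base URL and headers are copied from the curl examples.
const API_BASE = "https://api.inference.sh";

function buildEngineRequest(
  engineId: string,
  action: EngineAction,
  apiKey: string,
): EngineRequest {
  return {
    url: `${API_BASE}/engines/${encodeURIComponent(engineId)}/${action}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "X-API-Version": "2",
      },
    },
  };
}

// Sends the request and throws on any non-2xx status.
async function sendEngineAction(
  doFetch: FetchLike,
  engineId: string,
  action: EngineAction,
  apiKey: string,
): Promise<void> {
  const { url, init } = buildEngineRequest(engineId, action, apiKey);
  const res = await doFetch(url, init);
  if (!res.ok) {
    throw new Error(`${action} request failed with HTTP ${res.status}`);
  }
}
```

In Node 18+ or a browser you could pass the global `fetch` as `doFetch`. The key used must carry the engines:write scope.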

See also: Engines REST API · Configuration · Using private workers


What happens during drain

  1. The API sets engine status to draining and sends an engine_drain event to the connected engine.
  2. The engine stops accepting new tasks (engine_draining rejection reason).
  3. In-flight tasks on each worker run to completion.
  4. When all workers are idle, the engine shuts down gracefully.

Press Ctrl+C twice on the server to force shutdown while draining.
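
The drain steps above can be pictured as a small state machine. This is a toy in-memory model for illustration, not the engine's code: `EngineModel` and its methods are hypothetical, and the `engine_stopped` reason is a placeholder used only in this model (the documented rejection reason is `engine_draining`).

```typescript
// Toy model of the drain flow; all names here are hypothetical.
type EngineStatus = "running" | "draining" | "stopped";

type SubmitResult =
  | { accepted: true }
  // "engine_stopped" is a placeholder for this model, not a documented reason.
  | { accepted: false; reason: "engine_draining" | "engine_stopped" };

class EngineModel {
  private status: EngineStatus = "running";
  private inFlight = new Set<string>();

  getStatus(): EngineStatus {
    return this.status;
  }

  // Step 2: new tasks are rejected once the engine is draining.
  submit(taskId: string): SubmitResult {
    if (this.status === "draining") return { accepted: false, reason: "engine_draining" };
    if (this.status === "stopped") return { accepted: false, reason: "engine_stopped" };
    this.inFlight.add(taskId);
    return { accepted: true };
  }

  // Step 1: mark the engine as draining.
  drain(): void {
    if (this.status !== "running") return;
    this.status = "draining";
    this.shutdownIfIdle();
  }

  // Step 3: in-flight tasks run to completion.
  complete(taskId: string): void {
    this.inFlight.delete(taskId);
    this.shutdownIfIdle();
  }

  // Step 4: shut down once no task is still running.
  private shutdownIfIdle(): void {
    if (this.status === "draining" && this.inFlight.size === 0) {
      this.status = "stopped";
    }
  }
}
```

In this model, an engine with one running task stays in `draining` after `drain()` is called, rejects any further `submit()`, and moves to `stopped` only when that task calls `complete()`.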


What happens during update

  1. The API sends an engine_update event to the connected engine.
  2. The engine downloads and installs the latest binary, then enters the same drain flow as above.
  3. After workers finish, the engine restarts with the new binary (process re-exec).

If the engine is not connected, update and drain requests fail with an internal error from the API.


Next

| Guide | Topic |
| --- | --- |
| Installing the engine | First-time setup |
| Configuration | Workers and engine.yml |
| Using private workers | Run tasks on your hardware |
