inference.sh

Using Private Workers

Run tasks on your own hardware.


In the workspace

Toggle to "Private" when running an app:

```
Run on: [ Cloud] [ Private]
```

Via API

```python
result = client.run({
    "app": "my-app",
    "input": {...},
    "infra": "private"
})
```

Specific workers

Target exact workers:

```python
result = client.run({
    "app": "my-app",
    "input": {...},
    "infra": "private",
    "workers": ["my-server-gpu-0"]
})
```

Agents on private workers

Agents can use private workers too.

When an agent calls a tool, it respects your infra setting.
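As a sketch of how that plays out, the payload for an agent run can carry the same `infra` (and optional `workers`) fields as the `client.run` examples above. Note the `agent` key and the payload-building helper below are illustrative assumptions, not confirmed API:

```python
# Sketch: an agent run whose tool calls inherit the "private" infra
# setting. The "agent" key and this helper are assumptions modeled on
# the client.run examples above.

def build_agent_request(agent, prompt, infra="private", workers=None):
    """Assemble a run payload; tool calls made by the agent are
    expected to inherit the infra (and workers) set here."""
    payload = {
        "agent": agent,
        "input": {"prompt": prompt},
        "infra": infra,
    }
    if workers:
        payload["workers"] = workers
    return payload

payload = build_agent_request(
    "my-agent", "summarize this file",
    workers=["my-server-gpu-0"],
)
# result = client.run(payload)  # hypothetical call, mirroring the examples above
```

Because the agent inherits the setting, you pin infrastructure once at the top level rather than per tool call.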


Monitoring

Check your engines on the Engines page:

  • Online/offline status
  • Resource usage
  • Running tasks

Caching

The engine caches:

  • App code
  • Downloaded models
  • Container images

Subsequent runs are much faster.


That's it!

You now know how to use inference.sh.

