Run tasks on your own hardware.
In the workspace
Toggle to "Private" when running an app:
```
Run on: [○ Cloud] [● Private]
```

Via API
```python
result = client.run({
    "app": "my-app",
    "input": {...},
    "infra": "private"
})
```

Specific workers
Target exact workers:
```python
result = client.run({
    "app": "my-app",
    "input": {...},
    "infra": "private",
    "workers": ["my-server-gpu-0"]
})
```

Agents on private
Agents can use private workers too.
When an agent calls a tool, it respects your infra setting.
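For example, here is a sketch of running an agent on private infra. The call shape follows the examples above; `my-agent` and the input are placeholder names, not taken from this page:

```python
# Sketch: "my-agent" is a placeholder app name. Because "infra" is set
# to "private", any tool calls the agent makes during this run are
# also executed on your private workers.
result = client.run({
    "app": "my-agent",
    "input": {"prompt": "Summarize last week's logs"},
    "infra": "private"
})
```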
Monitoring
Check your engines on the Engines page:
- Online/offline status
- Resource usage
- Running tasks
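If you want the same information from code rather than the dashboard, a polling sketch could look like the following. Note that `client.engines.list()` and the field names are assumptions for illustration; this page only documents the Engines page itself:

```python
# Hypothetical sketch: client.engines.list() and these field names are
# assumptions, not a documented API. The Engines page shows the same
# online/offline status, resource usage, and running tasks in the UI.
for engine in client.engines.list():
    status = engine["status"]  # e.g. "online" / "offline"
    print(f"{engine['name']}: {status}, "
          f"GPU {engine['gpu_util']}%, "
          f"{len(engine['running_tasks'])} tasks running")
```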
Caching
The engine caches:
- App code
- Downloaded models
- Container images
Second runs are much faster.
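A quick way to see the cache at work is to time the same call twice. A minimal sketch, reusing the `client` from the examples above; the app name and input are placeholders, and absolute timings depend on model and image size:

```python
import time

def timed_run():
    # Placeholder app and input, as in the examples above.
    start = time.perf_counter()
    client.run({
        "app": "my-app",
        "input": {"prompt": "hello"},
        "infra": "private"
    })
    return time.perf_counter() - start

# First run: the engine pulls app code, models, and container images.
cold = timed_run()
# Second run: those are cached, so mostly inference time remains.
warm = timed_run()
print(f"cold start: {cold:.1f}s, warm: {warm:.1f}s")
```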
That's it!
You now know how to use inference.sh.