Direct HTTP access to inference.sh.
Base URL
```
https://api.inference.sh
```
Authentication
All requests require an API key in the Authorization header:
```
Authorization: Bearer inf_your_api_key
```
Content Type
```
Content-Type: application/json
```
API version
Send X-API-Version: 2 to use the modern response format. The official JavaScript and Python SDKs, plus the belt / infsh CLIs, send this header automatically on API calls.
| | Version 1 (default) | Version 2 (X-API-Version: 2) |
|---|---|---|
| Success body | Wrapped: { "success": true, "status": 200, "data": { ... } } | Bare resource DTO (same fields as data in v1) |
| Error body | Wrapped: { "success": false, "status": 4xx, "error": { "code", "message" } } | RFC 9457 application/problem+json |
| Requirements (412) | { "satisfied": false, "errors": [...] } | Same (not wrapped) |
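
As a minimal sketch of the headers and the two success formats compared above, the following Python (standard library only) builds a request and unwraps a success body so callers see the same resource in either version. The helper names `build_request` and `unwrap_success` are hypothetical and not part of the official SDKs, and `inf_your_api_key` is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.sh"


def build_request(path, api_key, body=None, api_version=2):
    """Build (but do not send) a request carrying the documented headers.

    Hypothetical helper for illustration; not part of the official SDKs.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if api_version == 2:
        # Omitting this header yields the default version 1 wrapper.
        headers["X-API-Version"] = "2"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def unwrap_success(payload, api_version=2):
    """Return the bare resource from a decoded success body."""
    if api_version == 1:
        # Version 1 wraps the resource: {"success": true, "status": 200, "data": {...}}
        return payload["data"]
    # Version 2 returns the resource itself.
    return payload


req = build_request("/tasks/task_abc123", "inf_your_api_key")
v1_body = {"success": True, "status": 200, "data": {"id": "task_abc123", "status": 10}}
v2_body = {"id": "task_abc123", "status": 10}
same_resource = unwrap_success(v1_body, api_version=1) == unwrap_success(v2_body)
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops short of that so it can run without network access.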
Version 2 success example (task from POST /run):
```json
{
  "id": "task_abc123",
  "status": 10,
  "output": { "image": { "uri": "https://..." } }
}
```
Version 2 error example (401):
```json
{
  "type": "https://api.inference.sh/errors/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "Invalid or missing API key"
}
```
Version 1 error example (same request without the header):
```json
{
  "success": false,
  "status": 401,
  "error": {
    "code": "unauthorized",
    "message": "Invalid or missing API key"
  }
}
```
REST examples in this section use version 2 unless noted otherwise. Add the header to curl:
```bash
curl https://api.inference.sh/tasks/task_abc123 \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```
Error Responses
With X-API-Version: 2, errors use Content-Type: application/problem+json and the type, title, status, and detail fields shown above. The type URI ends with the error code (for example .../errors/not_found).
Without the header, errors use the version 1 wrapper with error.code and error.message.
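
A client that may see either format can normalize both into one shape. The sketch below is one possible approach, not the SDKs' implementation: `ApiError` and `parse_error` are hypothetical names, and the code is taken from the last path segment of the version 2 `type` URI as described above.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    """Hypothetical normalized error; not a type from the official SDKs."""
    code: str
    message: str
    status: int


def parse_error(body, content_type=""):
    """Normalize a version 1 or version 2 error body into one ApiError."""
    payload = json.loads(body)
    if content_type.startswith("application/problem+json") or "type" in payload:
        # Version 2: the error code is the last path segment of the type URI.
        type_uri = payload.get("type", "about:blank")
        code = type_uri.rstrip("/").rsplit("/", 1)[-1]
        message = payload.get("detail", payload.get("title", ""))
        return ApiError(code, message, payload["status"])
    # Version 1: wrapper with error.code and error.message.
    err = payload["error"]
    return ApiError(err["code"], err["message"], payload["status"])


v2_body = json.dumps({
    "type": "https://api.inference.sh/errors/unauthorized",
    "title": "Unauthorized",
    "status": 401,
    "detail": "Invalid or missing API key",
})
v1_body = json.dumps({
    "success": False,
    "status": 401,
    "error": {"code": "unauthorized", "message": "Invalid or missing API key"},
})
v2_error = parse_error(v2_body, "application/problem+json")
v1_error = parse_error(v1_body, "application/json")
```

Both calls yield an equal `ApiError`, so downstream handling does not need to know which version produced the response.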
Error Codes
| Code | HTTP | Description |
|---|---|---|
| unauthorized | 401 | Invalid or missing API key |
| forbidden | 403 | Insufficient permissions |
| not_found | 404 | Resource not found |
| invalid_request | 400 | Malformed request |
| rate_limited | 429 | Too many requests |
| internal_error | 500 | Server error |
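
One way to act on these codes is a small retry policy. The sketch below copies the codes and statuses from the table. The choice of which codes are retryable and the backoff parameters are assumptions for illustration, not behavior documented by inference.sh.

```python
# Codes and HTTP statuses copied from the table above.
ERROR_STATUS = {
    "unauthorized": 401,
    "forbidden": 403,
    "not_found": 404,
    "invalid_request": 400,
    "rate_limited": 429,
    "internal_error": 500,
}

# Assumption (not stated in the source): only these codes are worth retrying,
# since the others will fail the same way until the request itself changes.
RETRYABLE = {"rate_limited", "internal_error"}


def should_retry(code, attempt, max_attempts=3):
    """Return True if a call that failed with this code may be retried.

    `attempt` counts retries already made, starting at 0.
    """
    return code in RETRYABLE and attempt < max_attempts


def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff without jitter: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** attempt))
```

A caller would typically sleep for `backoff_seconds(attempt)` between tries and stop as soon as `should_retry` returns False.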
Rate Limits
| Endpoint | Limit |
|---|---|
| Run task | 100/minute |
| Get task | 1000/minute |
| Upload file | 50/minute |
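
Responses also report the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers listed under "Response headers" below. The following sketch reads them and computes how long to pause once the quota is used up. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the source does not state. The function names are hypothetical.

```python
import time


def parse_rate_limit(headers):
    """Return (remaining, reset) as ints, or None for a missing header.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    reset = lowered.get("x-ratelimit-reset")
    return (
        int(remaining) if remaining is not None else None,
        int(reset) if reset is not None else None,
    )


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next call; 0.0 if no wait is needed."""
    remaining, reset = parse_rate_limit(headers)
    if remaining is None or reset is None or remaining > 0:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, reset - now)


example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1640000000"}
wait = seconds_until_reset(example, now=1639999990)
```

Here `wait` is 10 seconds, because the quota is exhausted and the window resets 10 seconds after the supplied `now`.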
Response headers:
```
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
```
Endpoints
- Tasks — Run apps, status, logs, timings, telemetry, cancellation, webhooks, cost
- Engines — Private engine list, drain, update, stop
- Files — Upload files
- Agents — Agent chat API
- Skills — Manage and access skills
- Knowledge — Manage knowledge entries
- Search — Search apps, skills, knowledge, and pages
- Streaming — SSE endpoints