
REST API

Direct HTTP access to inference.sh.


Base URL

code
https://api.inference.sh

Authentication

All requests require an API key in the Authorization header:

code
Authorization: Bearer inf_your_api_key

Content Type

code
Content-Type: application/json

API version

Send X-API-Version: 2 to use the modern response format. The official JavaScript and Python SDKs, plus the belt / infsh CLIs, send this header automatically on API calls.

| | Version 1 (default) | Version 2 (X-API-Version: 2) |
|---|---|---|
| Success body | Wrapped: { "success": true, "status": 200, "data": { ... } } | Bare resource DTO (same fields as data in v1) |
| Error body | Wrapped: { "success": false, "status": 4xx, "error": { "code", "message" } } | RFC 9457 application/problem+json |
| Requirements (412) | { "satisfied": false, "errors": [...] } | Same (not wrapped) |

Version 2 success example (task from POST /run):

json
{
  "id": "task_abc123",
  "status": 10,
  "output": { "image": { "uri": "https://..." } }
}
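A client that has to work with both versions can normalize successful bodies to the same resource. The following is a minimal sketch based only on the body shapes described above; the function name `extract_resource` and its `api_version` parameter are hypothetical and not part of any official SDK.

```python
from typing import Any


def extract_resource(body: dict[str, Any], api_version: int = 1) -> dict[str, Any]:
    """Return the resource DTO from a successful response body."""
    if api_version >= 2:
        # Version 2 returns the bare resource, so there is nothing to unwrap.
        return body
    # Version 1 wraps the resource: {"success": true, "status": 200, "data": {...}}
    if not body.get("success"):
        raise ValueError("not a successful version 1 response")
    return body["data"]


task = {"id": "task_abc123", "status": 10}
v1_body = {"success": True, "status": 200, "data": task}

# Both versions should yield the same resource fields.
print(extract_resource(v1_body, api_version=1) == extract_resource(task, api_version=2))
```

Passing the version explicitly avoids guessing from the body, since a bare version 2 resource could in principle contain a field named `success`.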

Version 2 error example (401):

json
{
  "type": "https://api.inference.sh/errors/unauthorized",
  "title": "Unauthorized",
  "status": 401,
  "detail": "Invalid or missing API key"
}

Version 1 error example (same request without the header):

json
{
  "success": false,
  "status": 401,
  "error": {
    "code": "unauthorized",
    "message": "Invalid or missing API key"
  }
}

REST examples in this section use version 2 unless noted otherwise. Add the header to curl:

bash
curl https://api.inference.sh/tasks/task_abc123 \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
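For callers not using curl or an SDK, the same request can be assembled with Python's standard library. This is a sketch, not official client code: the helper name `build_get_task_request` is hypothetical, and the request is only built here, not sent.

```python
import urllib.request

BASE_URL = "https://api.inference.sh"


def build_get_task_request(task_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a version 2 GET request for one task."""
    return urllib.request.Request(
        f"{BASE_URL}/tasks/{task_id}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-API-Version": "2",
        },
        method="GET",
    )


request = build_get_task_request("task_abc123", "inf_your_key")
print(request.full_url)
# To send it, pass the request to urllib.request.urlopen() and decode the JSON body.
```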

Error Responses

With X-API-Version: 2, errors use Content-Type: application/problem+json and the type, title, status, and detail fields shown above. The type URI ends with the error code (for example .../errors/not_found).

Without the header, errors use the version 1 wrapper with error.code and error.message.
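Code that handles both formats can reduce either error body to the same code and message pair. The sketch below assumes a version 2 error can be recognized by the presence of a `type` field (checking for `Content-Type: application/problem+json` would be an alternative); the function name `parse_error` is hypothetical.

```python
from typing import Any


def parse_error(body: dict[str, Any]) -> tuple[str, str]:
    """Return (code, message) from a version 1 or version 2 error body."""
    if "type" in body:
        # Version 2 (problem+json): the code is the last segment of the type URI.
        code = body["type"].rstrip("/").rsplit("/", 1)[-1]
        return code, body.get("detail", body.get("title", ""))
    # Version 1: the code and message sit inside the "error" wrapper.
    error = body["error"]
    return error["code"], error["message"]


v2_error = {
    "type": "https://api.inference.sh/errors/unauthorized",
    "title": "Unauthorized",
    "status": 401,
    "detail": "Invalid or missing API key",
}
v1_error = {
    "success": False,
    "status": 401,
    "error": {"code": "unauthorized", "message": "Invalid or missing API key"},
}

print(parse_error(v2_error))  # ('unauthorized', 'Invalid or missing API key')
print(parse_error(v1_error))  # ('unauthorized', 'Invalid or missing API key')
```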

Error Codes

| Code | HTTP | Description |
|---|---|---|
| unauthorized | 401 | Invalid or missing API key |
| forbidden | 403 | Insufficient permissions |
| not_found | 404 | Resource not found |
| invalid_request | 400 | Malformed request |
| rate_limited | 429 | Too many requests |
| internal_error | 500 | Server error |

Rate Limits

| Endpoint | Limit |
|---|---|
| Run task | 100/minute |
| Get task | 1000/minute |
| Upload file | 50/minute |

Response headers:

code
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
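A client can use these headers to decide how long to pause before retrying. The sketch below rests on an assumption the page does not state: that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests. The function name `seconds_until_reset` is hypothetical, and the header lookup is case-sensitive, so adapt it to the header container of your HTTP library.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return the number of seconds to wait before sending another request."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # Requests are still available in this window.
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1640000000"}
print(seconds_until_reset(headers, now=1639999970.0))  # 30.0
```

The optional `now` argument exists only so the calculation can be checked with a fixed clock; in normal use it defaults to the current time.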

Endpoints

  • Tasks — Run apps, status, logs, timings, telemetry, cancellation, webhooks, cost
  • Engines — Private engine list, drain, update, stop
  • Files — Upload files
  • Agents — Agent chat API
  • Skills — Manage and access skills
  • Knowledge — Manage knowledge entries
  • Search — Search apps, skills, knowledge, and pages
  • Streaming — SSE endpoints
