Entitlements

List your team's plan limits and feature flags. Entitlements are enforced when you create API keys, connectors, knowledge bases, private apps, upload files, or run tasks.

Use GET /entitlements to see entitlement rows (limits and sources). Use GET /entitlements/usage to see current usage vs limits.


List entitlements

GET /entitlements

Returns all entitlement rows for the API key's team. Requires a valid API key (any scope).

Example:

```bash
curl https://api.inference.sh/entitlements \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

Response (array of entitlement objects):

```json
[
  {
    "id": "ent_abc123",
    "team_id": "team_xyz",
    "resource": "api_keys",
    "type": "limit",
    "enabled": true,
    "unlimited": false,
    "limit": 5,
    "source": "tier",
    "enforcement": "block"
  }
]
```
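For callers that prefer Python over curl, the request above could be built as follows. This is a minimal sketch using only the standard library; `build_request` and `list_entitlements` are illustrative helper names, not part of any official SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.sh"  # host taken from the curl example


def build_request(path: str, api_key: str, api_version: str = "2") -> urllib.request.Request:
    """Build a GET request carrying the same headers as the curl example."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-API-Version": api_version,
        },
        method="GET",
    )


def list_entitlements(api_key: str) -> list:
    """Send the request and decode the JSON array (needs network access and a real key)."""
    with urllib.request.urlopen(build_request("/entitlements", api_key)) as resp:
        return json.load(resp)
```

Passing `"/entitlements/usage"` to `build_request` yields the usage request in the same way.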

Fields

| Field | Type | Description |
| --- | --- | --- |
| resource | string | What is limited (see Resources below) |
| type | string | Entitlement type from the plan definition |
| enabled | boolean | Whether the resource or feature is enabled |
| unlimited | boolean | When true, no numeric cap applies |
| limit | number | Maximum allowed (when not unlimited) |
| source | string | Origin: tier, trial, whitelist, or override |
| enforcement | string | block (hard deny when exceeded) or warn (allow with warning) |
| expires_at | string | ISO timestamp when a trial or override expires (optional) |

When multiple rows exist for the same resource, the API resolves the highest-priority source: override > whitelist > trial > tier.
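The priority order above can be mirrored client-side when you hold raw rows and want the winning one per resource. The sketch below is an illustration only: the API performs this resolution itself, `resolve_effective` is a hypothetical helper, and it ignores `expires_at`.

```python
# Higher number wins: override > whitelist > trial > tier.
SOURCE_PRIORITY = {"override": 3, "whitelist": 2, "trial": 1, "tier": 0}


def resolve_effective(rows: list) -> dict:
    """Return {resource: row}, keeping the highest-priority source for each resource."""
    effective = {}
    for row in rows:
        current = effective.get(row["resource"])
        if current is None or SOURCE_PRIORITY[row["source"]] > SOURCE_PRIORITY[current["source"]]:
            effective[row["resource"]] = row
    return effective


rows = [
    {"resource": "api_keys", "source": "tier", "limit": 5},
    {"resource": "api_keys", "source": "trial", "limit": 20},
    {"resource": "seats", "source": "tier", "limit": 3},
]
winner = resolve_effective(rows)["api_keys"]
print(winner["source"], winner["limit"])  # trial 20
```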


Get usage

GET /entitlements/usage

Returns current usage vs limits for countable resources, inflight task concurrency, and boolean feature gates. Requires a valid API key (any scope).

Unlike GET /entitlements, this endpoint includes live usage counts — for example how many API keys exist, how much storage is used, and how many tasks are currently in flight.

Example:

```bash
curl https://api.inference.sh/entitlements/usage \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

Response (array of usage objects):

```json
[
  {
    "resource": "api_keys",
    "label": "api keys",
    "type": "limit",
    "usage": 3,
    "limit": 5,
    "unlimited": false
  },
  {
    "resource": "storage_mb",
    "label": "storage",
    "type": "limit",
    "unit": "mb",
    "usage": 128,
    "limit": 1024,
    "unlimited": false
  },
  {
    "resource": "concurrency",
    "label": "concurrent tasks",
    "type": "limit",
    "usage": 2,
    "limit": 3,
    "unlimited": false
  },
  {
    "resource": "feature:webhooks",
    "label": "webhooks",
    "type": "boolean",
    "enabled": true
  }
]
```

Usage fields

| Field | Type | Description |
| --- | --- | --- |
| resource | string | Resource identifier (same values as Resources) |
| label | string | Human-readable name (for example concurrent tasks, api keys) |
| unit | string | Display unit from the plan (for example mb, days); optional |
| type | string | limit (numeric cap) or boolean (feature gate) |
| usage | number | Current usage (limit-type resources only) |
| limit | number | Maximum allowed when not unlimited (limit-type only) |
| unlimited | boolean | When true, no numeric cap applies (limit-type only) |
| enabled | boolean | Whether the feature is enabled (boolean-type only) |

When no entitlement row exists for a limit-type resource, the response sets unlimited: true and still returns the current usage.
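A client can derive remaining headroom from these rows. The following is a minimal sketch, assuming rows shaped like the usage example; `remaining` is a hypothetical helper name, and returning `None` for unlimited and boolean rows is a choice made for this example.

```python
from typing import Optional


def remaining(row: dict) -> Optional[int]:
    """Remaining capacity for a limit-type row; None for unlimited or boolean rows."""
    if row.get("type") != "limit" or row.get("unlimited"):
        return None
    # Clamp at zero in case usage has drifted above the limit.
    return max(row["limit"] - row["usage"], 0)


usage_rows = [
    {"resource": "api_keys", "type": "limit", "usage": 3, "limit": 5, "unlimited": False},
    {"resource": "seats", "type": "limit", "usage": 4, "unlimited": True},
    {"resource": "feature:webhooks", "type": "boolean", "enabled": True},
]
print([remaining(r) for r in usage_rows])  # [2, None, None]
```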

What is counted

| Resource | Usage source |
| --- | --- |
| api_keys | Number of API keys for the team |
| storage_mb | Total uploaded file storage in megabytes |
| seats | Number of team members |
| triggers | Trigger subscriptions linked to agents |
| concurrency | Inflight tasks: queued (received, queued) plus in-progress (dispatched, preparing, serving, setting_up, running, cancelling, uploading) |
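The concurrency rule in the table can be expressed as a status filter. This sketch assumes each task is a dict with a `status` field holding one of the names listed above; that shape, and the `completed` and `failed` statuses in the sample data, are assumptions made for illustration.

```python
QUEUED = {"received", "queued"}
IN_PROGRESS = {
    "dispatched", "preparing", "serving", "setting_up",
    "running", "cancelling", "uploading",
}
INFLIGHT = QUEUED | IN_PROGRESS


def count_inflight(tasks: list) -> int:
    """Count tasks whose status falls in the queued or in-progress groups."""
    return sum(1 for task in tasks if task["status"] in INFLIGHT)


tasks = [
    {"id": "t1", "status": "queued"},
    {"id": "t2", "status": "running"},
    {"id": "t3", "status": "completed"},  # hypothetical terminal status, not counted
    {"id": "t4", "status": "failed"},     # hypothetical terminal status, not counted
]
print(count_inflight(tasks))  # 2
```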

Boolean features (for example feature:webhooks, feature:byok) appear only when an entitlement row exists for that feature.

In the workspace, Settings → Subscription shows the same usage rows with progress bars — yellow at 80% of a finite limit, orange at the cap.
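The color thresholds described above could be reproduced like this. It is a sketch rather than the workspace's actual code: `bar_state` and the `"normal"` state name are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
def bar_state(usage: int, limit: int, unlimited: bool = False) -> str:
    """Map usage against a finite limit to a progress bar color."""
    if unlimited:
        return "normal"  # no finite limit, so no threshold applies
    if usage >= limit:
        return "orange"  # at (or past) the cap
    if usage * 10 >= limit * 8:
        return "yellow"  # at least 80% of the limit, using integer math
    return "normal"


print(bar_state(3, 5), bar_state(4, 5), bar_state(5, 5))  # normal yellow orange
```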


Resources

Common resource values returned by GET /entitlements:

| Resource | Typical use |
| --- | --- |
| api_keys | Number of API keys |
| connectors | MCP connectors |
| knowledge_bases | Knowledge entries |
| private_apps | Private app deployments |
| storage_mb | Total uploaded file storage |
| task_executions | Task runs per billing period |
| concurrency | Concurrent tasks (inflight runs) |
| rate_per_min | API rate per minute |
| seats | Team members |
| triggers | Agent trigger subscriptions (event, schedule, or manual) |
| retention_days | Data retention period in days (plan policy; not usage-counted) |
| feature:scopes | Custom API key scopes |
| feature:webhooks | Webhooks |
| feature:byok | Bring your own keys |
| feature:publish_apps | Publishing apps to the store |
| feature:auto_recharge | Auto-recharge billing |
| feature:invoices | Invoice billing |
| feature:team_billing | Team billing features |

If no row exists for a resource, there is no plan restriction for that resource.


When limits block a request

Hard-blocked entitlements return API errors before the operation completes:

| Code | HTTP | Meaning |
| --- | --- | --- |
| limit_exceeded | 402 | Numeric limit reached |
| feature_not_available | 403 | Boolean feature disabled on your tier |
| payment_required | 402 | Insufficient prepaid balance (separate from plan limits) |

Error metadata

When a plan limit blocks a request, structured metadata is attached to the error response. Clients that drive upgrade flows (the workspace app, belt CLI) read these fields from the raw response body.

| Field | Description |
| --- | --- |
| resource | Resource identifier (for example api_keys, concurrency) |
| resource_label | Human-readable label (for example concurrent tasks) |
| limit | Plan limit that was exceeded |
| current | Usage at the time of the request |
| upgrade_available | Always true for entitlement errors; the CLI prints a subscription upgrade URL and the workspace opens the upgrade modal |

Version 1 (default, no X-API-Version header) — metadata is nested under error.meta:

```json
{
  "success": false,
  "status": 402,
  "error": {
    "code": "limit_exceeded",
    "message": "You have reached the limit of 3 concurrent tasks",
    "meta": {
      "resource": "concurrency",
      "resource_label": "concurrent tasks",
      "limit": 3,
      "current": 3,
      "upgrade_available": true
    }
  }
}
```

Version 2 (X-API-Version: 2) returns RFC 9457 application/problem+json with a human-readable detail, as in the body below. When you need resource, limit, and current on version 2, read them from GET /entitlements/usage:

```json
{
  "type": "https://api.inference.sh/errors/limit_exceeded",
  "title": "Payment Required",
  "status": 402,
  "detail": "You have reached the limit of 3 concurrent tasks"
}
```
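A client that has to handle both response versions can normalize them into one shape. The sketch below is illustrative: `parse_limit_error` is a hypothetical helper, and deriving the error code from the last path segment of `type` is an assumption based only on the single version 2 example shown here.

```python
def parse_limit_error(body: dict) -> dict:
    """Normalize a version 1 or version 2 error body into one flat dict."""
    if "error" in body:
        # Version 1: code, message, and metadata are nested under "error".
        err = body["error"]
        meta = err.get("meta") or {}
        return {
            "status": body.get("status"),
            "code": err.get("code"),
            "message": err.get("message"),
            "resource": meta.get("resource"),
            "limit": meta.get("limit"),
            "current": meta.get("current"),
        }
    # Version 2 (problem+json). ASSUMPTION: the code is the last segment of "type".
    code = body.get("type", "").rstrip("/").rsplit("/", 1)[-1] or None
    return {
        "status": body.get("status"),
        "code": code,
        "message": body.get("detail"),
        # Not present in the version 2 body; fetch from /entitlements/usage if needed.
        "resource": None,
        "limit": None,
        "current": None,
    }
```

Both versions then expose `code` and `message` under the same keys, and callers can test `resource is None` to decide whether a follow-up call to GET /entitlements/usage is needed.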

Workspace upgrade modal

When a workspace API call returns limit_exceeded or feature_not_available with upgrade_available: true, the app opens the upgrade modal and passes error metadata (resource, resource_label, limit, current, and detail). The modal lists self-serve plans above your current tier and highlights the recommended plan — the cheapest tier that lifts the blocked resource. For numeric limits it shows current / limit only when limit is greater than zero.
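The recommendation rule and the `current / limit` display rule might be sketched as follows. Everything about the plan data is hypothetical: the plan names, prices, and the `limits` mapping are invented for illustration, "above your current tier" is approximated by a higher price, and a plan is assumed to lift the block when its limit exceeds current usage or when it has no limit for that resource.

```python
from typing import Optional


def lifts(plan: dict, resource: str, current: int) -> bool:
    """A plan lifts the block if it has no cap for the resource or a cap above current usage."""
    limit = plan["limits"].get(resource)  # missing or None is treated as unlimited
    return limit is None or limit > current


def recommend_plan(plans: list, current_price: int, resource: str, current: int) -> Optional[dict]:
    """Cheapest plan priced above the current tier that lifts the blocked resource."""
    candidates = [p for p in plans if p["price"] > current_price and lifts(p, resource, current)]
    return min(candidates, key=lambda p: p["price"], default=None)


def format_usage(current: int, limit: int) -> Optional[str]:
    """Render 'current / limit' only when limit is greater than zero."""
    return f"{current} / {limit}" if limit > 0 else None


plans = [  # hypothetical catalogue
    {"name": "starter", "price": 10, "limits": {"concurrency": 3}},
    {"name": "pro", "price": 30, "limits": {"concurrency": 10}},
    {"name": "scale", "price": 100, "limits": {"concurrency": None}},
]
print(recommend_plan(plans, 10, "concurrency", 3)["name"])  # pro
print(format_usage(3, 3))  # 3 / 3
```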

See REST overview — Billing and plan limits for modal behavior, workspace links, and credit balance errors.

