List your team's plan limits and feature flags. Entitlements are enforced when you create API keys, connectors, knowledge bases, or private apps, when you upload files, and when you run tasks.
Use GET /entitlements to see entitlement rows (limits and sources). Use GET /entitlements/usage to see current usage vs limits.
List entitlements
GET /entitlements
Returns all entitlement rows for the API key's team. Requires a valid API key (any scope).
Example:
```bash
curl https://api.inference.sh/entitlements \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

Response (array of entitlement objects):

```json
[
  {
    "id": "ent_abc123",
    "team_id": "team_xyz",
    "resource": "api_keys",
    "type": "limit",
    "enabled": true,
    "unlimited": false,
    "limit": 5,
    "source": "tier",
    "enforcement": "block"
  }
]
```

Fields
| Field | Type | Description |
|---|---|---|
| resource | string | What is limited (see Resources below) |
| type | string | Entitlement type from the plan definition |
| enabled | boolean | Whether the resource or feature is enabled |
| unlimited | boolean | When true, no numeric cap applies |
| limit | number | Maximum allowed (when not unlimited) |
| source | string | Origin: tier, trial, whitelist, or override |
| enforcement | string | block (hard deny when exceeded) or warn (allow with warning) |
| expires_at | string | ISO timestamp when a trial or override expires (optional) |
When multiple rows exist for the same resource, the API resolves the highest-priority source: override > whitelist > trial > tier.
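
The precedence rule above can be reproduced client-side, for example when you reconcile the raw rows from GET /entitlements yourself. The following is a minimal sketch, not the service's implementation: `effective_entitlements` and `SOURCE_PRIORITY` are hypothetical names, the rows are assumed to carry the `resource` and `source` fields from the table, and `expires_at` is ignored.

```python
# Hypothetical helper: pick the winning row per resource using the documented
# precedence override > whitelist > trial > tier.
SOURCE_PRIORITY = {"override": 3, "whitelist": 2, "trial": 1, "tier": 0}


def effective_entitlements(rows):
    """Return {resource: row}, keeping the highest-priority source per resource."""
    winners = {}
    for row in rows:
        current = winners.get(row["resource"])
        # Assumption: sources not listed in the docs rank below every known one.
        rank = SOURCE_PRIORITY.get(row["source"], -1)
        if current is None or rank > SOURCE_PRIORITY.get(current["source"], -1):
            winners[row["resource"]] = row
    return winners


# Illustrative rows only; real rows come from GET /entitlements.
rows = [
    {"resource": "api_keys", "source": "tier", "limit": 5},
    {"resource": "api_keys", "source": "override", "limit": 20},
    {"resource": "seats", "source": "trial", "limit": 10},
]
resolved = effective_entitlements(rows)
print(resolved["api_keys"]["limit"])  # 20
print(resolved["seats"]["source"])    # trial
```

Here the override row wins for api_keys even though the tier row appears first, because ordering in the response is not relied on.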
Get usage
GET /entitlements/usage
Returns current usage vs limits for countable resources, inflight task concurrency, and boolean feature gates. Requires a valid API key (any scope).
Unlike GET /entitlements, this endpoint includes live usage counts — for example how many API keys exist, how much storage is used, and how many tasks are currently in flight.
Example:
```bash
curl https://api.inference.sh/entitlements/usage \
  -H "Authorization: Bearer inf_your_key" \
  -H "X-API-Version: 2"
```

Response (array of usage objects):

```json
[
  {
    "resource": "api_keys",
    "label": "api keys",
    "type": "limit",
    "usage": 3,
    "limit": 5,
    "unlimited": false
  },
  {
    "resource": "storage_mb",
    "label": "storage",
    "type": "limit",
    "unit": "mb",
    "usage": 128,
    "limit": 1024,
    "unlimited": false
  },
  {
    "resource": "concurrency",
    "label": "concurrent tasks",
    "type": "limit",
    "usage": 2,
    "limit": 3,
    "unlimited": false
  },
  {
    "resource": "feature:webhooks",
    "label": "webhooks",
    "type": "boolean",
    "enabled": true
  }
]
```

Usage fields
| Field | Type | Description |
|---|---|---|
| resource | string | Resource identifier (same values as Resources) |
| label | string | Human-readable name (for example concurrent tasks, api keys) |
| unit | string | Optional display unit from the plan (for example mb, days) |
| type | string | limit (numeric cap) or boolean (feature gate) |
| usage | number | Current usage (limit-type resources only) |
| limit | number | Maximum allowed when not unlimited (limit-type only) |
| unlimited | boolean | When true, no numeric cap applies (limit-type only) |
| enabled | boolean | Whether the feature is enabled (boolean-type only) |
When no entitlement row exists for a limit-type resource, the response sets unlimited: true and still returns the current usage.
What is counted
| Resource | Usage source |
|---|---|
| api_keys | Number of API keys for the team |
| storage_mb | Total uploaded file storage in megabytes |
| seats | Number of team members |
| triggers | Trigger subscriptions linked to agents |
| concurrency | Inflight tasks: queued (received, queued) plus in-progress (dispatched, preparing, serving, setting_up, running, cancelling, uploading) |
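
As an illustration of the concurrency row, the sketch below counts inflight tasks from a list of task records. The status names come from the table above. Everything else is an assumption made for the example: the `count_inflight` helper, the `status` key on each task, and the `completed` status used to represent a task that is no longer inflight.

```python
# Status names are taken from the usage table; the task dict shape is assumed.
QUEUED = {"received", "queued"}
IN_PROGRESS = {
    "dispatched", "preparing", "serving", "setting_up",
    "running", "cancelling", "uploading",
}
INFLIGHT = QUEUED | IN_PROGRESS


def count_inflight(tasks):
    """Count tasks whose status is queued or in progress."""
    return sum(1 for task in tasks if task.get("status") in INFLIGHT)


# "completed" is a hypothetical terminal status used only for illustration.
tasks = [{"status": "queued"}, {"status": "running"}, {"status": "completed"}]
print(count_inflight(tasks))  # 2
```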
Boolean features (for example feature:webhooks, feature:byok) appear only when an entitlement row exists for that feature.
In the workspace, Settings → Subscription shows the same usage rows with progress bars — yellow at 80% of a finite limit, orange at the cap.
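
A client that renders its own usage view could approximate the same thresholds. This is a rough sketch rather than the workspace's actual logic: `usage_level` is a hypothetical helper, and treating exactly 80% as the start of the warning band is an assumption.

```python
def usage_level(row):
    """Classify a usage row as 'unlimited', 'ok', 'warning', or 'at_cap'.

    Returns None for boolean feature rows, which carry no usage count.
    """
    if row.get("type") != "limit":
        return None
    if row.get("unlimited"):
        return "unlimited"
    usage, limit = row["usage"], row["limit"]
    if usage >= limit:
        return "at_cap"   # shown orange in the workspace
    # Integer comparison avoids float rounding: usage / limit >= 0.8
    if usage * 100 >= limit * 80:
        return "warning"  # shown yellow in the workspace
    return "ok"


print(usage_level({"type": "limit", "usage": 3, "limit": 5, "unlimited": False}))  # ok
print(usage_level({"type": "limit", "usage": 4, "limit": 5, "unlimited": False}))  # warning
print(usage_level({"type": "limit", "usage": 3, "limit": 3, "unlimited": False}))  # at_cap
```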
Resources
Common resource values returned by GET /entitlements:
| Resource | Typical use |
|---|---|
| api_keys | Number of API keys |
| connectors | MCP connectors |
| knowledge_bases | Knowledge entries |
| private_apps | Private app deployments |
| storage_mb | Total uploaded file storage |
| task_executions | Task runs per billing period |
| concurrency | Concurrent tasks (inflight runs) |
| rate_per_min | API rate per minute |
| seats | Team members |
| triggers | Agent trigger subscriptions (event, schedule, or manual) |
| retention_days | Data retention period in days (plan policy; not usage-counted) |
| feature:scopes | Custom API key scopes |
| feature:webhooks | Webhooks |
| feature:byok | Bring your own keys |
| feature:publish_apps | Publishing apps to the store |
| feature:auto_recharge | Auto-recharge billing |
| feature:invoices | Invoice billing |
| feature:team_billing | Team billing features |
If no row exists for a resource, there is no plan restriction for that resource.
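
To avoid a failed call, a client might check headroom against the rows from GET /entitlements/usage before creating something. The sketch below is illustrative: `has_headroom` is a hypothetical helper, it does not consider warn enforcement (which the usage rows do not carry), and it assumes a request is blocked once usage would go above the limit.

```python
def has_headroom(usage_rows, resource, needed=1):
    """Return True if `needed` more units of `resource` fit under the plan limit."""
    for row in usage_rows:
        if row["resource"] != resource:
            continue
        if row.get("type") == "boolean":
            return bool(row.get("enabled"))
        if row.get("unlimited"):
            return True
        return row["usage"] + needed <= row["limit"]
    # No row for this resource: no plan restriction applies.
    return True


# Rows shaped like the GET /entitlements/usage example response.
usage_rows = [
    {"resource": "api_keys", "type": "limit", "usage": 3, "limit": 5, "unlimited": False},
    {"resource": "concurrency", "type": "limit", "usage": 2, "limit": 3, "unlimited": False},
    {"resource": "feature:webhooks", "type": "boolean", "enabled": True},
]
print(has_headroom(usage_rows, "api_keys"))        # True
print(has_headroom(usage_rows, "concurrency", 2))  # False
print(has_headroom(usage_rows, "connectors"))      # True
```

A check like this is only advisory. Usage can change between the check and the request, so the error codes below still need handling.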
When limits block a request
Hard-blocked entitlements return API errors before the operation completes:
| Code | HTTP | Meaning |
|---|---|---|
| limit_exceeded | 402 | Numeric limit reached |
| feature_not_available | 403 | Boolean feature disabled on your tier |
| payment_required | 402 | Insufficient prepaid balance (separate from plan limits) |
Error metadata
When a plan limit blocks a request, structured metadata is attached to the error response. Clients that drive upgrade flows (the workspace app, belt CLI) read these fields from the raw response body.
| Field | Description |
|---|---|
| resource | Resource identifier (for example api_keys, concurrency) |
| resource_label | Human-readable label (for example concurrent tasks) |
| limit | Plan limit that was exceeded |
| current | Usage at the time of the request |
| upgrade_available | Always true for entitlement errors. The CLI prints a subscription upgrade URL; the workspace opens the upgrade modal |
Version 1 (default, no X-API-Version header) nests metadata under error.meta:

```json
{
  "success": false,
  "status": 402,
  "error": {
    "code": "limit_exceeded",
    "message": "You have reached the limit of 3 concurrent tasks",
    "meta": {
      "resource": "concurrency",
      "resource_label": "concurrent tasks",
      "limit": 3,
      "current": 3,
      "upgrade_available": true
    }
  }
}
```

Version 2 (X-API-Version: 2) returns RFC 9457 application/problem+json with a human-readable detail. Inspect limits programmatically with GET /entitlements/usage when you need resource, limit, and current on version 2 responses:

```json
{
  "type": "https://api.inference.sh/errors/limit_exceeded",
  "title": "Payment Required",
  "status": 402,
  "detail": "You have reached the limit of 3 concurrent tasks"
}
```

Workspace upgrade modal
When a workspace API call returns limit_exceeded or feature_not_available with upgrade_available: true, the app opens the upgrade modal and passes error metadata (resource, resource_label, limit, current, and detail). The modal lists self-serve plans above your current tier and highlights the recommended plan — the cheapest tier that lifts the blocked resource. For numeric limits it shows current / limit only when limit is greater than zero.
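
A custom client could mirror this behavior when handling version 1 error bodies. The following is a sketch under stated assumptions, not the workspace's code: `describe_limit_error` is a hypothetical helper, and mapping the version 1 `message` field to the modal's `detail` is a guess.

```python
def describe_limit_error(body):
    """Summarise a version 1 entitlement error body for an upgrade prompt.

    Returns None when the body is not an upgradeable entitlement error.
    """
    error = body.get("error") or {}
    meta = error.get("meta") or {}
    if error.get("code") not in ("limit_exceeded", "feature_not_available"):
        return None
    if not meta.get("upgrade_available"):
        return None
    summary = {
        "code": error["code"],
        "resource": meta.get("resource"),
        "label": meta.get("resource_label"),
        "detail": error.get("message"),  # assumed mapping: message -> detail
    }
    limit = meta.get("limit")
    # Mirror the modal: show "current / limit" only when limit is above zero.
    if isinstance(limit, (int, float)) and limit > 0:
        summary["progress"] = f"{meta.get('current')} / {limit}"
    return summary


# Body copied from the version 1 example in this page.
body = {
    "success": False,
    "status": 402,
    "error": {
        "code": "limit_exceeded",
        "message": "You have reached the limit of 3 concurrent tasks",
        "meta": {
            "resource": "concurrency",
            "resource_label": "concurrent tasks",
            "limit": 3,
            "current": 3,
            "upgrade_available": True,
        },
    },
}
print(describe_limit_error(body)["progress"])  # 3 / 3
```

Errors such as payment_required fall through and return None, since they concern prepaid balance rather than plan limits.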
See REST overview — Billing and plan limits for modal behavior, workspace links, and credit balance errors.
Related
- REST overview: authentication and API versioning
- Tasks API: run apps (may return payment_required)
- Troubleshooting: common limit and balance errors