Google Cloud Platform

Connect your own GCP project to use Vertex AI, BigQuery, Cloud Storage, and other GCP services with your own quotas and billing.

How It Works

inference.sh uses Workload Identity Federation (WIF) - the secure, keyless way to authenticate:

  • No credentials stored on inference.sh
  • Tokens are generated on-demand and expire in 1 hour
  • You control access via IAM
  • Full audit trail in Cloud Audit Logs
  • Revoke access anytime by deleting the WIF provider
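
Under the hood, each request follows Google's standard two-step token exchange: an inference.sh-signed OIDC token is traded for a federated token at the STS endpoint, which is then used to mint a short-lived service account token. A rough sketch with curl (you never run this yourself — inference.sh handles it — and $OIDC_JWT and $FEDERATED_TOKEN below are placeholders):

bash
# Step 1: exchange the inference.sh-signed OIDC token for a federated GCP token
curl -s -X POST https://sts.googleapis.com/v1/token \
  -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
  -d "audience=//iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/providers/inference-sh" \
  -d "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
  -d "requested_token_type=urn:ietf:params:oauth:token-type:access_token" \
  -d "scope=https://www.googleapis.com/auth/cloud-platform" \
  -d "subject_token=${OIDC_JWT}"

# Step 2: trade the federated token for a 1-hour service account access token
curl -s -X POST \
  -H "Authorization: Bearer ${FEDERATED_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"scope": ["https://www.googleapis.com/auth/cloud-platform"]}' \
  "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken"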

Prerequisites

  1. gcloud CLI - installed on your machine (see Google's install guide)
  2. Logged in - Run gcloud auth login if you haven't already
  3. A GCP project with billing enabled

Run our setup script - it handles everything automatically:

bash
curl -sL https://cloud.inference.sh/scripts/setup-gcp.sh | bash -s -- YOUR_PROJECT_ID

The script will:

  1. Enable required APIs
  2. Create a Workload Identity Pool
  3. Create a Service Account
  4. Configure all permissions

Then copy the output values to inference.sh.
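
To confirm the script worked, you can check that the pool and service account exist:

bash
gcloud iam workload-identity-pools describe inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID

gcloud iam service-accounts describe \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID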


Manual Setup

If you prefer to set things up manually:

Step 1: Get Your Project Number

bash
gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)'
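
You'll need this number again in Step 6 and Step 8, so it can be handy to capture it in a shell variable:

bash
PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)')
echo "$PROJECT_NUMBER"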

Step 2: Enable APIs

bash
gcloud services enable \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  sts.googleapis.com \
  --project=YOUR_PROJECT_ID

Step 3: Create Workload Identity Pool

bash
gcloud iam workload-identity-pools create inference-sh-pool \
  --location="global" \
  --display-name="inference.sh" \
  --project=YOUR_PROJECT_ID

Step 4: Create OIDC Provider

bash
gcloud iam workload-identity-pools providers create-oidc inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --issuer-uri="https://api.inference.sh" \
  --attribute-mapping="google.subject=assertion.sub" \
  --project=YOUR_PROJECT_ID

Step 5: Create Service Account

bash
gcloud iam service-accounts create inference-sh-sa \
  --display-name="inference.sh Integration" \
  --project=YOUR_PROJECT_ID

Step 6: Allow Impersonation

bash
gcloud iam service-accounts add-iam-policy-binding \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/*" \
  --project=YOUR_PROJECT_ID
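
If you want to double-check the binding, the service account's IAM policy should now include a roles/iam.workloadIdentityUser entry for the pool's principalSet:

bash
gcloud iam service-accounts get-iam-policy \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID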

Step 7: Grant Permissions

Grant the permissions your apps need:

bash
# For Vertex AI
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# For BigQuery (optional)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"
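
To see which roles the service account currently holds on the project (useful before granting more):

bash
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"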

Step 8: Configure on inference.sh

Go to Settings → Integrations and add:

Secret               Value
GCP_PROJECT_ID       Your project ID (e.g., my-project)
GCP_PROJECT_NUMBER   Your project number (e.g., 123456789012)

The service account email is derived automatically: inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com


Using in Apps

Declare the capabilities your app needs:

yaml
requirements:
  integrations:
    - key: gcp.vertex_ai

At runtime, your app receives:

bash
GCP_ACCESS_TOKEN=ya29.xxx...      # Short-lived access token (1 hour)
GCP_PROJECT_NUMBER=123456789012   # For API URLs that need project context
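
When debugging, you can inspect the token's scopes and remaining lifetime via Google's public tokeninfo endpoint (a quick sanity check, not something to do on every request):

bash
curl -s "https://oauth2.googleapis.com/tokeninfo?access_token=${GCP_ACCESS_TOKEN}"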

Example: Vertex AI

python
import os
import requests

token = os.environ["GCP_ACCESS_TOKEN"]
project = os.environ["GCP_PROJECT_NUMBER"]

# Use the appropriate region for your model
# - Most models: us-central1
# - Some models (e.g., gemini-3-pro-image-preview): global
location = "us-central1"

response = requests.post(
    f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/gemini-2.0-flash:generateContent",
    headers={"Authorization": f"Bearer {token}"},
    json={"contents": [{"parts": [{"text": "Hello!"}]}]}
)

Available Capabilities

Capability            Description                       Required IAM Role
gcp.vertex_ai         Vertex AI models (Gemini, etc.)   roles/aiplatform.user
gcp.vertex_ai.tuning  Fine-tune AI models               roles/aiplatform.admin
gcp.bigquery          Query BigQuery data               roles/bigquery.user
gcp.bigquery.admin    Create/manage datasets            roles/bigquery.admin
gcp.storage           Read Cloud Storage                roles/storage.objectViewer
gcp.storage.write     Write Cloud Storage               roles/storage.objectAdmin
gcp.pubsub            Pub/Sub messaging                 roles/pubsub.editor

Troubleshooting

"invalid_target" error

The WIF pool doesn't exist. Make sure you created inference-sh-pool (exact name).

"Token exchange failed" error

  1. Verify the WIF pool and provider exist
  2. Check the issuer URI is exactly https://api.inference.sh
  3. Ensure APIs are enabled: iam, iamcredentials, sts
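
All three can be checked from the CLI, for example:

bash
# Does the provider exist, and what issuer does it trust?
gcloud iam workload-identity-pools providers describe inference-sh \
  --workload-identity-pool="inference-sh-pool" \
  --location="global" \
  --project=YOUR_PROJECT_ID \
  --format='value(oidc.issuerUri)'

# Are the required APIs enabled?
gcloud services list --enabled --project=YOUR_PROJECT_ID | grep -E 'iam|sts'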

"Permission denied" on impersonation

Re-run Step 6 - the roles/iam.workloadIdentityUser binding is what allows inference.sh to impersonate your service account.

"Permission denied" on API calls

Your service account needs the right IAM role. Check Step 7.
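
To rule out WIF and test the service account's permissions directly, you can impersonate it yourself. This assumes your own user has roles/iam.serviceAccountTokenCreator on the service account:

bash
# Mint a token as the service account, then try the API call directly
TOKEN=$(gcloud auth print-access-token \
  --impersonate-service-account=inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com)

curl -s -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "ping"}]}]}' \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent"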

Model not found (404) errors

Some models are only available in specific regions. For example, gemini-3-pro-image-preview is only available in the global region, not us-central1.


Security Best Practices

  1. Least privilege - Only grant the IAM roles your apps actually need
  2. Regular audits - Review Cloud Audit Logs for unusual activity (see the sample query after this list)
  3. Separate projects - Consider using a dedicated project for inference.sh workloads
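
For item 2, a filter like the following surfaces recent activity by the integration's service account (note that Data Access audit logs must be enabled for some services to show up):

bash
gcloud logging read \
  'protoPayload.authenticationInfo.principalEmail="inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"' \
  --project=YOUR_PROJECT_ID \
  --freshness=7d \
  --limit=20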

Revoking Access

To disconnect inference.sh from your project:

bash
# Delete the WIF provider
gcloud iam workload-identity-pools providers delete inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --project=YOUR_PROJECT_ID

Or delete the entire pool:

bash
gcloud iam workload-identity-pools delete inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID
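
For a full cleanup, you may also want to delete the service account and remove any project-level role bindings you granted in Step 7 (repeat remove-iam-policy-binding for each role):

bash
gcloud iam service-accounts delete \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID

gcloud projects remove-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"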
