Connect your own GCP project to use Vertex AI, BigQuery, Cloud Storage, and other GCP services with your own quotas and billing.
## How It Works
inference.sh uses Workload Identity Federation (WIF) - the secure, keyless way to authenticate:
- No credentials stored on inference.sh
- Tokens are generated on-demand and expire in 1 hour
- You control access via IAM
- Full audit trail in Cloud Audit Logs
- Revoke access anytime by deleting the WIF provider
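Concretely, WIF is a two-step exchange against public Google endpoints. The sketch below is illustrative only (inference.sh performs it for you; the token placeholders are hypothetical), using the pool and provider names created later in this guide:

```bash
# Step 1: exchange inference.sh's OIDC token for a federated token
# via Google's Security Token Service (STS)
curl -s -X POST https://sts.googleapis.com/v1/token \
  -H "Content-Type: application/json" \
  -d '{
    "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
    "audience": "//iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/providers/inference-sh",
    "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
    "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
    "scope": "https://www.googleapis.com/auth/cloud-platform",
    "subjectToken": "OIDC_TOKEN_FROM_INFERENCE_SH"
  }'

# Step 2: use the federated token to impersonate the service account,
# producing the 1-hour access token your app receives
curl -s -X POST \
  "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken" \
  -H "Authorization: Bearer FEDERATED_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"scope": ["https://www.googleapis.com/auth/cloud-platform"]}'
```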
## Prerequisites

- gcloud CLI - install it from https://cloud.google.com/sdk/docs/install
- Logged in - run `gcloud auth login` if you haven't already
- A GCP project with billing enabled
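You can sanity-check all three prerequisites before starting (replace `YOUR_PROJECT_ID` with your own project ID):

```bash
# Confirm you are logged in and your account is listed
gcloud auth list

# Confirm billing is enabled on the project (should print: True)
# (if this subcommand is unavailable in your gcloud version, try `gcloud beta billing`)
gcloud billing projects describe YOUR_PROJECT_ID --format='value(billingEnabled)'
```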
## Quick Setup (Recommended)
Run our setup script - it handles everything automatically:
```bash
curl -sL https://cloud.inference.sh/scripts/setup-gcp.sh | bash -s -- YOUR_PROJECT_ID
```

The script will:
- Enable required APIs
- Create a Workload Identity Pool
- Create a Service Account
- Configure all permissions
Then copy the output values to inference.sh.
## Manual Setup
If you prefer to set things up manually:
### Step 1: Get Your Project Number
```bash
gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)'
```
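If you plan to script the remaining steps, it can help to keep both identifiers in shell variables (a minimal sketch; the variable names are just suggestions):

```bash
PROJECT_ID=YOUR_PROJECT_ID
PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
echo "$PROJECT_NUMBER"   # e.g., 123456789012
```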
### Step 2: Enable APIs

```bash
gcloud services enable \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  sts.googleapis.com \
  --project=YOUR_PROJECT_ID
```
### Step 3: Create Workload Identity Pool

```bash
gcloud iam workload-identity-pools create inference-sh-pool \
  --location="global" \
  --display-name="inference.sh" \
  --project=YOUR_PROJECT_ID
```
### Step 4: Create OIDC Provider

```bash
gcloud iam workload-identity-pools providers create-oidc inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --issuer-uri="https://api.inference.sh" \
  --attribute-mapping="google.subject=assertion.sub" \
  --project=YOUR_PROJECT_ID
```
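To confirm the pool and provider were created with the exact names this guide expects (helpful later if you hit the token-exchange errors under Troubleshooting):

```bash
gcloud iam workload-identity-pools describe inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID

gcloud iam workload-identity-pools providers describe inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --project=YOUR_PROJECT_ID
```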
### Step 5: Create Service Account

```bash
gcloud iam service-accounts create inference-sh-sa \
  --display-name="inference.sh Integration" \
  --project=YOUR_PROJECT_ID
```
### Step 6: Allow Impersonation

```bash
gcloud iam service-accounts add-iam-policy-binding \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/*" \
  --project=YOUR_PROJECT_ID
```
### Step 7: Grant Permissions

Grant the permissions your apps need:
```bash
# For Vertex AI
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# For BigQuery (optional)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"
```
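To double-check which roles the service account actually holds:

```bash
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```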
### Step 8: Configure on inference.sh

Go to Settings → Integrations and add:
| Secret | Value |
|---|---|
| `GCP_PROJECT_ID` | Your project ID (e.g., `my-project`) |
| `GCP_PROJECT_NUMBER` | Your project number (e.g., `123456789012`) |
The service account email is derived automatically: `inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com`
## Using in Apps
Declare the capabilities your app needs:
```yaml
requirements:
  integrations:
    - key: gcp.vertex_ai
```

At runtime, your app receives:
```bash
GCP_ACCESS_TOKEN=ya29.xxx...      # Short-lived access token (1 hour)
GCP_PROJECT_NUMBER=123456789012   # For API URLs that need project context
```
### Example: Vertex AI

```python
import os
import requests

token = os.environ["GCP_ACCESS_TOKEN"]
project = os.environ["GCP_PROJECT_NUMBER"]

# Use the appropriate region for your model
# - Most models: us-central1
# - Some models (e.g., gemini-3-pro-image-preview): global
location = "us-central1"

response = requests.post(
    f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/gemini-2.0-flash:generateContent",
    headers={"Authorization": f"Bearer {token}"},
    json={"contents": [{"parts": [{"text": "Hello!"}]}]}
)
```
## Available Capabilities

| Capability | Description | Required IAM Role |
|---|---|---|
| `gcp.vertex_ai` | Vertex AI models (Gemini, etc.) | `roles/aiplatform.user` |
| `gcp.vertex_ai.tuning` | Fine-tune AI models | `roles/aiplatform.admin` |
| `gcp.bigquery` | Query BigQuery data | `roles/bigquery.user` |
| `gcp.bigquery.admin` | Create/manage datasets | `roles/bigquery.admin` |
| `gcp.storage` | Read Cloud Storage | `roles/storage.objectViewer` |
| `gcp.storage.write` | Write Cloud Storage | `roles/storage.objectAdmin` |
| `gcp.pubsub` | Pub/Sub messaging | `roles/pubsub.editor` |
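For example, if an app declares `gcp.storage`, grant the matching role from the table using the same pattern as Step 7:

```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```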
## Troubleshooting
"invalid_target" error
The WIF pool doesn't exist. Make sure you created `inference-sh-pool` (exact name).
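To list the pools that actually exist in your project and check the name:

```bash
gcloud iam workload-identity-pools list --location="global" --project=YOUR_PROJECT_ID
```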
"Token exchange failed" error
- Verify the WIF pool and provider exist
- Check the issuer URI is exactly `https://api.inference.sh`
- Ensure the `iam`, `iamcredentials`, and `sts` APIs are enabled
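Two quick checks for the last two items:

```bash
# Print the provider's issuer URI (should be exactly https://api.inference.sh)
gcloud iam workload-identity-pools providers describe inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --project=YOUR_PROJECT_ID \
  --format='value(oidc.issuerUri)'

# Confirm the three required APIs appear in the enabled list
gcloud services list --enabled --project=YOUR_PROJECT_ID | grep -E '^(iam|iamcredentials|sts)\.'
```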
"Permission denied" on impersonation
Run Step 6 again - the WIF binding allows inference.sh to use your service account.
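To inspect the binding that Step 6 should have created (look for `roles/iam.workloadIdentityUser` with the `principalSet://...` member):

```bash
gcloud iam service-accounts get-iam-policy \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID
```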
"Permission denied" on API calls
Your service account needs the right IAM role. Check Step 7.
### Model not found (404) errors
Some models are only available in specific regions. For example, `gemini-3-pro-image-preview` is only available in the `global` region, not `us-central1`.
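For models in the `global` region, the endpoint has no regional prefix. A hedged sketch with curl, reusing the runtime variables from above (model availability can change, so treat the model name as an example):

```bash
curl -s -X POST \
  "https://aiplatform.googleapis.com/v1/projects/${GCP_PROJECT_NUMBER}/locations/global/publishers/google/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer ${GCP_ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Hello!"}]}]}'
```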
## Security Best Practices
- Least privilege - Only grant the IAM roles your apps actually need
- Regular audits - Review Cloud Audit Logs for unusual activity
- Separate projects - Consider using a dedicated project for inference.sh workloads
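For the regular-audit item above, one way to review recent activity by the integration's service account (a sketch; Admin Activity audit logs are on by default, while Data Access logs may need to be enabled separately):

```bash
gcloud logging read \
  'protoPayload.authenticationInfo.principalEmail="inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"' \
  --project=YOUR_PROJECT_ID \
  --limit=20 \
  --format='table(timestamp, protoPayload.methodName, protoPayload.serviceName)'
```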
## Revoking Access
To disconnect inference.sh from your project:
```bash
# Delete the WIF provider
gcloud iam workload-identity-pools providers delete inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --project=YOUR_PROJECT_ID
```

Or delete the entire pool:
```bash
gcloud iam workload-identity-pools delete inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID
```
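Either action immediately stops new tokens from being issued. To confirm, and optionally clean up the service account as well:

```bash
# inference-sh-pool should no longer appear in the list
gcloud iam workload-identity-pools list --location="global" --project=YOUR_PROJECT_ID

# Optional cleanup: delete the service account too
gcloud iam service-accounts delete \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID
```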