Google Cloud Platform

Connect your own GCP project to use Vertex AI, BigQuery, Cloud Storage, and other GCP services with your own quotas and billing.

How It Works

inference.sh uses Workload Identity Federation (WIF) - the secure, keyless way to authenticate:

  • No credentials stored on inference.sh
  • Tokens are generated on-demand and expire in 1 hour
  • You control access via IAM
  • Full audit trail in Cloud Audit Logs
  • Revoke access anytime by deleting the WIF provider
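
Under the hood, each request follows Google's standard two-step token exchange: an inference.sh-signed OIDC token is traded for a federated token at the STS endpoint, which is then used to mint a short-lived service account token. A rough sketch with curl (you never run this yourself — inference.sh handles it — and $OIDC_JWT and $FEDERATED_TOKEN below are placeholders):

bash
# Step 1: exchange the inference.sh-signed OIDC token for a federated GCP token
curl -s -X POST https://sts.googleapis.com/v1/token \
  -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
  -d "audience=//iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/providers/inference-sh" \
  -d "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
  -d "requested_token_type=urn:ietf:params:oauth:token-type:access_token" \
  -d "scope=https://www.googleapis.com/auth/cloud-platform" \
  -d "subject_token=${OIDC_JWT}"

# Step 2: trade the federated token for a 1-hour service account access token
curl -s -X POST \
  -H "Authorization: Bearer ${FEDERATED_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"scope": ["https://www.googleapis.com/auth/cloud-platform"]}' \
  "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken"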

Prerequisites

  1. gcloud CLI - installed on your machine (see Google's install guide)
  2. Logged in - Run gcloud auth login if you haven't already
  3. A GCP project with billing enabled

Run our setup script - it handles everything automatically:

bash
curl -sL https://cloud.inference.sh/scripts/setup-gcp.sh | bash -s -- YOUR_PROJECT_ID

The script will:

  1. Enable required APIs
  2. Create a Workload Identity Pool
  3. Create a Service Account
  4. Configure all permissions

Then copy the output values to inference.sh.
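
To confirm the script worked, you can check that the pool and service account exist:

bash
gcloud iam workload-identity-pools describe inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID

gcloud iam service-accounts describe \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID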


Manual Setup

If you prefer to set things up manually:

Step 1: Get Your Project Number

bash
gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)'
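
You'll need this number again in Step 6 and Step 8, so it can be handy to capture it in a shell variable:

bash
PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format='value(projectNumber)')
echo "$PROJECT_NUMBER"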

Step 2: Enable APIs

bash
gcloud services enable \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  sts.googleapis.com \
  --project=YOUR_PROJECT_ID

Step 3: Create Workload Identity Pool

bash
gcloud iam workload-identity-pools create inference-sh-pool \
  --location="global" \
  --display-name="inference.sh" \
  --project=YOUR_PROJECT_ID

Step 4: Create OIDC Provider

bash
gcloud iam workload-identity-pools providers create-oidc inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --issuer-uri="https://api.inference.sh" \
  --attribute-mapping="google.subject=assertion.sub" \
  --project=YOUR_PROJECT_ID

Step 5: Create Service Account

bash
gcloud iam service-accounts create inference-sh-sa \
  --display-name="inference.sh Integration" \
  --project=YOUR_PROJECT_ID

Step 6: Allow Impersonation

bash
gcloud iam service-accounts add-iam-policy-binding \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/inference-sh-pool/*" \
  --project=YOUR_PROJECT_ID
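
If you want to double-check the binding, the service account's IAM policy should now include a roles/iam.workloadIdentityUser entry for the pool's principalSet:

bash
gcloud iam service-accounts get-iam-policy \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID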

Step 7: Grant Permissions

Grant the permissions your apps need:

bash
# For Vertex AI
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# For BigQuery (optional)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"
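
To see which roles the service account currently holds on the project (useful before granting more):

bash
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"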

Step 8: Configure on inference.sh

Go to Settings → Integrations and add:

Secret               Value
GCP_PROJECT_ID       Your project ID (e.g., my-project)
GCP_PROJECT_NUMBER   Your project number (e.g., 123456789012)

The service account email is derived automatically: inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com


Using in Apps

Declare the capabilities your app needs:

yaml
requirements:
  integrations:
    - key: gcp.vertex_ai

At runtime, your app receives:

bash
GCP_ACCESS_TOKEN=ya29.xxx...      # Short-lived access token (1 hour)
GCP_PROJECT_NUMBER=123456789012   # For API URLs that need project context
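
When debugging, you can inspect the token's scopes and remaining lifetime via Google's public tokeninfo endpoint (a quick sanity check, not something to do on every request):

bash
curl -s "https://oauth2.googleapis.com/tokeninfo?access_token=${GCP_ACCESS_TOKEN}"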

Example: Vertex AI

python
import os
import requests

token = os.environ["GCP_ACCESS_TOKEN"]
project = os.environ["GCP_PROJECT_NUMBER"]

# Use the appropriate region for your model
# - Most models: us-central1
# - Some models (e.g., gemini-3-pro-image-preview): global
location = "us-central1"

response = requests.post(
    f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/gemini-2.0-flash:generateContent",
    headers={"Authorization": f"Bearer {token}"},
    json={"contents": [{"parts": [{"text": "Hello!"}]}]}
)

Available Capabilities

Capability            Description                       Required IAM Role
gcp.vertex_ai         Vertex AI models (Gemini, etc.)   roles/aiplatform.user
gcp.vertex_ai.tuning  Fine-tune AI models               roles/aiplatform.admin
gcp.bigquery          Query BigQuery data               roles/bigquery.user
gcp.bigquery.admin    Create/manage datasets            roles/bigquery.admin
gcp.storage           Read Cloud Storage                roles/storage.objectViewer
gcp.storage.write     Write Cloud Storage               roles/storage.objectAdmin
gcp.pubsub            Pub/Sub messaging                 roles/pubsub.editor

Troubleshooting

"invalid_target" error

The WIF pool doesn't exist. Make sure you created inference-sh-pool (exact name).

"Token exchange failed" error

  1. Verify the WIF pool and provider exist
  2. Check the issuer URI is exactly https://api.inference.sh
  3. Ensure APIs are enabled: iam, iamcredentials, sts
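
All three can be checked from the CLI, for example:

bash
# Does the provider exist, and what issuer does it trust?
gcloud iam workload-identity-pools providers describe inference-sh \
  --workload-identity-pool="inference-sh-pool" \
  --location="global" \
  --project=YOUR_PROJECT_ID \
  --format='value(oidc.issuerUri)'

# Are the required APIs enabled?
gcloud services list --enabled --project=YOUR_PROJECT_ID | grep -E 'iam|sts'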

"Permission denied" on impersonation

Re-run Step 6 - the roles/iam.workloadIdentityUser binding is what allows inference.sh to impersonate your service account.

"Permission denied" on API calls

Your service account needs the right IAM role. Check Step 7.
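
To rule out WIF and test the service account's permissions directly, you can impersonate it yourself. This assumes your own user has roles/iam.serviceAccountTokenCreator on the service account:

bash
# Mint a token as the service account, then try the API call directly
TOKEN=$(gcloud auth print-access-token \
  --impersonate-service-account=inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com)

curl -s -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "ping"}]}]}' \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent"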

Model not found (404) errors

Some models are only available in specific regions. For example, gemini-3-pro-image-preview is only available in the global region, not us-central1.


Security Best Practices

  1. Least privilege - Only grant the IAM roles your apps actually need
  2. Regular audits - Review Cloud Audit Logs for unusual activity (see the sample query after this list)
  3. Separate projects - Consider using a dedicated project for inference.sh workloads
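
For item 2, a filter like the following surfaces recent activity by the integration's service account (note that Data Access audit logs must be enabled for some services to show up):

bash
gcloud logging read \
  'protoPayload.authenticationInfo.principalEmail="inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"' \
  --project=YOUR_PROJECT_ID \
  --freshness=7d \
  --limit=20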

Revoking Access

To disconnect inference.sh from your project:

bash
# Delete the WIF provider
gcloud iam workload-identity-pools providers delete inference-sh \
  --location="global" \
  --workload-identity-pool="inference-sh-pool" \
  --project=YOUR_PROJECT_ID

Or delete the entire pool:

bash
gcloud iam workload-identity-pools delete inference-sh-pool \
  --location="global" \
  --project=YOUR_PROJECT_ID
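
For a full cleanup, you may also want to delete the service account and remove any project-level role bindings you granted in Step 7 (repeat remove-iam-policy-binding for each role):

bash
gcloud iam service-accounts delete \
  inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID

gcloud projects remove-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:inference-sh-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"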
