Document minimum IAM permissions required for self-hosting and BYOC #2671

@waiho-gumloop

Context

I've been self-hosting E2B on GCP following the self-host guide and needed to determine the exact IAM permissions required for the deployer service account. The guide doesn't document this, and using broad roles like roles/editor isn't an option in our environment due to security policy.

This applies equally to BYOC deployments where the customer's security team needs to approve the IAM role before granting access to their cloud account. Without a documented permission list, the approval process stalls or defaults to overly broad roles.

I ran the full lifecycle (make init, make build-and-upload, make copy-public-builds, make plan, make apply) with a custom IAM role, iteratively adding permissions as 403 errors surfaced and verifying each against GCP audit logs.
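The iterative loop described above can be partly automated by pulling permission names out of the 403 output. Below is a minimal sketch, not the tooling I actually used. The two message shapes it matches are assumptions about how GCP APIs phrase denials; other services word them differently, so the patterns will likely need extending for real logs.

```python
import re

# Assumed denial-message shapes (hypothetical, not taken from this issue).
# Extend this list to cover whatever your Terraform/Packer logs contain.
DENIED_PATTERNS = [
    re.compile(r"Required '([A-Za-z]+(?:\.[A-Za-z]+)+)' permission"),
    re.compile(r"Permission '([A-Za-z]+(?:\.[A-Za-z]+)+)' denied"),
]


def missing_permissions(log_text: str) -> list[str]:
    """Return the sorted, de-duplicated permission names found in log_text."""
    found = set()
    for pattern in DENIED_PATTERNS:
        found.update(pattern.findall(log_text))
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "Error 403: Required 'compute.networks.create' permission for 'projects/p'\n"
        "Error 403: Permission 'secretmanager.secrets.create' denied for resource 'projects/p'\n"
    )
    for name in missing_permissions(sample):
        print(name)
```

Each name it reports would still need to be confirmed against the audit logs before being added to the role.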

Request

Would it be possible to document the minimum required IAM permissions for:

  1. GCP (self-hosting and BYOC — one-time init + ongoing CI/CD)
  2. AWS (self-hosting and BYOC — same split)

This would help:

  • Self-hosters deploying E2B in environments where least-privilege IAM is required
  • BYOC customers whose security teams need to approve permissions before granting E2B access to their cloud account
  • Enterprise / SOC2 / FedRAMP environments where roles/editor or AdministratorAccess are not allowed

A documented split between one-time init permissions and ongoing CI/CD permissions would also allow customers to grant broader access temporarily during setup, then revoke the init-only permissions afterward.

GCP Permissions (validated end-to-end)

I validated 152 GCP permissions by running the full lifecycle with a dedicated service account and no other roles. These split into two groups:

Init-only (32 permissions) — one-time make init

These are needed only for initial setup (VPC, secrets, buckets, Packer image, Artifact Registry). Not needed for ongoing CI/CD.

| Category | Permission | Purpose |
|---|---|---|
| IAM | iam.serviceAccounts.create | Create runtime SA for Nomad nodes |
| IAM | iam.serviceAccountKeys.create | JSON key for runtime SA |
| IAM | iam.serviceAccounts.getIamPolicy | Read SA IAM to set bindings |
| IAM | iam.serviceAccounts.setIamPolicy | Grant runtime SA access to buckets/AR |
| Compute | compute.networks.create | E2B VPC for Nomad cluster |
| Compute | compute.networks.delete | Packer temp VPC cleanup |
| Compute | compute.networks.updatePolicy | Add subnets/firewalls to VPC |
| Compute | compute.subnetworks.create | Subnets within E2B VPC |
| Compute | compute.subnetworks.delete | Packer temp subnet cleanup |
| Compute | compute.subnetworks.useExternalIp | Packer VM needs external IP for IAP SSH |
| Compute | compute.images.create | Packer creates Nomad node disk image |
| Compute | compute.images.deprecate | Packer sets deprecation status post-build |
| Compute | compute.disks.useReadOnly | Packer creates image from disk |
| Compute | compute.disks.delete | Packer temp disk cleanup |
| Compute | compute.firewalls.delete | Packer temp firewall cleanup |
| Secrets | secretmanager.secrets.create | ~20 secret containers (consul/nomad tokens, etc.) |
| Secrets | secretmanager.versions.add | Write placeholder values for tokens |
| Secrets | secretmanager.versions.enable | Enable secret versions after creation |
| Storage | storage.buckets.create | GCS buckets for templates, kernels, snapshots |
| Storage | storage.buckets.update | Set versioning and lifecycle rules |
| Storage | storage.buckets.getIamPolicy | Read bucket IAM for runtime SA binding |
| Storage | storage.buckets.setIamPolicy | Grant runtime SA objectUser/objectViewer |
| Storage | storage.hmacKeys.create | HMAC key for ClickHouse GCS access |
| Storage | storage.hmacKeys.delete | Destroy old HMAC key on recreate |
| Storage | storage.hmacKeys.get | Terraform refresh of HMAC key state |
| Storage | storage.hmacKeys.update | Deactivate old HMAC key on recreate |
| Registry | artifactregistry.repositories.create | AR repos for Docker images |
| Registry | artifactregistry.repositories.getIamPolicy | Read AR IAM for runtime SA binding |
| Registry | artifactregistry.repositories.setIamPolicy | Grant runtime SA reader on AR repos |
| Services | serviceusage.services.enable | Enable GCP APIs (compute, secretmanager, etc.) |
| IAM | resourcemanager.projects.setIamPolicy | Grant runtime SA project-level roles |
| IAP | iap.tunnelInstances.accessViaIAP | Packer SSH tunnel to build VM |
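For anyone who wants to turn a list like the one above into a custom role, here is a minimal sketch that renders a role definition file with the title, description, stage, and includedPermissions fields that gcloud custom role files use. The role title, description, and the three-permission excerpt are placeholders for illustration; the full init-only list would go in their place.

```python
def role_yaml(title: str, description: str, permissions: list[str]) -> str:
    """Render a custom role definition as YAML text.

    Assumes title and description contain no double quotes, since they are
    written out as plain double-quoted YAML strings.
    """
    lines = [
        f'title: "{title}"',
        f'description: "{description}"',
        "stage: GA",
        "includedPermissions:",
    ]
    # De-duplicate and sort so the output is stable across runs.
    lines += [f"- {name}" for name in sorted(set(permissions))]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Excerpt only; these three names come from the init-only list.
    init_excerpt = [
        "iam.serviceAccounts.create",
        "compute.networks.create",
        "compute.images.create",
    ]
    # Hypothetical title and description.
    print(role_yaml("E2B init", "One-time make init", init_excerpt), end="")
```

The resulting file could then be passed to something like `gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=role.yaml`, where ROLE_ID and PROJECT_ID are yours to choose.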

Ongoing (120 permissions) — CI/CD for make build-and-upload + make plan + make apply

| Category | Permission | Purpose |
|---|---|---|
| Registry | artifactregistry.dockerimages.list | Resolve image digests during plan |
| Registry | artifactregistry.repositories.downloadArtifacts | Pull images onto VMs during apply |
| Registry | artifactregistry.repositories.get | Refresh repo state during plan |
| Registry | artifactregistry.repositories.uploadArtifacts | Push Docker images during build |
| Cert Mgr | certificatemanager.certmapentries.create | Bind cert to domain |
| Cert Mgr | certificatemanager.certmapentries.get | Read entry state |
| Cert Mgr | certificatemanager.certmapentries.update | Update entry after cert change |
| Cert Mgr | certificatemanager.certmaps.create | Create cert map |
| Cert Mgr | certificatemanager.certmaps.get | Read cert map |
| Cert Mgr | certificatemanager.certmaps.use | Proxy references map |
| Cert Mgr | certificatemanager.certs.create | Create TLS cert |
| Cert Mgr | certificatemanager.certs.get | Read cert provisioning state |
| Cert Mgr | certificatemanager.certs.update | Update cert after drift |
| Cert Mgr | certificatemanager.certs.use | Map references cert |
| Cert Mgr | certificatemanager.dnsauthorizations.create | DNS auth for cert validation |
| Cert Mgr | certificatemanager.dnsauthorizations.get | Get CNAME for DNS setup |
| Cert Mgr | certificatemanager.dnsauthorizations.use | Cert references DNS auth |
| Cert Mgr | certificatemanager.operations.get | Check cert provisioning progress |
| Compute | compute.addresses.get | Read regional address |
| Compute | compute.addresses.use | Bind to NAT/LB |
| Compute | compute.autoscalers.create | MIG autoscaler for node pools |
| Compute | compute.autoscalers.get | Read autoscaler state |
| Compute | compute.backendServices.create | LB backend service |
| Compute | compute.backendServices.delete | Replace backend on update |
| Compute | compute.backendServices.get | Read backend state |
| Compute | compute.backendServices.list | Enumerate backends |
| Compute | compute.backendServices.setSecurityPolicy | Attach Cloud Armor policy |
| Compute | compute.backendServices.use | URL map references backend |
| Compute | compute.disks.create | Boot disks for VMs |
| Compute | compute.disks.get | Read disk state |
| Compute | compute.disks.list | Enumerate disks |
| Compute | compute.disks.setLabels | Tag disks |
| Compute | compute.disks.use | Attach disk to VM |
| Compute | compute.firewalls.create | Firewall rules for Nomad/Consul |
| Compute | compute.firewalls.get | Read firewall state |
| Compute | compute.globalAddresses.create | Global IP for LB |
| Compute | compute.globalAddresses.get | Read address state |
| Compute | compute.globalAddresses.list | Enumerate addresses |
| Compute | compute.globalAddresses.setLabels | Tag addresses |
| Compute | compute.globalAddresses.use | Forwarding rule references address |
| Compute | compute.globalForwardingRules.create | LB forwarding rule |
| Compute | compute.globalForwardingRules.get | Read forwarding rule state |
| Compute | compute.globalForwardingRules.setLabels | Tag forwarding rule |
| Compute | compute.globalOperations.get | Poll status of global compute operations |
| Compute | compute.healthChecks.create | LB health check |
| Compute | compute.healthChecks.get | Read health check |
| Compute | compute.healthChecks.use | MIG references health check |
| Compute | compute.healthChecks.useReadOnly | Backend references health check |
| Compute | compute.images.get | Read image state |
| Compute | compute.images.getFromFamily | Resolve image family to latest |
| Compute | compute.images.useReadOnly | Instance template references image |
| Compute | compute.instanceGroupManagers.create | Create MIG per node pool |
| Compute | compute.instanceGroupManagers.get | Read MIG state |
| Compute | compute.instanceGroupManagers.list | Enumerate MIGs |
| Compute | compute.instanceGroupManagers.update | Rolling update MIG |
| Compute | compute.instanceGroupManagers.use | Autoscaler references MIG |
| Compute | compute.instanceGroups.create | Unmanaged group for backend |
| Compute | compute.instanceGroups.get | Read group state |
| Compute | compute.instanceGroups.list | Enumerate groups |
| Compute | compute.instanceGroups.use | Backend references group |
| Compute | compute.instanceTemplates.create | VM template per node pool |
| Compute | compute.instanceTemplates.get | Read template state |
| Compute | compute.instanceTemplates.list | Enumerate templates |
| Compute | compute.instanceTemplates.useReadOnly | MIG references template |
| Compute | compute.instances.create | Create VMs via MIG |
| Compute | compute.instances.delete | Terminate VMs |
| Compute | compute.instances.get | Read VM state |
| Compute | compute.instances.list | Enumerate VMs |
| Compute | compute.instances.setLabels | Tag VMs |
| Compute | compute.instances.setMetadata | Startup scripts, SSH keys |
| Compute | compute.instances.setServiceAccount | Attach runtime SA to VM |
| Compute | compute.instances.setTags | Network tags for firewall rules |
| Compute | compute.instances.start | Start stopped VMs |
| Compute | compute.instances.stop | Stop VMs for maintenance |
| Compute | compute.instances.use | Attach disks/network |
| Compute | compute.machineTypes.get | Resolve machine type specs |
| Compute | compute.networks.get | Read VPC state |
| Compute | compute.networks.use | Place VMs in VPC |
| Compute | compute.projects.get | Read project compute metadata |
| Compute | compute.regions.list | Enumerate regions |
| Compute | compute.routers.get | Read Cloud Router state |
| Compute | compute.routers.list | Enumerate routers |
| Compute | compute.securityPolicies.create | Cloud Armor policy |
| Compute | compute.securityPolicies.get | Read policy state |
| Compute | compute.securityPolicies.list | Enumerate policies |
| Compute | compute.securityPolicies.update | Update Cloud Armor rules |
| Compute | compute.securityPolicies.use | Backend references policy |
| Compute | compute.sslPolicies.create | TLS policy (min version, ciphers) |
| Compute | compute.sslPolicies.get | Read TLS policy |
| Compute | compute.sslPolicies.use | HTTPS proxy references policy |
| Compute | compute.subnetworks.get | Read subnet state |
| Compute | compute.subnetworks.use | Place VMs in subnet |
| Compute | compute.targetHttpsProxies.create | HTTPS proxy for LB |
| Compute | compute.targetHttpsProxies.get | Read proxy state |
| Compute | compute.targetHttpsProxies.use | Forwarding rule references proxy |
| Compute | compute.urlMaps.create | URL map for LB routing |
| Compute | compute.urlMaps.get | Read URL map state |
| Compute | compute.urlMaps.use | Proxy references URL map |
| Compute | compute.zoneOperations.get | Poll status of zonal compute operations |
| Compute | compute.zones.get | Resolve zone details |
| Compute | compute.zones.list | Enumerate zones |
| IAM | iam.serviceAccountKeys.get | Refresh SA key state |
| IAM | iam.serviceAccounts.actAs | Attach runtime SA to VMs |
| IAM | iam.serviceAccounts.get | Refresh SA state |
| Monitoring | monitoring.metricDescriptors.get | Terraform refresh |
| Monitoring | monitoring.timeSeries.create | Node SA writes metrics |
| IAM | resourcemanager.projects.get | Read project info |
| IAM | resourcemanager.projects.getIamPolicy | Read project IAM bindings |
| Secrets | secretmanager.secrets.get | Read secret metadata |
| Secrets | secretmanager.secrets.list | Enumerate secrets |
| Secrets | secretmanager.versions.access | Read secret values for Nomad jobs |
| Secrets | secretmanager.versions.get | Read version metadata |
| Secrets | secretmanager.versions.list | Enumerate versions |
| Services | serviceusage.services.get | Check APIs are enabled |
| Services | serviceusage.services.list | Enumerate enabled APIs |
| Storage | storage.buckets.get | Read bucket state |
| Storage | storage.objects.create | Upload binaries to GCS |
| Storage | storage.objects.delete | Replace old binaries |
| Storage | storage.objects.get | Read objects |
| Storage | storage.objects.list | List objects / TF workspaces |

AWS Permissions (from CloudFormation, unvalidated)

The aws-samples/sample-e2b-on-aws CloudFormation template defines an E2BDeploymentPolicy with ~151 explicit permissions plus 5 wildcard groups (autoscaling:*, elasticloadbalancing:*, etc.). Unlike the GCP list, these have not been validated against audit logs.

It would be useful to know if the official iac/provider-aws/ in this repo requires the same permissions as the CloudFormation template, or if there are differences.
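As a starting point for that comparison, here is a minimal sketch that separates explicit actions from wildcard groups in an IAM policy document, so the wildcard entries can be targeted for narrowing first. The sample policy is illustrative only and is not copied from the CloudFormation template; only the two wildcard names are taken from the description above.

```python
def split_actions(policy: dict) -> tuple[list[str], list[str]]:
    """Return (explicit_actions, wildcard_actions) from Allow statements."""
    explicit, wildcard = set(), set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        for action in actions:
            (wildcard if "*" in action else explicit).add(action)
    return sorted(explicit), sorted(wildcard)


if __name__ == "__main__":
    # Hypothetical policy for illustration, not the real E2BDeploymentPolicy.
    sample_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances", "autoscaling:*"],
                "Resource": "*",
            },
            {"Effect": "Allow", "Action": "elasticloadbalancing:*", "Resource": "*"},
        ],
    }
    explicit, wildcard = split_actions(sample_policy)
    print("explicit:", explicit)
    print("wildcard:", wildcard)
```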

Additional observations

  • Operation polling permissions: compute.globalOperations.get and compute.zoneOperations.get are needed for Terraform and Packer to poll async operation status. Without them, operations succeed at the GCP API level but the tooling can't confirm completion and reports misleading timeout errors (e.g., Packer's "time out while waiting for image to register").

  • compute.images.deprecate is called by Packer after image creation even when no deprecation config is set.

  • The init/ongoing split allows BYOC customers to grant broader access temporarily during setup (make init), then revoke the init-only permissions and leave only the CI/CD role for ongoing operations.
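The grant-then-revoke flow in the last point comes down to two set differences. Here is a minimal sketch, assuming the currently granted permissions and the ongoing list are both available as plain sets of names; the function name and the short excerpts are placeholders, with the full lists substituted in practice.

```python
def plan_revocation(granted: set[str], ongoing: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_revoke, missing).

    to_revoke: granted permissions that ongoing CI/CD does not need.
    missing:   ongoing permissions that are not granted yet.
    """
    to_revoke = sorted(granted - ongoing)
    missing = sorted(ongoing - granted)
    return to_revoke, missing


if __name__ == "__main__":
    # Short excerpts from the lists in this issue, for illustration only.
    granted = {
        "compute.networks.create",
        "compute.images.deprecate",
        "compute.globalOperations.get",
        "compute.zoneOperations.get",
    }
    ongoing = {
        "compute.globalOperations.get",
        "compute.zoneOperations.get",
        "compute.instances.create",
    }
    to_revoke, missing = plan_revocation(granted, ongoing)
    print("revoke after init:", to_revoke)
    print("still missing for CI/CD:", missing)
```

Checking the missing list before revoking anything helps avoid removing a permission, such as one of the operation polling ones, that the ongoing role still depends on.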

Environment

  • E2B infra: main branch (commit at time of testing)
  • Terraform: 1.5.7
  • Packer plugin: hashicorp/googlecompute v1.2.5
  • GCP region: us-west1
  • Provider: GCP (iac/provider-gcp/)
