This guide covers how cocoon-net provisions VPC-native networking on GKE bare-metal/GCE instances so that Windows and Linux VMs running via cocoon are directly routable within the GKE VPC.
```
GKE VPC (e.g. 10.0.0.0/8)
├── Primary subnet (GKE nodes, pods)
│   └── Secondary IP range: cocoon-pods=172.20.0.0/16
│
├── cocoonset-node-1 (GCE instance)
│   ├── ens4: primary NIC (VPC)
│   │   └── alias IP range: 172.20.100.0/24
│   └── cni0 bridge (172.20.100.1/24)
│       └── Windows/Linux VMs: 172.20.100.x (DHCP from alias range)
│
└── cocoonset-node-2 (GCE instance)
    ├── ens4: primary NIC (VPC)
    │   └── alias IP range: 172.20.101.0/24
    └── cni0 bridge (172.20.101.1/24)
        └── VMs: 172.20.101.x
```
- GKE subnets support secondary IP ranges — named CIDR blocks that can be assigned as alias IPs to GCE instances.
- Each cocoonset node gets a `/24` alias IP range (e.g. `172.20.100.0/24`) — all IPs in this range are VPC-routable to that instance.
- cocoon-net assigns the alias IPs to its embedded DHCP server pool.
- VMs obtain IPs via DHCP from the embedded server; those IPs are within the VPC-routed alias range.
- Any GKE pod or node can reach VM IPs directly via L3 routing — no iptables DNAT needed.
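As a concrete illustration of the routing model, here is a short Python `ipaddress` sketch (illustrative only, not cocoon-net code). Reserving `.1` matches the `cni0` bridge address used later in this guide, but excluding it from the DHCP pool is an assumption:

```python
import ipaddress

# Node's alias range from the example above; .1 is the cni0 bridge address.
# Treating it as excluded from the lease pool is an assumption.
alias_range = ipaddress.ip_network("172.20.100.0/24")
gateway = ipaddress.ip_address("172.20.100.1")

# Every address the embedded DHCP server could lease is a host address in
# the alias range, minus the gateway — and therefore VPC-routable to this
# instance without any DNAT.
leasable = [ip for ip in alias_range.hosts() if ip != gateway]
print(len(leasable))          # 253 addresses available for VM leases
vm_ip = ipaddress.ip_address("172.20.100.42")
print(vm_ip in alias_range)   # True: reachable from any pod/node via L3
```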
Key caveat: The GCE guest agent installs a local route for the alias CIDR in the kernel's local routing table, which causes the host to respond to those IPs itself (blackholing VM traffic). cocoon-net removes this route and installs a cron job to remove it on reboot.
- GKE Standard cluster with `--enable-ip-alias`
- GCE instances with `--can-ip-forward`
- GCE instance service account with `roles/compute.networkAdmin` (or equivalent)
- `gcloud` CLI installed on the node and authenticated (application default credentials or service account key)
cocoon-net uses a single secondary range named `cocoon-pods` on the node's GCE subnet as the pool from which per-instance `/24` aliases are allocated. The range name is fixed; the CIDR must cover every node's `--subnet`.
For multi-node clusters, create this range before running `cocoon-net init` on any node:
```
gcloud compute networks subnets update <NODE_SUBNET> \
  --region=<REGION> \
  --add-secondary-ranges=cocoon-pods=172.20.0.0/16
```

`cocoon-net init` then verifies that the existing range covers the caller's `--subnet` (e.g. `172.20.100.0/24`) and reuses it. Running `init` without pre-creating the range still works for a single-node deployment — cocoon-net creates the range at the caller's `--subnet` — but a second node with a different `--subnet` will then fail fast with a clear diagnostic (`does not cover --subnet ...`) instead of a cryptic gcloud error.
Teardown only removes this node's per-instance alias; the shared secondary range is preserved and must be removed by the operator.
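The coverage rule can be sketched with Python's `ipaddress` module. This is illustrative only; `check_coverage` and the exact message wording are assumptions, not cocoon-net's actual code:

```python
import ipaddress

def check_coverage(shared_cidr: str, node_subnet: str) -> None:
    """Mimic init's verification: the shared cocoon-pods range must
    fully contain the caller's --subnet (illustrative sketch)."""
    shared = ipaddress.ip_network(shared_cidr)
    node = ipaddress.ip_network(node_subnet)
    if not node.subnet_of(shared):
        raise ValueError(
            f'secondary range "cocoon-pods" ({shared}) does not cover '
            f"--subnet {node}"
        )

check_coverage("172.20.0.0/16", "172.20.100.0/24")  # OK: /24 inside /16
# A range created by a previous single-node init at its own /24 fails:
# check_coverage("172.20.100.0/24", "172.20.101.0/24") raises ValueError
```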
```
sudo cocoon-net init \
  --platform gke \
  --node-name cocoon-pool \
  --subnet 172.20.100.0/24 \
  --pool-size 140 \
  --dns 8.8.8.8,1.1.1.1
```

This will:
- Detect the GKE platform via GCE metadata
- Verify the shared secondary range `cocoon-pods` on the node's subnet covers `--subnet` (creating it at `--subnet` if missing — see Prerequisites for multi-node)
- Assign the alias IP `172.20.100.0/24` to `nic0` of the instance
- Remove the local route installed by the GCE guest agent
- Configure `cni0` bridge, iptables, sysctl
- Write CNI conflist to `/etc/cni/net.d/30-cocoon-dhcp.conflist`
- Save pool state to `/var/lib/cocoon/net/pool.json`
After init, run `cocoon-net daemon` to start the embedded DHCP server. Host routes (`/32`) are added dynamically when VMs obtain leases.
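A lease event effectively translates into a host route on the node. The sketch below illustrates the idea; the exact `ip route` invocation cocoon-net performs is an assumption:

```python
# Sketch: map a DHCP lease event to the host route that makes the VM's /32
# reachable via the cni0 bridge. The exact route form cocoon-net installs
# is an assumption for illustration.
def lease_route(vm_ip: str, bridge: str = "cni0", add: bool = True) -> list[str]:
    verb = "replace" if add else "del"
    return ["ip", "route", verb, f"{vm_ip}/32", "dev", bridge]

print(" ".join(lease_route("172.20.100.42")))
# ip route replace 172.20.100.42/32 dev cni0
print(" ".join(lease_route("172.20.100.42", add=False)))
# ip route del 172.20.100.42/32 dev cni0
```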
For GKE nodes that were already provisioned by hand (alias IP range assigned, bridge configured), use `adopt` to bring them under cocoon-net management without calling any cloud APIs:
```
sudo cocoon-net adopt \
  --platform gke \
  --node-name cocoon-pool \
  --subnet 172.20.100.0/24
```

This configures bridge, CNI conflist, and sysctl from cocoon-net's templates, and writes the pool state file. The existing alias IP range is preserved. By default, existing iptables rules are also preserved — pass `--manage-iptables` to let cocoon-net rewrite them.
After adopting, run `cocoon-net daemon` to start DHCP. `cocoon-net status` and future re-runs of `adopt` work normally. On teardown, cocoon-net attempts to remove the per-instance alias on the assumption that it was provisioned from the shared `cocoon-pods` range (the `AliasRangeName` field in `pool.json` is empty for adopted nodes, so teardown falls back to that default). If the alias was actually bound from a differently-named range, teardown logs "alias not present, skipping" and the entry stays — remove it manually.
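The teardown fallback amounts to a simple default. A sketch, in which only the `AliasRangeName` field comes from this doc and the function name is assumed:

```python
def alias_range_for_teardown(pool_state: dict) -> str:
    # Adopted nodes have an empty AliasRangeName in pool.json, so teardown
    # falls back to the shared default range name (sketch, not actual code).
    return pool_state.get("AliasRangeName") or "cocoon-pods"

print(alias_range_for_teardown({"AliasRangeName": ""}))         # cocoon-pods
print(alias_range_for_teardown({"AliasRangeName": "my-pool"}))  # my-pool
```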
```
gcloud compute networks subnets update default \
  --region=asia-southeast1 \
  --add-secondary-ranges=cocoon-pods=172.20.0.0/16
```

```
gcloud compute instances network-interfaces update cocoonset-node-1 \
  --zone=asia-southeast1-b \
  --network-interface=nic0 \
  --aliases="cocoon-pods:172.20.100.0/24"
```

The GCE guest agent installs `local 172.20.100.0/24 dev ens4 table local`, which blackholes inbound VM traffic. Remove it:
```
ip route del local 172.20.100.0/24 dev ens4 table local
```

Install a cron job to remove it again on reboot:

```
echo "@reboot root ip route del local 172.20.100.0/24 dev ens4 table local 2>/dev/null || true" \
  > /etc/cron.d/cocoon-net-fix-alias
```

Create and address the `cni0` bridge:

```
ip link add cni0 type bridge 2>/dev/null || true
ip addr replace 172.20.100.1/24 dev cni0
ip link set cni0 up
```

Enable forwarding and relax reverse-path filtering:

```
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.cni0.rp_filter=0
sysctl -w net.ipv4.conf.ens4.rp_filter=0
```

```
# Allow VM traffic out via ens4 with MASQUERADE (internet access)
iptables -t nat -A POSTROUTING -s 172.20.100.0/24 ! -o cni0 -j MASQUERADE
# Allow cni0 forwarding
iptables -A FORWARD -i cni0 -o cni0 -j ACCEPT
```

DHCP is provided by `cocoon-net daemon` (embedded server). No external DHCP server is required. Host routes (`/32`) are managed dynamically on lease events.

```
# Start the daemon (or use systemd unit)
cocoon-net daemon
```

```
cat > /etc/cni/net.d/30-cocoon-dhcp.conflist <<'EOF'
{
  "cniVersion": "1.0.0",
  "name": "cocoon-dhcp",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": false,
      "ipMasq": false,
      "ipam": {}
    }
  ]
}
EOF
```

| Range | Assignment |
|---|---|
| Primary subnet | GKE nodes, pods |
| `172.20.0.0/16` | Secondary range `cocoon-pods` (whole range registered) |
| `172.20.100.0/24` | `cocoon-pool` (node-1) alias IP + VM DHCP pool |
| `172.20.101.0/24` | `cocoon-pool-2` (node-2) alias IP + VM DHCP pool |
| `172.20.N.0/24` | Future node-N |
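The address plan follows a simple pattern of carving consecutive `/24`s out of the shared `/16`. A sketch; the index-based allocation is an assumption about the plan above, not cocoon-net's actual allocator:

```python
import ipaddress

shared = ipaddress.ip_network("172.20.0.0/16")
# Carve the shared range into /24s; the plan above starts node-1 at .100.
subnets = list(shared.subnets(new_prefix=24))  # 256 possible /24s
node1 = subnets[100]
node2 = subnets[101]
print(node1)  # 172.20.100.0/24
print(node2)  # 172.20.101.0/24
```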
| Resource | Limit |
|---|---|
| Alias IP ranges per NIC | 10 |
| IPs per alias /24 | 254 host IPs |
| Usable DHCP IPs (pool-size 140) | 140 |
Allow GKE master to reach vk-cocoon kubelet API (port 10250):
```
gcloud compute instances add-tags cocoonset-node-1 \
  --zone=asia-southeast1-b --tags=cocoonset-node

MASTER_CIDR=$(gcloud container clusters describe <CLUSTER> \
  --zone=<ZONE> --format='value(privateClusterConfig.masterIpv4CidrBlock)')

gcloud compute firewall-rules create allow-gke-master-to-vk \
  --allow=tcp:10250 \
  --source-ranges="${MASTER_CIDR}" \
  --target-tags=cocoonset-node
```

| Symptom | Cause | Fix |
|---|---|---|
| VM has IP but not reachable | GCE guest agent local route | `ip route del local <cidr> dev ens4 table local` |
| No DHCP lease | Daemon not running or pool mismatch | Check `cocoon-net daemon` logs |
| kubectl exec/logs timeout | Firewall blocks port 10250 | Add firewall rule for GKE master CIDR |
| `secondary range "cocoon-pods" ... does not cover --subnet` | Pre-existing `cocoon-pods` range is narrower than `--subnet` (typical when a previous single-node init created it at its own `/24`) | Expand the shared range to cover `--subnet`, or choose a `--subnet` inside the existing range. See Prerequisites. |