GKE VPC-Native Networking for Cocoon VM Nodes

This guide covers how cocoon-net provisions VPC-native networking on GKE bare-metal/GCE instances so that Windows and Linux VMs running via cocoon are directly routable within the GKE VPC.

Architecture

GKE VPC (e.g. 10.0.0.0/8)
├── Primary subnet (GKE nodes, pods)
│   └── Secondary IP range: cocoon-pods=172.20.0.0/16
│
├── cocoonset-node-1 (GCE instance)
│   ├── ens4: primary NIC (VPC)
│   │   └── alias IP range: 172.20.100.0/24
│   └── cni0 bridge (172.20.100.1/24)
│       └── Windows/Linux VMs: 172.20.100.x (DHCP from alias range)
│
└── cocoonset-node-2 (GCE instance)
    ├── ens4: primary NIC (VPC)
    │   └── alias IP range: 172.20.101.0/24
    └── cni0 bridge (172.20.101.1/24)
        └── VMs: 172.20.101.x

How It Works

  1. VPC subnets, including the one a GKE cluster uses, support secondary IP ranges: named CIDR blocks that can be assigned as alias IP ranges to GCE instances.
  2. Each cocoonset node gets a /24 alias IP range (e.g. 172.20.100.0/24); every IP in that range is VPC-routable to that instance.
  3. cocoon-net uses the node's alias range as the address pool of its embedded DHCP server.
  4. VMs obtain IPs via DHCP from the embedded server; those IPs are within the VPC-routed alias range.
  5. Any GKE pod or node can reach VM IPs directly via L3 routing — no iptables DNAT needed.

Key caveat: the GCE guest agent installs a local route for the alias CIDR in the kernel's local routing table. The host then treats every address in the range as its own and answers for it itself, so traffic meant for the VMs is blackholed. cocoon-net removes this route and installs an @reboot cron job that removes it again after every reboot.
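Before deleting anything, it can help to confirm that the offending route is actually present. The sketch below is hypothetical (has_local_alias_route is not a cocoon-net command) and assumes the route appears in "ip route show table local" output in the form quoted under Manual Steps, i.e. a line starting with "local 172.20.100.0/24 dev ens4".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cocoon-net): succeeds if the routing-table
# text on stdin contains a "local" route for the given CIDR on the given
# device, i.e. the guest-agent route that blackholes VM traffic.
has_local_alias_route() {
  local cidr=$1 dev=$2
  awk -v cidr="$cidr" -v dev="$dev" '
    $1 == "local" && $2 == cidr {
      # Look for "dev <dev>" anywhere after the destination field.
      for (i = 3; i < NF; i++) if ($i == "dev" && $(i + 1) == dev) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on a node:
#   if ip route show table local | has_local_alias_route 172.20.100.0/24 ens4; then
#     sudo ip route del local 172.20.100.0/24 dev ens4 table local
#   fi
```

Reading the table from stdin keeps the check side-effect free, so it can be run on any node (or against saved output) without root.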

Prerequisites

  • GKE Standard cluster with --enable-ip-alias
  • GCE instances with --can-ip-forward
  • GCE instance service account with roles/compute.networkAdmin (or equivalent)
  • gcloud CLI installed on the node and authenticated (application default credentials or service account key)

Multi-node prerequisite: pre-create the shared secondary range

cocoon-net uses a single secondary range named cocoon-pods on the node's GCE subnet as the pool from which per-instance /24 aliases are allocated. The range name is fixed; the CIDR must cover every node's --subnet.

For multi-node clusters, create this range before running cocoon-net init on any node:

gcloud compute networks subnets update <NODE_SUBNET> \
  --region=<REGION> \
  --add-secondary-ranges=cocoon-pods=172.20.0.0/16

cocoon-net init then verifies that the existing range covers the caller's --subnet (e.g. 172.20.100.0/24) and reuses it. If the range does not exist, init creates it with exactly the caller's --subnet as its CIDR. That is enough for a single-node deployment, but a second node with a different --subnet will then fail fast with a clear diagnostic ("does not cover --subnet ...") rather than a cryptic gcloud error.
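The coverage check described above is plain CIDR containment: the inner prefix must be at least as long as the outer one, and both network addresses must agree under the outer mask. A minimal IPv4-only sketch follows; cidr_covers and ip_to_int are hypothetical names, and this is not cocoon-net's own code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: succeed if CIDR $1 (e.g. the cocoon-pods range)
# fully contains CIDR $2 (e.g. a node's --subnet). IPv4 only, no validation.

ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<<"$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

cidr_covers() {
  local outer_ip=${1%/*} outer_len=${1#*/}
  local inner_ip=${2%/*} inner_len=${2#*/}
  # A wider (shorter-prefix) range can never fit inside a narrower one.
  (( inner_len >= outer_len )) || return 1
  # Network mask of the outer range, truncated to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$outer_ip") & mask) == ($(ip_to_int "$inner_ip") & mask) ))
}
```

With the recommended 172.20.0.0/16, both 172.20.100.0/24 and 172.20.101.0/24 pass. A range auto-created at 172.20.100.0/24 rejects 172.20.101.0/24, which is the multi-node failure described above.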

Teardown only removes this node's per-instance alias; the shared secondary range is preserved and must be removed by the operator.

Running cocoon-net init

sudo cocoon-net init \
  --platform gke \
  --node-name cocoon-pool \
  --subnet 172.20.100.0/24 \
  --pool-size 140 \
  --dns 8.8.8.8,1.1.1.1

This will:

  1. Detect the GKE platform via GCE metadata
  2. Verify the shared secondary range cocoon-pods on the node's subnet covers --subnet (creating it at --subnet if missing — see Prerequisites for multi-node)
  3. Assign the alias IP range 172.20.100.0/24 (the --subnet value) to the instance's nic0
  4. Remove the local route installed by the GCE guest agent
  5. Configure the cni0 bridge, iptables rules, and sysctl settings
  6. Write CNI conflist to /etc/cni/net.d/30-cocoon-dhcp.conflist
  7. Save pool state to /var/lib/cocoon/net/pool.json

After init, run cocoon-net daemon to start the embedded DHCP server. Host routes (/32) are added dynamically when VMs obtain leases.
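The daemon handles these /32 routes internally; its hook interface is not documented here. The following is a hypothetical sketch of what per-lease route handling amounts to, assuming each VM's /32 points at the cni0 bridge. on_lease, its add/del events, and DRY_RUN are illustrative names only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-lease host routes; cocoon-net's actual route
# parameters may differ. DRY_RUN=1 prints the commands instead of running them.

run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

on_lease() {   # $1 = event (add|del), $2 = leased VM IP
  local event=$1 ip=$2
  case $event in
    add) run ip route replace "$ip/32" dev cni0 ;;  # replace: idempotent on lease renewal
    del) run ip route del "$ip/32" dev cni0 ;;
    *)   echo "unknown event: $event" >&2; return 1 ;;
  esac
}

# Usage:
#   DRY_RUN=1 on_lease add 172.20.100.23
```

Using "ip route replace" rather than "add" means a renewal that reports the same IP again does not fail on an already-existing route.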

Adopting existing nodes

For GKE nodes that were already provisioned by hand (alias IP range assigned, bridge configured), use cocoon-net adopt to bring them under cocoon-net management without calling any cloud APIs:

sudo cocoon-net adopt \
  --platform gke \
  --node-name cocoon-pool \
  --subnet 172.20.100.0/24

This configures the bridge, the CNI conflist, and sysctl settings from cocoon-net's templates, and writes the pool state file. The existing alias IP range is preserved. By default, existing iptables rules are also preserved; pass --manage-iptables to let cocoon-net rewrite them.

After adopting, run cocoon-net daemon to start DHCP. cocoon-net status and later re-runs of adopt work normally. Teardown is the one caveat: pool.json for an adopted node has an empty AliasRangeName field, so teardown assumes the per-instance alias was provisioned from the shared cocoon-pods range and tries to remove it on that basis. If the alias was bound from a differently named range, teardown logs "alias not present, skipping" and leaves the alias in place; remove it manually.
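The fallback above reduces to a small decision that an operator can reproduce before tearing down an adopted node. A minimal sketch, assuming you have already read AliasRangeName from pool.json and the range the alias is actually bound from (the subnetworkRangeName of the instance's alias IP range). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the teardown fallback: use the recorded
# AliasRangeName, or "cocoon-pods" when it is empty (as on adopted nodes).
alias_range_for_teardown() {
  echo "${1:-cocoon-pods}"
}

# Succeeds when teardown would log "alias not present, skipping": the range
# the alias is really bound from ($2) differs from the name teardown uses.
teardown_would_skip() {
  [[ $(alias_range_for_teardown "$1") != "$2" ]]
}

# Usage:
#   teardown_would_skip "" legacy-range && echo "remove the alias manually"
```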

Manual Steps (for reference)

1. Add secondary IP range to subnet

gcloud compute networks subnets update default \
  --region=asia-southeast1 \
  --add-secondary-ranges=cocoon-pods=172.20.0.0/16

2. Assign alias IP to instance

gcloud compute instances network-interfaces update cocoonset-node-1 \
  --zone=asia-southeast1-b \
  --network-interface=nic0 \
  --aliases="cocoon-pods:172.20.100.0/24"

3. Fix GCE guest agent route hijack

The GCE guest agent installs the route local 172.20.100.0/24 dev ens4 table local, which blackholes inbound VM traffic. Remove it:

ip route del local 172.20.100.0/24 dev ens4 table local

Install a cron job that removes it again on every reboot:

echo "@reboot root ip route del local 172.20.100.0/24 dev ens4 table local 2>/dev/null || true" \
  > /etc/cron.d/cocoon-net-fix-alias
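The entry above hardcodes node-1's range. A minimal sketch, assuming you want the same entry on other nodes: the hypothetical function alias_fix_cron_line prints the identical line for any alias CIDR and NIC name (ens4 by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the /etc/cron.d entry from step 3 for any
# node's alias CIDR and NIC, e.g. 172.20.101.0/24 on node-2.
alias_fix_cron_line() {
  local cidr=$1 dev=${2:-ens4}
  printf '@reboot root ip route del local %s dev %s table local 2>/dev/null || true\n' "$cidr" "$dev"
}

# Usage (as root):
#   alias_fix_cron_line 172.20.101.0/24 > /etc/cron.d/cocoon-net-fix-alias
```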

4. Configure cni0 bridge

ip link add cni0 type bridge 2>/dev/null || true
ip addr replace 172.20.100.1/24 dev cni0
ip link set cni0 up

5. sysctl

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.cni0.rp_filter=0
sysctl -w net.ipv4.conf.ens4.rp_filter=0
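Settings applied with sysctl -w do not survive a reboot. A minimal sketch that prints the same four settings as a sysctl.d drop-in. The function name and the file name 99-cocoon-net.conf are assumptions, not something cocoon-net is documented to write. Note that the cni0 key can only take effect once the bridge exists, so a boot-time load may report that line as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the step-5 sysctl settings as a persistent
# drop-in. $1 is the uplink NIC (ens4 on the nodes in this guide).
cocoon_sysctl_conf() {
  local uplink=${1:-ens4}
  cat <<EOF
net.ipv4.ip_forward = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.cni0.rp_filter = 0
net.ipv4.conf.${uplink}.rp_filter = 0
EOF
}

# Usage (as root):
#   cocoon_sysctl_conf ens4 > /etc/sysctl.d/99-cocoon-net.conf
#   sysctl --system
```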

6. iptables

# Masquerade VM traffic leaving via any interface other than cni0 (e.g. ens4), for internet access
iptables -t nat -A POSTROUTING -s 172.20.100.0/24 ! -o cni0 -j MASQUERADE
# Allow forwarding between ports on the cni0 bridge
iptables -A FORWARD -i cni0 -o cni0 -j ACCEPT
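Re-running iptables -A appends a duplicate rule each time. A minimal sketch of an idempotent variant that checks with -C first and appends only when the rule is missing. ensure_rule is a hypothetical helper, not a cocoon-net command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a rule only if an identical one is absent.
# $1 = table, $2 = chain, remaining args = rule specification.
ensure_rule() {
  local table=$1 chain=$2
  shift 2
  iptables -t "$table" -C "$chain" "$@" 2>/dev/null ||
    iptables -t "$table" -A "$chain" "$@"
}

# Usage (as root), equivalent to step 6 but safe to repeat:
#   ensure_rule nat POSTROUTING -s 172.20.100.0/24 ! -o cni0 -j MASQUERADE
#   ensure_rule filter FORWARD -i cni0 -o cni0 -j ACCEPT
```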

7. DHCP

DHCP is provided by cocoon-net daemon (embedded server). No external DHCP server required. Host routes (/32) are managed dynamically on lease events.

# Start the daemon (or run it as a systemd service)
cocoon-net daemon
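To run the daemon as a service, a unit along the following lines could be used. This is a hypothetical sketch: the binary path /usr/local/bin/cocoon-net, the unit name cocoon-net.service, and the restart policy are assumptions, not a unit shipped by cocoon-net.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal systemd unit that keeps
# "cocoon-net daemon" running. $1 overrides the assumed binary path.
cocoon_net_unit() {
  local bin=${1:-/usr/local/bin/cocoon-net}
  cat <<EOF
[Unit]
Description=cocoon-net daemon (embedded DHCP server)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=${bin} daemon
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   cocoon_net_unit > /etc/systemd/system/cocoon-net.service
#   systemctl daemon-reload && systemctl enable --now cocoon-net.service
```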

8. CNI conflist

cat > /etc/cni/net.d/30-cocoon-dhcp.conflist <<'EOF'
{
  "cniVersion": "1.0.0",
  "name": "cocoon-dhcp",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": false,
      "ipMasq": false,
      "ipam": {}
    }
  ]
}
EOF

IP Plan

| Range | Assignment |
|---|---|
| Primary subnet | GKE nodes, pods |
| 172.20.0.0/16 | Secondary range "cocoon-pods" (whole range registered) |
| 172.20.100.0/24 | cocoon-pool (node-1) alias IP + VM DHCP pool |
| 172.20.101.0/24 | cocoon-pool-2 (node-2) alias IP + VM DHCP pool |
| 172.20.N.0/24 | Future node-N |
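If nodes are numbered sequentially, the plan above can be turned into a --subnet value mechanically. A minimal sketch, assuming node 1 maps to third octet 100, node 2 to 101, and so on. This mapping is a convention inferred from the first two rows, not something cocoon-net enforces, and node_subnet is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-node /24 inside 172.20.0.0/16.
# $1 = node index (1-based), $2 = third octet for node 1 (default 100).
node_subnet() {
  local n=$1 base=${2:-100}
  local octet=$(( base + n - 1 ))
  (( n >= 1 && octet <= 255 )) || { echo "node index $n out of range" >&2; return 1; }
  echo "172.20.${octet}.0/24"
}

# Usage on node 2:
#   sudo cocoon-net init --platform gke --node-name cocoon-pool-2 \
#     --subnet "$(node_subnet 2)" --pool-size 140 --dns 8.8.8.8,1.1.1.1
```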

Limits

| Resource | Limit |
|---|---|
| Alias IP ranges per NIC | 10 |
| IPs per alias /24 | 254 host IPs |
| Usable DHCP IPs (pool-size 140) | 140 |
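The host-IP count above also bounds --pool-size. A minimal sketch of that arithmetic, assuming the network and broadcast addresses are never leased and cni0 keeps the first host address (.1 for a /24). max_pool_size is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upper bound on --pool-size for an alias range with
# the given prefix length: 2^(32 - prefix) addresses, minus network and
# broadcast, minus the cni0 gateway address.
max_pool_size() {
  local prefix=$1
  (( prefix >= 8 && prefix <= 30 )) || return 1
  echo $(( (1 << (32 - prefix)) - 2 - 1 ))
}
```

For a /24 this gives 253, so pool-size 140 leaves 113 addresses of the range outside the DHCP pool.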

Firewall

Allow the GKE control plane (master) to reach the vk-cocoon kubelet API on port 10250:

gcloud compute instances add-tags cocoonset-node-1 \
  --zone=asia-southeast1-b --tags=cocoonset-node

MASTER_CIDR=$(gcloud container clusters describe <CLUSTER> \
  --zone=<ZONE> --format='value(privateClusterConfig.masterIpv4CidrBlock)')

gcloud compute firewall-rules create allow-gke-master-to-vk \
  --allow=tcp:10250 \
  --source-ranges="${MASTER_CIDR}" \
  --target-tags=cocoonset-node
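privateClusterConfig.masterIpv4CidrBlock is only populated for clusters that have a private control-plane CIDR. On other clusters the describe call can print nothing, and the firewall command would then receive an empty --source-ranges. A minimal guard to run between the two commands; require_cidr is a hypothetical helper and only checks the shape of the value.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if $2 is empty or not shaped like an IPv4 CIDR.
# $1 is the variable name, used only in the error message.
require_cidr() {
  local name=$1 value=$2
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$'
  if [[ ! $value =~ $re ]]; then
    echo "$name is empty or not an IPv4 CIDR: '${value}'" >&2
    return 1
  fi
}

# Usage, after MASTER_CIDR is set and before the firewall rule is created:
#   require_cidr MASTER_CIDR "$MASTER_CIDR" || exit 1
```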

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| VM has IP but not reachable | GCE guest agent local route | ip route del local <cidr> dev ens4 table local |
| No DHCP lease | Daemon not running or pool mismatch | Check cocoon-net daemon logs |
| kubectl exec/logs timeout | Firewall blocks port 10250 | Add firewall rule for GKE master CIDR |
| secondary range "cocoon-pods" ... does not cover --subnet | Pre-existing cocoon-pods range is narrower than --subnet (typical when a previous single-node init created it at its own /24) | Expand the shared range to cover --subnet, or choose a --subnet inside the existing range. See Prerequisites. |