From 784545011d141a6d55f728bbbcbb40bc22f20450 Mon Sep 17 00:00:00 2001 From: Cyrill Berg Date: Fri, 8 May 2026 15:48:55 +0200 Subject: [PATCH 1/2] docs: update README and development documentation to reflect implementation and toolchain setup Signed-off-by: Cyrill Berg --- README.md | 126 ++++++++++++++++++++++---------------------- docs/development.md | 30 +++++++++++ 2 files changed, 92 insertions(+), 64 deletions(-) diff --git a/README.md b/README.md index 29c6b55..b2a865c 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,14 @@ -# KCP-aware Dependency Controller +# kcp-aware Dependency Controller +[![Build status](https://github.com/opendefensecloud/dependency-controller/actions/workflows/golang.yaml/badge.svg)](https://github.com/opendefensecloud/dependency-controller/actions/workflows/golang.yaml) +[![Go Report Card](https://goreportcard.com/badge/go.opendefense.cloud/dependency-controller)](https://goreportcard.com/report/go.opendefense.cloud/dependency-controller) +[![Go Reference](https://pkg.go.dev/badge/go.opendefense.cloud/dependency-controller.svg)](https://pkg.go.dev/go.opendefense.cloud/dependency-controller) [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/opendefensecloud/dependency-controller/badge)](https://scorecard.dev/viewer/?uri=github.com/opendefensecloud/dependency-controller) [![GitHub Release](https://img.shields.io/github/v/release/opendefensecloud/dependency-controller)](https://github.com/opendefensecloud/dependency-controller/releases/latest) ## Problem Statement -In KCP, APIs can be offered to users via APIExports by a multitude of providers. +In kcp, APIs can be offered to users via APIExports by a multitude of providers. For IaaS services however there is a critical shortcoming: IaaS APIs typically depend on each other -- for example, a VM is provisioned in a VPC. The VM is dependent on the VPC. If the VPC is deleted, it pulls the rug from under the VM. @@ -21,13 +24,13 @@ flowchart TD A["Provider creates
DependencyRule
(e.g. VM → VPC)"] --> B["Both binaries discover rule
via dep-ctrl APIExport"] B --> C["Controller:
Install ValidatingWebhook
in dependency provider workspace"] - B --> E["Webhook:
Start indexed cache watching
dependent type via APIExport VW"] + B --> E["Webhook:
Register rule metadata
(dependent GVR, field paths)
in RuleRegistry"] - E --> F["Informer indexes dependents
by field paths
(e.g. .spec.vpcRef.name)"] + E --> F["Registry holds rule metadata only
— no cache of dependents"] F --> G{"Consumer tries to delete
dependency (e.g. VPC)"} G --> H["Webhook intercepts DELETE"] - H --> I["Query indexed cache:
any VMs where .spec.vpcRef.name = my-vpc?"] + H --> I["List VMs in consumer workspace
via kcp front-proxy;
in-memory filter:
.spec.vpcRef.name == my-vpc?"] I -- Yes --> J["Deny deletion
'still referenced by VirtualMachine/my-vm'"] I -- No --> K["Allow deletion"] @@ -43,8 +46,9 @@ flowchart TD Along with their APIExport, providers create `DependencyRule` objects to describe how their resources depend on others. A single rule attaches to one dependent resource type (via its -APIExport reference) and lists all of its dependencies with field paths that describe where -the reference lives: +APIExport reference in the same workspace as the rule) and lists each dependency together +with the **dependency provider's** APIExport reference (workspace path + name) and the field +path inside the dependent resource where the reference lives: ```yaml apiVersion: dependencies.opendefense.cloud/v1alpha1 @@ -57,13 +61,20 @@ spec: group: compute.example.com version: v1alpha1 kind: VirtualMachine + resource: virtualmachines dependencies: - - group: network.example.com + - apiExportRef: + path: root:providers:network + name: network.example.com + group: network.example.com version: v1alpha1 resource: vpcs fieldRef: path: ".spec.vpcRef.name" - - group: network.example.com + - apiExportRef: + path: root:providers:network + name: network.example.com + group: network.example.com version: v1alpha1 resource: subnets fieldRef: @@ -76,32 +87,40 @@ The system runs as two binaries, deployed together via a single Helm chart, that both watch `DependencyRule` objects via the dep-ctrl APIExport: **Controller** (`cmd/controller`) -- handles infrastructure setup: + - Installs `ValidatingWebhookConfiguration` in each provider workspace whose resources are protected as dependencies - All provider workspace access goes through the dep-ctrl APIExport's virtual workspace, authorized by `permissionClaims` on the APIExport **Webhook** (`cmd/webhook`) -- handles admission: -- Maintains a dedicated indexed cache per rule, watching the dependent resource - type via the provider's APIExport virtual workspace -- Serves admission requests, querying indexed caches to block deletion of - resources 
that are still referenced -### Indexed Cache +- Watches `DependencyRule` objects via the dep-ctrl APIExport's virtual + workspace and stores parsed metadata (dependent GVR + field paths) in an + in-memory `RuleRegistry`. +- On each DELETE admission request, finds matching rules in the registry, + lists dependent resources directly in the consumer workspace via the kcp + front-proxy, and filters in-memory by the configured field path to block + deletion of still-referenced resources. + +### Rule Registry -For each DependencyRule, the webhook server starts a multicluster manager that watches the -dependent resource type (e.g., VirtualMachines) via the referenced APIExport's virtual -workspace. Field indices are registered on the dependent informer for each dependency -target's field path (e.g., `.spec.vpcRef.name`), enabling O(1) lookups by referenced -resource name. +The webhook keeps an in-memory `RuleRegistry` populated by reconciling +`DependencyRule` objects through the dep-ctrl APIExport's virtual workspace. +Each entry holds rule metadata only — the dependent's GroupVersionResource +and the field paths that hold dependency references — not the dependent +resources themselves. Dependent listing happens on demand per admission +request (see [Admission Webhook](#admission-webhook) below). ### Admission Webhook -A KCP ValidatingAdmissionWebhook intercepts DELETE requests. When a delete is attempted, -the webhook queries the indexed caches to find dependent resources that reference the -resource being deleted. If any are found, the request is denied with a clear error message -listing the dependents. Finalizers are intentionally avoided as they conflict with KCP's -sync-agent. +A kcp ValidatingAdmissionWebhook intercepts DELETE requests. 
When a delete is attempted, +the webhook looks up matching rules in the registry, builds a per-request dynamic client +targeting the consumer workspace via the kcp front-proxy +(`{base}/clusters/{logicalCluster}`), `List`s the dependent type, and filters the results +in-memory by the rule's field path. If any blocker is found, the request is denied with a +clear error message listing the dependents. Finalizers are intentionally avoided as they +conflict with kcp's sync-agent. ### Architecture @@ -121,16 +140,16 @@ graph LR end subgraph WB["Webhook Binary"] - WH["Rule Cache Manager
· Indexed Caches (per rule)
· Deletion Validator"] + WH["DependencyRule Reconciler
· Rule Registry (metadata)
· Deletion Validator"] end - subgraph CP["Compute Provider WS"] + subgraph CP["Compute Provider Workspace"] CPBinding["APIBinding: dep-ctrl
(claims accepted)"] CPExport["APIExport: compute"] CPRule["DependencyRule:
VM → VPC"] end - subgraph NP["Network Provider WS"] + subgraph NP["Network Provider Workspace"] NPBinding["APIBinding: dep-ctrl
(claims accepted)"] NPExport["APIExport: VPCs"] NPWebhook["ValidatingWebhook"] @@ -140,7 +159,7 @@ graph LR ROOTROLE["ClusterRoles
(workspaces/content +
workspace resolution)"] end - subgraph CW["Consumer WS"] + subgraph CW["Consumer Workspace"] CWBindings["APIBindings:
compute, network"] CWResources["VPC, VM"] end @@ -150,8 +169,8 @@ graph LR Ctrl -.->|watches rules via VW| DCExport Ctrl -.->|installs webhook via VW| NP WH -.->|watches rules via VW| DCExport - WH -.->|watches VMs via| CPExport NPWebhook -.->|dispatches DELETE to| WH + WH -.->|on DELETE: lists dependents
via kcp front-proxy| CW CWBindings -->|binds to| CPExport CWBindings -->|binds to| NPExport @@ -164,18 +183,15 @@ graph LR style CW fill:#fef3c7,color:#664d03 ``` -**Two levels of multicluster watching:** - -1. **DependencyRule reconciler** (both binaries) watches rules via the dep-ctrl's own - APIExport virtual workspace, discovering provider workspaces that bind to the dep-ctrl - export. - -2. **Indexed cache** (webhook only, dynamic per-rule) watches the dependent resource type - (e.g., VMs) via the referenced APIExport's virtual workspace. Field indices enable the - webhook to quickly find dependents referencing a given resource. +**Multicluster watching is one-level only:** both binaries watch +`DependencyRule` objects via the dep-ctrl APIExport's virtual workspace, +which spans every provider workspace bound to it. Dependent resources +(e.g., VMs) are not watched — the webhook lists them on demand from the +consumer workspace via the kcp front-proxy when validating a DELETE. For detailed architecture documentation, see [docs/architecture.md](docs/architecture.md). For a step-by-step deployment walkthrough, see [docs/getting-started.md](docs/getting-started.md). +For development setup and project layout, see [docs/development.md](docs/development.md). ### RBAC Model @@ -185,6 +201,7 @@ created at runtime. #### permissionClaims on the dep-ctrl APIExport The dep-ctrl APIExport declares a `permissionClaim` for: + - `validatingwebhookconfigurations` (admissionregistration.k8s.io) -- to install webhooks Provider workspaces that bind to the dep-ctrl APIExport must **accept** this claim @@ -210,30 +227,11 @@ using [kcp-operator](https://github.com/kcp-dev/helm-charts). 
## Development -### Prerequisites - -- Go 1.26+ -- [kcp](https://github.com/kcp-dev/kcp) binary (for integration tests) +The fastest way to get a working dev environment is the [Nix flake](flake.nix) +together with [direnv](https://direnv.net/): `direnv allow` (or `nix develop`) +drops you into a shell with Go, `golangci-lint`, `helm`, `kind`, and the kcp +toolchain on `$PATH`. After that, `pre-commit install` registers the project's +hooks. -### Build - -```sh -make build -``` - -### Run Tests - -```sh -# Unit and integration tests (requires kcp binary) -make test - -# E2E tests (requires kind, helm, docker) -# Deploys a multi-shard kcp via kcp-operator (root + shard1) -make test-e2e -``` - -### Generate Code - -```sh -make generate -``` +For project layout, the full `make` target reference, integration- and e2e-test +internals, and shard-config tips, see [docs/development.md](docs/development.md). diff --git a/docs/development.md b/docs/development.md index 1dbf002..d48ba65 100644 --- a/docs/development.md +++ b/docs/development.md @@ -2,8 +2,38 @@ ## Prerequisites +The recommended setup is the [dev shell](#dev-shell) below — it provides the +full toolchain. If you're not using Nix, you need: + - Go 1.26+ - A kcp binary (downloaded automatically by `make kcp`) +- `golangci-lint`, `helm`, `kind`, `docker`, and `pre-commit` on `$PATH` + +## Dev shell + +The repo ships a [Nix flake](../flake.nix) wired up via [direnv](https://direnv.net/) +(see [.envrc](../.envrc)). With both installed, the dev shell auto-loads on `cd` +and provides Go 1.26.2, `golangci-lint`, `gopls`, `helm`, `kind`, `task`, the kcp +toolchain, and the rest of the dependencies. + +```sh +direnv allow # one-time, on first entry +# or, without direnv: +nix develop +``` + +The shell is defined by [`opendefensecloud/dev-kit`](https://github.com/opendefensecloud/dev-kit) +via the `dev-kit` flake input — adding tools project-wide is a PR there, not here. 
+ +## Pre-commit hooks + +```sh +pre-commit install # registers the hooks listed in .pre-commit-config.yaml +``` + +The configured hooks cover trailing whitespace, YAML/JSON syntax, `yamllint`, +`shellcheck`, `gofmt`, `go vet`, `go mod tidy`, `golangci-lint` (manual stage), +and `helm lint`. ## Project Structure From dd19de883660d27b404b8c481761c7015576848d Mon Sep 17 00:00:00 2001 From: Cyrill Berg Date: Fri, 8 May 2026 16:40:57 +0200 Subject: [PATCH 2/2] docs: enhance README with detailed RBAC and webhook management explanations Signed-off-by: Cyrill Berg --- README.md | 76 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 52 insertions(+), 24 deletions(-) diff --git a/README.md b/README.md index b2a865c..4813b90 100644 --- a/README.md +++ b/README.md @@ -90,8 +90,12 @@ both watch `DependencyRule` objects via the dep-ctrl APIExport: - Installs `ValidatingWebhookConfiguration` in each provider workspace whose resources are protected as dependencies -- All provider workspace access goes through the dep-ctrl APIExport's virtual - workspace, authorized by `permissionClaims` on the APIExport +- Webhook management goes through the dep-ctrl APIExport's virtual workspace, + authorized by the `validatingwebhookconfigurations` `permissionClaim`. + Workspace-path resolution (translating `apiExportRef.path` into a logical + cluster name) goes through the kcp front-proxy directly, authorized by plain + RBAC on `tenancy.kcp.io/workspaces` plus a binding to the kcp-predefined + `system:kcp:workspace:access` ClusterRole. **Webhook** (`cmd/webhook`) -- handles admission: @@ -131,7 +135,7 @@ in those workspaces. Consumer workspaces do not need to bind to the dep-ctrl exp ```mermaid graph LR - subgraph DC["Dep-Ctrl Workspace"] + subgraph DC["dep-ctrl Workspace"] DCExport["APIExport:
DependencyRule
+ permissionClaims"] end @@ -155,8 +159,8 @@ graph LR NPWebhook["ValidatingWebhook"] end - subgraph ROOT["Root Workspace"] - ROOTROLE["ClusterRoles
(workspaces/content +
workspace resolution)"] + subgraph ROOT["Workspace-resolution RBAC
(typical: root; alt: per-shard system:admin)"] + ROOTROLE["ClusterRole binding:
tenancy.kcp.io/workspaces get,list,watch
+ system:kcp:workspace:access"] end subgraph CW["Consumer Workspace"] @@ -166,9 +170,10 @@ graph LR CPBinding -->|binds to| DCExport NPBinding -->|binds to| DCExport - Ctrl -.->|watches rules via VW| DCExport - Ctrl -.->|installs webhook via VW| NP - WH -.->|watches rules via VW| DCExport + Ctrl -.->|watches rules via virtual workspace| DCExport + Ctrl -.->|installs webhook via virtual workspace| NP + Ctrl -.->|resolves workspace paths
via kcp front-proxy| ROOTROLE + WH -.->|watches rules via virtual workspace| DCExport NPWebhook -.->|dispatches DELETE to| WH WH -.->|on DELETE: lists dependents
via kcp front-proxy| CW CWBindings -->|binds to| CPExport @@ -195,8 +200,8 @@ For development setup and project layout, see [docs/development.md](docs/develop ### RBAC Model -The system uses static bootstrap RBAC in three kcp locations. No dynamic RBAC is -created at runtime. +The system relies on static bootstrap RBAC plus one `permissionClaim` declared +on the dep-ctrl APIExport. No dynamic RBAC is created at runtime. #### permissionClaims on the dep-ctrl APIExport @@ -210,20 +215,43 @@ in binding workspaces through the virtual workspace. #### Bootstrap RBAC (static, applied at deployment) -**Root workspace** -- both components need `workspaces/content` access to enter child -workspaces. The controller additionally needs `workspaces` read access to resolve -workspace paths to logical cluster names. - -**Dep-ctrl workspace** -- the controller needs `apiexportendpointslices` read access -for VW URL discovery and full CRUD on `apiexports/content` to manage webhooks in -binding workspaces via the VW. - -No shard-wide RBAC is needed. The webhook watches dependent resources through the -dep-ctrl APIExport's virtual workspace, authorized by dynamically managed -permissionClaims. Providers accept these claims in their APIBinding. - -See [docs/getting-started.md](docs/getting-started.md) for the full deployment guide -using [kcp-operator](https://github.com/kcp-dev/helm-charts). +Three categories of static RBAC must be in place at deployment time: + +**Per-shard `system:admin` RBAC (webhook)** -- grants the webhook ServiceAccount +`*/*` `get,list`. The webhook needs this during admission to list dependent +resources directly in any consumer workspace via the kcp front-proxy. Because +kcp's `BootstrapPolicyAuthorizer` reads bindings from each shard's local +`system:admin` workspace and bindings do not propagate across shards, this +binding must be applied **once per kcp shard** through a direct (non-front-proxy) +connection. 
+ +**Workspace-resolution RBAC (controller)** -- the controller needs +`tenancy.kcp.io/workspaces` `get,list,watch` plus workspace-content access. The +canonical way to grant the latter is to bind the kcp-predefined +`system:kcp:workspace:access` ClusterRole, which grants the `access` verb on the +non-resource URL `/`. Both must be in place in every **parent** of a workspace +the controller operates on. The controller uses these rules to translate a +`DependencyRule`'s `apiExportRef.path` (e.g., `root:providers:network`) into the +underlying logical cluster name. In a typical deployment where provider +workspaces live directly under `root`, granting them in the `root` workspace is +enough; deeper paths need the same bindings in each intermediate parent. As an +alternative, the bindings may be applied in each shard's `system:admin` +workspace, where they cover every workspace on the shard and implicitly satisfy +any parent the resolver needs to traverse. The cost is that, like the webhook +binding above, they must then be applied once per shard. + +**Dep-ctrl workspace RBAC (both components)** -- both binaries need +`apis.kcp.io/apiexportendpointslices` `get,list,watch` (to discover the dep-ctrl +APIExport's virtual-workspace URLs) and `apis.kcp.io/apiexports/content` on the +dep-ctrl APIExport. The controller uses the latter to manage +`ValidatingWebhookConfiguration` objects in binding workspaces through the +virtual workspace; the webhook uses it to watch `DependencyRule` objects through +the same virtual workspace. + +Webhook installation in provider workspaces is authorized by the +`validatingwebhookconfigurations` permissionClaim above, not by RBAC. Dependent +listing during admission is authorized by the per-shard `system:admin` binding, +not by the dep-ctrl APIExport. ## Development
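
Review note on the admission flow the updated README describes (build a per-request endpoint of the form `{base}/clusters/{logicalCluster}`, list the dependent type, filter in memory by the rule's field path): the filtering step can be sketched in plain Go as below. This is a minimal illustration under stated assumptions, not the project's implementation. The helper names `lookupField`, `blockers`, and `clusterURL`, the sample host, and the cluster name `abc123` are all hypothetical, and the listed dependents are hard-coded maps standing in for what a dynamic client `List` call would return.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupField walks a dotted path such as ".spec.vpcRef.name" through a
// decoded JSON object and returns the string found there, if any.
// Hypothetical helper; the real webhook may resolve field paths differently.
func lookupField(obj map[string]any, path string) (string, bool) {
	var cur any = obj
	for _, key := range strings.Split(strings.TrimPrefix(path, "."), ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		if cur, ok = m[key]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// blockers returns "Kind/name" for every dependent whose field at path
// equals the name of the resource being deleted.
func blockers(kind string, dependents []map[string]any, path, deleted string) []string {
	var out []string
	for _, dep := range dependents {
		if ref, ok := lookupField(dep, path); ok && ref == deleted {
			name, _ := lookupField(dep, ".metadata.name")
			out = append(out, kind+"/"+name)
		}
	}
	return out
}

// clusterURL builds a per-workspace endpoint in the
// {base}/clusters/{logicalCluster} form quoted in the README.
func clusterURL(base, logicalCluster string) string {
	return strings.TrimSuffix(base, "/") + "/clusters/" + logicalCluster
}

func main() {
	// Stand-in for the result of listing VirtualMachines in the consumer workspace.
	vms := []map[string]any{
		{"metadata": map[string]any{"name": "my-vm"}, "spec": map[string]any{"vpcRef": map[string]any{"name": "my-vpc"}}},
		{"metadata": map[string]any{"name": "other-vm"}, "spec": map[string]any{"vpcRef": map[string]any{"name": "other-vpc"}}},
	}
	fmt.Println(clusterURL("https://kcp.example.test:6443/", "abc123"))
	if found := blockers("VirtualMachine", vms, ".spec.vpcRef.name", "my-vpc"); len(found) > 0 {
		fmt.Println("deny: still referenced by " + strings.Join(found, ", "))
	} else {
		fmt.Println("allow")
	}
}
```

With the sample data, deleting `my-vpc` is denied because `my-vm` still references it, which matches the "Deny deletion" branch of the flowchart. A per-request list plus linear filter trades lookup speed for not having to keep a cache of dependents, which is the trade-off the patch makes when it replaces the indexed cache with the rule registry.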