Merged
202 changes: 114 additions & 88 deletions README.md
@@ -1,11 +1,14 @@
# KCP-aware Dependency Controller
# kcp-aware Dependency Controller

[![Build status](https://github.com/opendefensecloud/dependency-controller/actions/workflows/golang.yaml/badge.svg)](https://github.com/opendefensecloud/dependency-controller/actions/workflows/golang.yaml)
[![Go Report Card](https://goreportcard.com/badge/go.opendefense.cloud/dependency-controller)](https://goreportcard.com/report/go.opendefense.cloud/dependency-controller)
[![Go Reference](https://pkg.go.dev/badge/go.opendefense.cloud/dependency-controller.svg)](https://pkg.go.dev/go.opendefense.cloud/dependency-controller)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/opendefensecloud/dependency-controller/badge)](https://scorecard.dev/viewer/?uri=github.com/opendefensecloud/dependency-controller)
[![GitHub Release](https://img.shields.io/github/v/release/opendefensecloud/dependency-controller)](https://github.com/opendefensecloud/dependency-controller/releases/latest)

## Problem Statement

In KCP, APIs can be offered to users via APIExports by a multitude of providers.
In kcp, APIs can be offered to users via APIExports by a multitude of providers.
For IaaS services, however, there is a critical shortcoming:
IaaS APIs typically depend on each other -- for example, a VM is provisioned in a VPC.
The VM is dependent on the VPC. If the VPC is deleted, it pulls the rug from under the VM.
@@ -21,13 +24,13 @@ flowchart TD
A["Provider creates<br/><b>DependencyRule</b><br/>(e.g. VM → VPC)"] --> B["Both binaries discover rule<br/>via dep-ctrl APIExport"]

B --> C["<b>Controller:</b><br/>Install ValidatingWebhook<br/>in dependency provider workspace"]
B --> E["<b>Webhook:</b><br/>Start indexed cache watching<br/>dependent type via APIExport VW"]
B --> E["<b>Webhook:</b><br/>Register rule metadata<br/>(dependent GVR, field paths)<br/>in RuleRegistry"]

E --> F["Informer indexes dependents<br/>by field paths<br/>(e.g. .spec.vpcRef.name)"]
E --> F["Registry holds rule metadata only<br/>— no cache of dependents"]

F --> G{"Consumer tries to delete<br/>dependency (e.g. VPC)"}
G --> H["Webhook intercepts DELETE"]
H --> I["Query indexed cache:<br/>any VMs where .spec.vpcRef.name = my-vpc?"]
H --> I["List VMs in consumer workspace<br/>via kcp front-proxy;<br/>in-memory filter:<br/>.spec.vpcRef.name == my-vpc?"]
I -- Yes --> J["Deny deletion<br/>'still referenced by VirtualMachine/my-vm'"]
I -- No --> K["Allow deletion"]

@@ -43,8 +46,9 @@ flowchart TD

Along with their APIExport, providers create `DependencyRule` objects to describe how their
resources depend on others. A single rule attaches to one dependent resource type (via its
APIExport reference) and lists all of its dependencies with field paths that describe where
the reference lives:
APIExport reference in the same workspace as the rule) and lists each dependency together
with the **dependency provider's** APIExport reference (workspace path + name) and the field
path inside the dependent resource where the reference lives:

```yaml
apiVersion: dependencies.opendefense.cloud/v1alpha1
@@ -57,13 +61,20 @@ spec:
group: compute.example.com
version: v1alpha1
kind: VirtualMachine
resource: virtualmachines
dependencies:
- group: network.example.com
- apiExportRef:
path: root:providers:network
name: network.example.com
group: network.example.com
version: v1alpha1
resource: vpcs
fieldRef:
path: ".spec.vpcRef.name"
- group: network.example.com
- apiExportRef:
path: root:providers:network
name: network.example.com
group: network.example.com
version: v1alpha1
resource: subnets
fieldRef:
```
@@ -76,32 +87,44 @@ The system runs as two binaries, deployed together via a single Helm chart, that
both watch `DependencyRule` objects via the dep-ctrl APIExport:

**Controller** (`cmd/controller`) -- handles infrastructure setup:

- Installs `ValidatingWebhookConfiguration` in each provider workspace whose
resources are protected as dependencies
- All provider workspace access goes through the dep-ctrl APIExport's virtual
workspace, authorized by `permissionClaims` on the APIExport
- Webhook management goes through the dep-ctrl APIExport's virtual workspace,
authorized by the `validatingwebhookconfigurations` `permissionClaim`.
Workspace-path resolution (translating `apiExportRef.path` into a logical
cluster name) goes through the kcp front-proxy directly, authorized by plain
RBAC on `tenancy.kcp.io/workspaces` plus a binding to the kcp-predefined
`system:kcp:workspace:access` ClusterRole.

**Webhook** (`cmd/webhook`) -- handles admission:
- Maintains a dedicated indexed cache per rule, watching the dependent resource
type via the provider's APIExport virtual workspace
- Serves admission requests, querying indexed caches to block deletion of
resources that are still referenced

### Indexed Cache
- Watches `DependencyRule` objects via the dep-ctrl APIExport's virtual
workspace and stores parsed metadata (dependent GVR + field paths) in an
in-memory `RuleRegistry`.
- On each DELETE admission request, finds matching rules in the registry,
lists dependent resources directly in the consumer workspace via the kcp
front-proxy, and filters in-memory by the configured field path to block
deletion of still-referenced resources.

### Rule Registry

For each DependencyRule, the webhook server starts a multicluster manager that watches the
dependent resource type (e.g., VirtualMachines) via the referenced APIExport's virtual
workspace. Field indices are registered on the dependent informer for each dependency
target's field path (e.g., `.spec.vpcRef.name`), enabling O(1) lookups by referenced
resource name.
The webhook keeps an in-memory `RuleRegistry` populated by reconciling
`DependencyRule` objects through the dep-ctrl APIExport's virtual workspace.
Each entry holds rule metadata only — the dependent's GroupVersionResource
and the field paths that hold dependency references — not the dependent
resources themselves. Dependent listing happens on demand per admission
request (see [Admission Webhook](#admission-webhook) below).
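
The registry described above can be pictured as a small concurrency-safe map from a dependency type to the rules that reference it. The following is a minimal sketch under stated assumptions, not the project's actual implementation: the type and method names (`GVR`, `RuleEntry`, `Upsert`, `Remove`, `Lookup`) and the rule key `compute/vm-rule` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// GVR identifies a resource type (hypothetical stand-in for a GroupVersionResource).
type GVR struct {
	Group, Version, Resource string
}

// RuleEntry is the metadata kept per rule: which dependent type to list and
// which field paths inside it hold references to the dependency.
type RuleEntry struct {
	Dependent  GVR
	FieldPaths []string
}

// RuleRegistry maps a dependency type to the rules that reference it.
// It stores metadata only; no dependent objects are cached.
type RuleRegistry struct {
	mu    sync.RWMutex
	rules map[GVR]map[string]RuleEntry // dependency -> rule key -> entry
}

func NewRuleRegistry() *RuleRegistry {
	return &RuleRegistry{rules: make(map[GVR]map[string]RuleEntry)}
}

// Upsert records (or replaces) what a rule contributes for one dependency.
func (r *RuleRegistry) Upsert(ruleKey string, dependency GVR, e RuleEntry) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.rules[dependency] == nil {
		r.rules[dependency] = make(map[string]RuleEntry)
	}
	r.rules[dependency][ruleKey] = e
}

// Remove drops everything a rule contributed, e.g. after the rule is deleted.
func (r *RuleRegistry) Remove(ruleKey string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for dep, byRule := range r.rules {
		delete(byRule, ruleKey)
		if len(byRule) == 0 {
			delete(r.rules, dep)
		}
	}
}

// Lookup returns the entries to check before a dependency may be deleted.
func (r *RuleRegistry) Lookup(dependency GVR) []RuleEntry {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]RuleEntry, 0, len(r.rules[dependency]))
	for _, e := range r.rules[dependency] {
		out = append(out, e)
	}
	return out
}

func main() {
	reg := NewRuleRegistry()
	vpc := GVR{Group: "network.example.com", Version: "v1alpha1", Resource: "vpcs"}
	vm := GVR{Group: "compute.example.com", Version: "v1alpha1", Resource: "virtualmachines"}
	reg.Upsert("compute/vm-rule", vpc, RuleEntry{Dependent: vm, FieldPaths: []string{".spec.vpcRef.name"}})
	for _, e := range reg.Lookup(vpc) {
		fmt.Printf("deleting a %s requires checking %s at %v\n", vpc.Resource, e.Dependent.Resource, e.FieldPaths)
	}
}
```

Keying by dependency type keeps the admission-time lookup to a single map access, while `Remove` scans all entries because rule deletion is assumed to be rare compared with admission requests.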

### Admission Webhook

A KCP ValidatingAdmissionWebhook intercepts DELETE requests. When a delete is attempted,
the webhook queries the indexed caches to find dependent resources that reference the
resource being deleted. If any are found, the request is denied with a clear error message
listing the dependents. Finalizers are intentionally avoided as they conflict with KCP's
sync-agent.
A kcp ValidatingAdmissionWebhook intercepts DELETE requests. When a delete is attempted,
the webhook looks up matching rules in the registry, builds a per-request dynamic client
targeting the consumer workspace via the kcp front-proxy
(`{base}/clusters/{logicalCluster}`), `List`s the dependent type, and filters the results
in-memory by the rule's field path. If any blocker is found, the request is denied with a
clear error message listing the dependents. Finalizers are intentionally avoided as they
conflict with kcp's sync-agent.
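
The list-then-filter step above can be sketched with plain maps standing in for the unstructured objects a dynamic client would return. This is an illustrative sketch only: the helper names (`lookupString`, `blockers`, `clusterURL`), the base URL, and the logical cluster name `abc123` are assumptions, and the real webhook may handle field paths and errors differently.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupString walks a dotted field path such as ".spec.vpcRef.name" through
// a decoded object and returns the string stored there, if any.
func lookupString(obj map[string]any, path string) (string, bool) {
	var cur any = obj
	for _, part := range strings.Split(strings.TrimPrefix(path, "."), ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		if cur, ok = m[part]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// blockers returns the names of listed dependents whose field at path equals
// the name of the resource being deleted.
func blockers(dependents []map[string]any, path, deleted string) []string {
	var names []string
	for _, d := range dependents {
		if ref, ok := lookupString(d, path); ok && ref == deleted {
			name, _ := lookupString(d, ".metadata.name")
			names = append(names, name)
		}
	}
	return names
}

// clusterURL builds the per-workspace endpoint in the form
// {base}/clusters/{logicalCluster}.
func clusterURL(base, logicalCluster string) string {
	return strings.TrimSuffix(base, "/") + "/clusters/" + logicalCluster
}

func main() {
	// Stand-in for the result of listing VirtualMachines in the consumer workspace.
	vms := []map[string]any{
		{"metadata": map[string]any{"name": "my-vm"},
			"spec": map[string]any{"vpcRef": map[string]any{"name": "my-vpc"}}},
		{"metadata": map[string]any{"name": "other-vm"},
			"spec": map[string]any{"vpcRef": map[string]any{"name": "other-vpc"}}},
	}
	fmt.Println("listing via", clusterURL("https://kcp.example.com", "abc123"))
	for _, name := range blockers(vms, ".spec.vpcRef.name", "my-vpc") {
		fmt.Printf("deny: still referenced by VirtualMachine/%s\n", name)
	}
}
```

Because nothing is indexed, each admission request costs one `List` plus a linear scan of the results; the sketch makes that trade-off visible in `blockers`.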

### Architecture

@@ -112,7 +135,7 @@ in those workspaces. Consumer workspaces do not need to bind to the dep-ctrl exp

```mermaid
graph LR
subgraph DC["Dep-Ctrl Workspace"]
subgraph DC["dep-ctrl Workspace"]
DCExport["APIExport:<br/>DependencyRule<br/><i>+ permissionClaims</i>"]
end

@@ -121,37 +144,38 @@ graph LR
end

subgraph WB["Webhook Binary"]
WH["Rule Cache Manager<br/>· Indexed Caches (per rule)<br/>· Deletion Validator"]
WH["DependencyRule Reconciler<br/>· Rule Registry (metadata)<br/>· Deletion Validator"]
end

subgraph CP["Compute Provider WS"]
subgraph CP["Compute Provider Workspace"]
CPBinding["APIBinding: dep-ctrl<br/><i>(claims accepted)</i>"]
CPExport["APIExport: compute"]
CPRule["DependencyRule:<br/>VM → VPC"]
end

subgraph NP["Network Provider WS"]
subgraph NP["Network Provider Workspace"]
NPBinding["APIBinding: dep-ctrl<br/><i>(claims accepted)</i>"]
NPExport["APIExport: VPCs"]
NPWebhook["ValidatingWebhook"]
end

subgraph ROOT["Root Workspace"]
ROOTROLE["ClusterRoles<br/>(workspaces/content +<br/>workspace resolution)"]
subgraph ROOT["Workspace-resolution RBAC<br/>(typical: root; alt: per-shard system:admin)"]
ROOTROLE["ClusterRole binding:<br/>tenancy.kcp.io/workspaces get,list,watch<br/>+ system:kcp:workspace:access"]
end

subgraph CW["Consumer WS"]
subgraph CW["Consumer Workspace"]
CWBindings["APIBindings:<br/>compute, network"]
CWResources["VPC, VM"]
end

CPBinding -->|binds to| DCExport
NPBinding -->|binds to| DCExport
Ctrl -.->|watches rules via VW| DCExport
Ctrl -.->|installs webhook via VW| NP
WH -.->|watches rules via VW| DCExport
WH -.->|watches VMs via| CPExport
Ctrl -.->|watches rules via virtual workspace| DCExport
Ctrl -.->|installs webhook via virtual workspace| NP
Ctrl -.->|resolves workspace paths<br/>via kcp front-proxy| ROOTROLE
WH -.->|watches rules via virtual workspace| DCExport
NPWebhook -.->|dispatches DELETE to| WH
WH -.->|on DELETE: lists dependents<br/>via kcp front-proxy| CW
CWBindings -->|binds to| CPExport
CWBindings -->|binds to| NPExport

@@ -164,27 +188,25 @@ graph LR
style CW fill:#fef3c7,color:#664d03
```

**Two levels of multicluster watching:**

1. **DependencyRule reconciler** (both binaries) watches rules via the dep-ctrl's own
APIExport virtual workspace, discovering provider workspaces that bind to the dep-ctrl
export.

2. **Indexed cache** (webhook only, dynamic per-rule) watches the dependent resource type
(e.g., VMs) via the referenced APIExport's virtual workspace. Field indices enable the
webhook to quickly find dependents referencing a given resource.
**Multicluster watching is one-level only:** both binaries watch
`DependencyRule` objects via the dep-ctrl APIExport's virtual workspace,
which spans every provider workspace bound to it. Dependent resources
(e.g., VMs) are not watched — the webhook lists them on demand from the
consumer workspace via the kcp front-proxy when validating a DELETE.

For detailed architecture documentation, see [docs/architecture.md](docs/architecture.md).
For a step-by-step deployment walkthrough, see [docs/getting-started.md](docs/getting-started.md).
For development setup and project layout, see [docs/development.md](docs/development.md).

### RBAC Model

The system uses static bootstrap RBAC in three kcp locations. No dynamic RBAC is
created at runtime.
The system relies on static bootstrap RBAC plus one `permissionClaim` declared
on the dep-ctrl APIExport. No dynamic RBAC is created at runtime.

#### permissionClaims on the dep-ctrl APIExport

The dep-ctrl APIExport declares a `permissionClaim` for:

- `validatingwebhookconfigurations` (admissionregistration.k8s.io) -- to install webhooks

Provider workspaces that bind to the dep-ctrl APIExport must **accept** this claim
@@ -193,47 +215,51 @@ in binding workspaces through the virtual workspace.

#### Bootstrap RBAC (static, applied at deployment)

**Root workspace** -- both components need `workspaces/content` access to enter child
workspaces. The controller additionally needs `workspaces` read access to resolve
workspace paths to logical cluster names.

**Dep-ctrl workspace** -- the controller needs `apiexportendpointslices` read access
for VW URL discovery and full CRUD on `apiexports/content` to manage webhooks in
binding workspaces via the VW.

No shard-wide RBAC is needed. The webhook watches dependent resources through the
dep-ctrl APIExport's virtual workspace, authorized by dynamically managed
permissionClaims. Providers accept these claims in their APIBinding.

See [docs/getting-started.md](docs/getting-started.md) for the full deployment guide
using [kcp-operator](https://github.com/kcp-dev/helm-charts).
Three categories of static RBAC must be in place at deployment time:

**Per-shard `system:admin` RBAC (webhook)** -- grants the webhook ServiceAccount
`*/*` `get,list`. The webhook needs this during admission to list dependent
resources directly in any consumer workspace via the kcp front-proxy. Because
kcp's `BootstrapPolicyAuthorizer` reads bindings from each shard's local
`system:admin` workspace and bindings do not propagate across shards, this
binding must be applied **once per kcp shard** through a direct (non-front-proxy)
connection.

**Workspace-resolution RBAC (controller)** -- the controller needs
`tenancy.kcp.io/workspaces` `get,list,watch` plus workspace-content access — the
canonical way is to bind the kcp-predefined `system:kcp:workspace:access`
ClusterRole, which grants the `access` verb on the non-resource URL `/`. Both
must be in place in every **parent** of a workspace the controller operates on.
The controller uses these rules to translate a `DependencyRule`'s
`apiExportRef.path` (e.g., `root:providers:network`) into the underlying logical
cluster name. In a typical deployment where provider workspaces live directly
under `root`, granting them in the `root` workspace is enough; deeper paths need
the same bindings in each intermediate parent. As an alternative, the bindings
may be applied in each shard's `system:admin` workspace. Those cover every
workspace on the shard and implicitly satisfy any parent the resolver needs to
traverse, but, like the webhook binding above, they must be applied once per
shard.
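
To make the "every parent" requirement concrete, the sketch below enumerates the ancestor workspaces of an `apiExportRef.path`, which are the places where the workspace-resolution bindings would be needed when they are not applied per shard. The function name `parentWorkspaces` is hypothetical, and the sketch assumes paths are plain colon-separated segments starting at `root`.

```go
package main

import (
	"fmt"
	"strings"
)

// parentWorkspaces returns every ancestor of a colon-separated workspace
// path, outermost first. The path itself is not included.
func parentWorkspaces(path string) []string {
	segs := strings.Split(path, ":")
	parents := make([]string, 0, len(segs))
	for i := 1; i < len(segs); i++ {
		parents = append(parents, strings.Join(segs[:i], ":"))
	}
	return parents
}

func main() {
	// A provider workspace directly under root needs bindings in root only.
	fmt.Println(parentWorkspaces("root:network")) // prints [root]
	// A deeper path needs the same bindings in each intermediate parent.
	fmt.Println(parentWorkspaces("root:providers:network")) // prints [root root:providers]
}
```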

**Dep-ctrl workspace RBAC (both components)** -- both binaries need
`apis.kcp.io/apiexportendpointslices` `get,list,watch` (to discover the dep-ctrl
APIExport's virtual-workspace URLs) and `apis.kcp.io/apiexports/content` on the
dep-ctrl APIExport. The controller uses the latter to manage
`ValidatingWebhookConfiguration` objects in binding workspaces through the
virtual workspace; the webhook uses it to watch `DependencyRule` objects through
the same virtual workspace.

Webhook installation in provider workspaces is authorized by the
`validatingwebhookconfigurations` permissionClaim above, not by RBAC. Dependent
listing during admission is authorized by the per-shard `system:admin` binding,
not by the dep-ctrl APIExport.

## Development

### Prerequisites

- Go 1.26+
- [kcp](https://github.com/kcp-dev/kcp) binary (for integration tests)
The fastest way to get a working dev environment is the [Nix flake](flake.nix)
together with [direnv](https://direnv.net/): `direnv allow` (or `nix develop`)
drops you into a shell with Go, `golangci-lint`, `helm`, `kind`, and the kcp
toolchain on `$PATH`. After that, `pre-commit install` registers the project's
hooks.

### Build

```sh
make build
```

### Run Tests

```sh
# Unit and integration tests (requires kcp binary)
make test

# E2E tests (requires kind, helm, docker)
# Deploys a multi-shard kcp via kcp-operator (root + shard1)
make test-e2e
```

### Generate Code

```sh
make generate
```
For project layout, the full `make` target reference, integration- and e2e-test
internals, and shard-config tips, see [docs/development.md](docs/development.md).
30 changes: 30 additions & 0 deletions docs/development.md
@@ -2,8 +2,38 @@

## Prerequisites

The recommended setup is the [dev shell](#dev-shell) below — it provides the
full toolchain. If you're not using Nix, you need:

- Go 1.26+
- A kcp binary (downloaded automatically by `make kcp`)
- `golangci-lint`, `helm`, `kind`, `docker`, and `pre-commit` on `$PATH`

## Dev shell

The repo ships a [Nix flake](../flake.nix) wired up via [direnv](https://direnv.net/)
(see [.envrc](../.envrc)). With both installed, the dev shell auto-loads on `cd`
and provides Go 1.26.2, `golangci-lint`, `gopls`, `helm`, `kind`, `task`, the kcp
toolchain, and the rest of the dependencies.

```sh
direnv allow # one-time, on first entry
# or, without direnv:
nix develop
```

The shell is defined by [`opendefensecloud/dev-kit`](https://github.com/opendefensecloud/dev-kit)
via the `dev-kit` flake input — adding tools project-wide is a PR there, not here.

## Pre-commit hooks

```sh
pre-commit install # registers the hooks listed in .pre-commit-config.yaml
```

The configured hooks cover trailing whitespace, YAML/JSON syntax, `yamllint`,
`shellcheck`, `gofmt`, `go vet`, `go mod tidy`, `golangci-lint` (manual stage),
and `helm lint`.

## Project Structure
