The data plane regulates itself. A rustlib app sized for steady state does not fall over when a burst arrives, an upstream stalls, or a transform balloons memory -- the pipeline slows its own intake, lets the in-flight work drain, and speeds back up once the pressure clears. This happens automatically; an app wires nothing.
Self-regulation is ON by default. To turn it off (byte-identical to the pre-governor data path), set one cascade key:
self_regulation:
enabled: falseWhen enabled = false the runtime constructs NOTHING -- no pressure
governor, no inbound gate, no byte-budget controller. Every Option stays
None and the data path is the original whole-batch loop. There is no
half-on state.
See also BACKPRESSURE.md (where the brake is applied and
why) and KAFKA-PATH.md (how the Kafka GET/PROCESS/SEND
batch sizes feed the loop). The code lives in src/governor/.
Self-regulation is how a rustlib app scales VERTICALLY -- a single pod adapting its OWN intake to its OWN resources. It is the DEFAULT and the FIRST response to pressure, and it is distinct from (and complementary to) HORIZONTAL scaling, where KEDA adds pods.
The doctrine, in one line: a pod sized for steady state regulates its own intake FIRST -- it slows down or speeds up WITHIN the pod -- and only escalates to horizontal scale (more pods) once its vertical headroom is exhausted. Vertical is the fast, local, free response; horizontal is the slow, global, capacity response. They are not alternatives. Vertical buys time; horizontal adds capacity when the time runs out.
Memory is the hard, never-OOM authority on the vertical side. CPU is left to CFS: under a CPU quota the Linux scheduler throttles the process, each batch takes longer, and the AIMD byte-budget loop sees the longer process time and shrinks the budget on its own. There is no separate CPU brake -- adding one would double-count a signal CFS already handles (see "Why memory is HARD and CPU is deliberately dropped" below).
Self-regulation is ON by default. Opt out with self_regulation.enabled = false, which builds nothing on the vertical side; horizontal scaling via
ScalingPressure is unaffected.
| Vertical (self-regulation) | Horizontal (KEDA) | |
|---|---|---|
| Question | "Should this pod pull more work right now?" | "Do we need more pods?" |
| Mechanism | MemoryGuard + UnifiedPressure inbound gate + AIMD byte budget | ScalingPressure -> external-scaler signal -> KEDA |
| Lives in | This library, in-pod | This library's signal; KEDA acts on it |
| Scope | One pod, its own intake | The whole deployment, replica count |
| Timescale | Milliseconds | Seconds to minutes (pod start) |
| Cost | Free -- no new capacity | A new pod |
| Default | ON | Driven by the same pressure / lag signal |
The two compose rather than compete. The SAME pressure that brakes a pod's
intake also surfaces as consumer lag (Kafka) or queue depth, which is the
signal ScalingPressure feeds to KEDA. Braking intake on one pod buys time
for the in-flight buffer to drain; sustained pressure that the brake cannot
clear is exactly the condition that should add a replica. See the
Pressure -> lag -> KEDA section in BACKPRESSURE.md and
pipeline/SCALING.md for the KEDA signal.
Self-regulation is three distinct controllers with three distinct jobs. They are NOT interchangeable; each answers a different question.
| Brain | Question | Acts on | Source of truth? |
|---|---|---|---|
| MemoryGuard | "Are we about to OOM?" | The HARD pressure signal | YES -- the never-OOM authority |
| ScalingPressure | "Do we need more pods?" | KEDA / external scaler signal | Pool sizing, not the data path |
| UnifiedPressure | "Should I pull more work right now?" | The inbound gate + byte budget | Derived from the sources above |
- MemoryGuard (
src/memory/) is the source of truth. It tracks in-flight ingress bytes (and, when a heap source is wired, the true process heap -- see theset_heap_sourceentry in MIGRATIONS.md). Itspressure_ratio()is fed into the governor as a HARD source: never weighted, never masked. A saturated soft signal can never lower the combined level below what memory demands. This is the never-OOM guarantee. - ScalingPressure (
src/scaling/) drives horizontal scaling. It emits the external-scaler signal KEDA reads to add or remove pods. It is a capacity lever, not a data-path lever -- it does not pause intake, it asks for more replicas. See pipeline/SCALING.md. - UnifiedPressure (
src/governor/source.rs) combines the sources into ONE normalised level in[0.0, 1.0]under a hysteretic latch. It is what the inbound gate and the byte-budget controller both consult. It owns no signal of its own -- it is the seam that turns the brains' readings into a single pause/resume decision.
Memory is the only resource that kills the process. Run out of CPU and the work merely runs slower; run out of memory and the kernel OOM-kills the pod and in-flight data is lost. So memory is the HARD source -- the one signal that always gets through.
CPU is deliberately NOT a pressure source:
- CFS self-corrects. Under a CPU quota the Linux scheduler throttles the process for us. A CPU-bound stage simply takes longer per batch; the byte-budget loop sees the longer process time and shrinks on its own. No separate CPU brake is needed -- adding one would double-count the same signal.
- CPU saturation surfaces as lag, and lag is KEDA's job. A pod that cannot keep up grows consumer lag; KEDA reads the lag and adds a replica. Horizontal scale is the right answer to "not enough CPU", not pausing intake on the one pod that is already maxed.
The seam is built to accept a CPU source LATER with zero API change.
UnifiedPressure::add_source takes any PressureSource; a future CPU
source would plug in as a SOFT, weighted source and every existing caller
of level() / should_hold() is untouched. The decision to drop CPU is a
default, not a wall.
The HARD memory source reads the container's own cgroup, not host used/total
(on a shared node the host figure is unrelated to this pod's limit). Three
cgroup v2 inputs, container-first:
memory.max-- the hard ceiling. Cross it and the kernel OOM-kills the pod. Base pressure ratio ismemory.current / memory.max.memory.high-- the soft throttle. The kernel reclaims hard and throttles allocations here, BEFORE the OOM-kill, so the signal takes the WORST ofcurrent/maxandcurrent/high-- shedding before the throttle's latency cliff, not just before the kill. TheMemoryGuardlikewise caps its admission limit atmemory.highwhen that is belowmax * headroom.memory.pressure(PSIsome avg10) -- the earliest signal: the fraction of the last 10s in which a task stalled on memory. Emitted as theworker_pool_memory_psi_somegauge for observability/alerting. NOT folded into the shed decision -- the actionable stall-percent is workload-specific and wants per-service calibration, not a guessed constant.
flowchart TD
MG["MemoryGuard.pressure_ratio()"] -->|HARD| UP["UnifiedPressure.level()"]
UP --> IG["InboundGate<br/>(pause/resume the SOURCE)"]
UP --> BB["ByteBudgetController<br/>(AIMD lever -> sub-block size)"]
- InboundGate (
src/governor/gate.rs) turns the latch into EDGE events:pause()once on the rising edge,resume()once on the falling edge. It pauses the inbound SOURCE (stops pulling new work) -- never the outbound drain. See BACKPRESSURE.md for why gating the drain deadlocks. - ByteBudgetController (
src/governor/budget.rs) is an AIMD (additive-increase / multiplicative-decrease) lever that sizes the inbound byte budget for a target utilisationrho ~= 0.7. Slack grows the budget additively; falling behind shrinks it multiplicatively; a memory HARD override shrinks IMMEDIATELY regardless ofrho. See KAFKA-PATH.md for the full AIMD description and the PROCESS byte-budget's place among the three Kafka batch sizes.
The controller starts BIG (start_bytes) and lets the decrease loop find
the level -- a cold pipeline is never artificially throttled. While pressure
is LOW the budget sits at its big start value, so a received block becomes a
SINGLE sub-block with no per-record overhead: behaviour matches the
whole-batch loop. Near-zero cost off-pressure.
The governed driver (BatchEngine::run_governed) is the run path a
self-regulating app calls. It dispatches on whether the byte budget is wired:
budget present -> stream in sub-blocks sized to the current budget and fold
each block's (bytes, process_time, ingest_interval) into the AIMD loop;
budget absent (governor off) -> delegate verbatim to run_workbatch,
byte-identical to pre-governor behaviour. The streaming sub-block mechanics
live in BACKPRESSURE.md.
Self-regulation is visible, not mysterious. When throttling happens you can see it.
| Signal | Kind | Meaning |
|---|---|---|
self_regulation_inbound_paused |
gauge (0/1) | The inbound gate is currently holding (1) or open (0). Carries a source label (e.g. kafka, http) so two governed receivers on one pod are told apart |
self_regulation_inbound_pauses_total |
counter | Number of pause EDGES (rising transitions), not per-evaluate noise. Carries the same source label |
self_regulation_byte_budget |
gauge | Current AIMD byte budget (the inbound block-size lever) |
self_regulation_recv_block_bytes |
gauge | Actual bytes of the most recent received block (reality, against which the budget is the intent) |
self_regulation_pressure_ratio |
gauge | Combined UnifiedPressure.level() in [0, 1] |
self_regulation_kafka_gate_errors_total |
counter | Kafka pause/resume actuator failures, op label = pause or resume. A sustained non-zero rate means the brake is silently disabled for the Kafka source -- alert on it |
Because the gate fires each edge EXACTLY ONCE (ObservingActuator in
src/governor/gate.rs), the self_regulation_inbound_paused gauge and the
self_regulation_inbound_pauses_total counter track real transitions, not
per-evaluate noise. The gate also logs a brake-reason line on each edge:
WARN self-regulation: inbound PAUSED under pressure (memory/back-pressure brake) source=kafka
INFO self-regulation: inbound RESUMED, pressure cleared source=kafka
A pause without a matching resume in the logs means the pressure has not cleared -- check the memory guard and consumer lag.
All tuning is via the self_regulation cascade section (8-layer cascade,
hot-reload, /config admin endpoint -- same as every other config section;
see core-pillars/CONFIG.md).
self_regulation:
enabled: true # master switch (default true)
profile: throughput # throughput | balanced | low_latency -- sizes the AIMD envelope
pause_above: 0.80 # arm the inbound hold when combined pressure reaches this
resume_below: 0.65 # release the hold when pressure drops to this (must be < pause_above)
target_rho: 0.7 # target utilisation for the byte-budget AIMD loop, in (0, 1)
md_factor: 0.5 # multiplicative-decrease factor, in (0, 1)enabled-- the only knob most apps touch.falsebuilds nothing.profile-- sizes the AIMD byte-budget envelope (start / ceiling / step / record cap).throughputstarts big with a high ceiling;low_latencystarts small so blocks stay small and bursty. It mirrors the KafkaSelfRegulationProfilenames so one value reads the same regardless of transport.pause_above/resume_below-- the hysteresis band. The gap between them prevents flapping: the latch arms atpause_above, releases atresume_below, and holds its state in between. An inverted or non-finite band falls back to the defaults (0.80/0.65) with a warning rather than wedging the governor.target_rho-- how busy to keep the stage. Lower means more headroom (safer under bursts); higher means tighter packing (more efficient, riskier).md_factor-- how hard to brake when behind or under memory pressure.0.5halves the budget per decrease step.
Every default is set so an app that configures nothing gets a fully working, default-ON governor. Bad knobs are sanitised, not fatal.
The default profile is throughput, which starts with a large byte budget
(start-big, back-off-on-pressure). That is the right call for a PB/day
ingest pod with headroom, but on a small or memory-tight pod it can spike
in the COLD-START WINDOW -- the very first block is sized to the start
budget before the AIMD loop or the memory-hard override has seen any
pressure to react to. The governor self-corrects after that first block
(the memory-hard override drops the budget the moment in-flight bytes climb
toward the limit), so this is a transient first-block spike, not a steady
state.
There is deliberately NO dedicated "small-pod" preset (YAGNI). For a small/memory-tight pod, do one of:
- Set a lower start budget directly under
self_regulation(cap the first-block size so the cold-start window cannot overshoot the pod's memory limit). - Use the
balancedorlow_latencyprofile -- both start with a smaller byte-budget envelope, so the cold-start first block is correspondingly smaller.
Either way the memory-hard override remains the never-OOM backstop; the profile/start-budget choice only governs how large that first pre-feedback block is.
The in-process logical never-OOM test asserts the governor's control loop never lets in-flight bytes exceed the configured limit. It does NOT prove the process survives a real OS-level cgroup OOM-killer under a hard container memory limit. The real test -- a memory-limited container under sustained load, asserting NO cgroup OOM-kill where an ungoverned pipeline would be killed -- is a RELEASE-CHECKLIST / CI-harness item, run out of process against a real cgroup. It is not covered by the in-process unit tests and must be exercised separately before a release.