refactor: deduplicate L4/L7 policy logs — highest layer wins #362

@johntmyers

Problem Statement

When a network connection targets an endpoint with L7 policy rules, the sandbox proxy emits both a CONNECT (L4) log entry and one or more L7_REQUEST log entries for the same connection. The L4 CONNECT entry is always action=allow for L7-inspected connections, making it look like an independent policy decision when it's really just a tunnel lifecycle event. This creates confusion for operators — the same connection appears to have two policy decisions at different layers. The log message type should distinguish between a standalone L4 policy decision and a tunnel-open event that precedes L7 inspection.

Technical Context

The sandbox proxy handles CONNECT requests in a two-phase pipeline: first L4 evaluation (host:port + binary identity via OPA network_action rule), then optional L7 inspection (HTTP method/path via OPA allow_request rule). Both phases independently emit structured tracing::info! log lines, which are captured by LogPushLayer and pushed to the server via gRPC PushSandboxLogs. The TUI renders these as distinct log line types with separate field orderings. The L7 OPA rule is a strict superset of L4 — it re-evaluates endpoint_allowed AND binary_allowed before additionally checking request_allowed_for_endpoint.

A key constraint is that a single CONNECT tunnel can carry many HTTP requests (keep-alive), creating a 1:N relationship between L4 connections and L7 request logs. Suppressing the CONNECT log entirely would lose the "connection opened" lifecycle event, which has value as context for the L7 requests that follow.

Affected Components

| Component | Key Files | Role |
|---|---|---|
| Proxy | crates/openshell-sandbox/src/proxy.rs | L4 CONNECT evaluation + logging, L7 relay dispatch |
| L7 Relay | crates/openshell-sandbox/src/l7/relay.rs | L7 per-request evaluation + logging |
| OPA Engine | crates/openshell-sandbox/src/opa.rs | Policy evaluation for both L4 and L7 |
| Rego Policy | crates/openshell-sandbox/data/sandbox-policy.rego | Rule definitions (network_action, allow_request) |
| Log Push | crates/openshell-sandbox/src/log_push.rs | Captures tracing spans and pushes to server |
| Denial Aggregator | crates/openshell-sandbox/src/denial_aggregator.rs | Aggregates denial events for policy recommendation |
| TUI Logs | crates/openshell-tui/src/ui/sandbox_logs.rs | Renders L4 and L7 log lines with different field layouts |

Technical Investigation

Architecture Overview

The proxy's handle_tcp_connection function (proxy.rs) processes each CONNECT request through:

  1. L4 evaluation (evaluate_opa_tcp(), lines 340-347): resolves process identity via /proc/net/tcp, evaluates the OPA network_action rule
  2. L4 logging ("CONNECT" info!, lines 385-400): always emitted, regardless of whether L7 follows
  3. L7 config query (query_l7_config(), line 501): checks whether the endpoint has an L7 protocol config
  4. If L7 is configured → relay_with_inspection() (lines 544 and 600)
  5. L7 per-request evaluation (evaluate_l7_request(), relay.rs line 114): evaluates the OPA allow_request rule
  6. L7 logging ("L7_REQUEST" info!, relay.rs lines 123-133): emitted per HTTP request in the tunnel

The OPA Rego rules confirm L7 is a superset of L4:

  • L4: allow_network checks endpoint_allowed + binary_allowed (rego lines 18-20)
  • L7: allow_request checks endpoint_allowed + binary_allowed + request_allowed_for_endpoint (rego lines 160-173)
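
The superset relationship can be illustrated with a minimal Rust sketch. This is not the Rego policy itself: the Checks struct and the helper functions are hypothetical, and only the three condition names are taken from the rules above. It shows that an L7 allow can never occur without an L4 allow, which is why the CONNECT entry is always action=allow for any connection that reaches L7.

```rust
/// Inputs to the two policy checks. Field names mirror the Rego conditions
/// named above; the struct itself is a hypothetical stand-in.
#[derive(Clone, Copy)]
struct Checks {
    endpoint_allowed: bool,
    binary_allowed: bool,
    request_allowed_for_endpoint: bool,
}

/// L4: host:port and binary identity only.
fn allow_network(c: Checks) -> bool {
    c.endpoint_allowed && c.binary_allowed
}

/// L7: the same two conditions plus the per-request check.
fn allow_request(c: Checks) -> bool {
    c.endpoint_allowed && c.binary_allowed && c.request_allowed_for_endpoint
}

/// Exhaustively checks all eight input combinations and returns true if
/// every L7 allow is also an L4 allow.
fn l7_allow_implies_l4_allow() -> bool {
    let bools = [false, true];
    for &e in &bools {
        for &b in &bools {
            for &r in &bools {
                let c = Checks {
                    endpoint_allowed: e,
                    binary_allowed: b,
                    request_allowed_for_endpoint: r,
                };
                if allow_request(c) && !allow_network(c) {
                    return false;
                }
            }
        }
    }
    true
}
```

The converse does not hold: a connection can pass L4 and still have individual requests denied at L7, as in the DELETE example under Current Behavior.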

Code References

| Location | Description |
|---|---|
| proxy.rs:385-400 | L4 "CONNECT" log emission; always fires, even for L7-inspected connections |
| proxy.rs:340-347 | evaluate_opa_tcp() call for L4 decision |
| proxy.rs:500-501 | query_l7_config(), which determines whether L7 inspection is needed |
| proxy.rs:544,600 | relay_with_inspection() dispatch for L7 |
| l7/relay.rs:114 | evaluate_l7_request() call for L7 decision |
| l7/relay.rs:116-120 | L7 decision string mapping (allow/audit/deny) |
| l7/relay.rs:123-133 | L7 "L7_REQUEST" log emission |
| l7/relay.rs:19-34 | L7EvalContext struct (carries L4 context into L7) |
| opa.rs:32-36 | NetworkAction enum (Allow/Deny) |
| sandbox-policy.rego:149-154 | network_action L4 rule |
| sandbox-policy.rego:160-173 | allow_request L7 rule (superset of L4) |
| denial_aggregator.rs:20-37 | DenialEvent struct (note: the L7 relay does NOT emit these) |
| sandbox_logs.rs:321-348 | TUI field orderings for CONNECT vs L7 log types |

Current Behavior

For an L7-configured endpoint (e.g., api.github.com:443 with REST rules):

INFO CONNECT      action=allow  dst_host=api.github.com  dst_port=443  policy=github_api  ...
INFO L7_REQUEST   l7_decision=allow  l7_action=GET     l7_target=/repos/org/repo  dst_host=api.github.com  ...
INFO L7_REQUEST   l7_decision=deny   l7_action=DELETE  l7_target=/repos/org/repo  dst_host=api.github.com  ...

The CONNECT action=allow entry looks like a policy decision but is misleading — it will always be allow for any connection that reaches L7. The real policy decisions are the L7_REQUEST entries. Meanwhile, L4-only endpoints correctly use CONNECT as their sole policy decision.

What Would Need to Change

Core change: Differentiate the log message type based on whether L7 inspection follows:

  • CONNECT — L4-only endpoint. This is the standalone policy decision. No L7 follows.
  • CONNECT_L7 — L7-configured endpoint. This is a tunnel lifecycle event (connection opened), not a policy decision. The L7_REQUEST entries that follow within the tunnel are the actual policy decisions.

Implementation:

  1. Defer log emission: The L7 config query (query_l7_config()) happens after the current CONNECT log at proxy.rs:385-400. Move the log emission to after the L7 config check, or query L7 config earlier, so we know which message type to emit.

  2. Change the message string: When L7 config is present, emit "CONNECT_L7" instead of "CONNECT". All existing fields remain the same.

  3. TUI rendering: Add a CONNECT_L7_FIELD_ORDER to sandbox_logs.rs (or reuse the existing CONNECT_FIELD_ORDER) so the TUI renders these correctly. The TUI could also visually distinguish tunnel lifecycle events from policy decisions.

  4. Secondary: Enrich L7 logs with process identity: The L7EvalContext already carries binary_path, ancestors, and cmdline_paths from L4. Adding these to the L7_REQUEST log fields ensures the policy decision logs are self-contained.

  5. Secondary: Denial aggregator gap: The L7 relay does not emit DenialEvents to the denial aggregator. The proto defines l7_deny and l7_audit stages, and test data references them (mechanistic_mapper.rs:581), but the relay code never sends them. Consider fixing in the same pass.
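
Steps 1 and 2 above could be sketched roughly as follows. This is a std-only illustration, not the proxy code: L7Config is a hypothetical stand-in for whatever query_l7_config() returns, and format_connect_line only mimics the shape of the log lines shown in this issue (the real code emits through tracing::info!). The sketch also assumes that a denied CONNECT keeps the plain "CONNECT" message, since no L7 inspection follows a denial.

```rust
/// Hypothetical stand-in for the result of query_l7_config().
#[allow(dead_code)]
struct L7Config {
    protocol: String,
}

/// Picks the log message once both the L4 decision and the L7 config are known.
/// "CONNECT_L7" is used only when the connection is allowed and will be inspected.
fn connect_message(allowed: bool, l7_config: Option<&L7Config>) -> &'static str {
    match (allowed, l7_config) {
        (true, Some(_)) => "CONNECT_L7",
        _ => "CONNECT",
    }
}

/// Renders a line in the same shape as the examples in this issue.
/// All existing fields stay the same; only the message string varies.
fn format_connect_line(
    allowed: bool,
    l7_config: Option<&L7Config>,
    dst_host: &str,
    dst_port: u16,
    policy: &str,
) -> String {
    let action = if allowed { "allow" } else { "deny" };
    format!(
        "INFO {:<12} action={}  dst_host={}  dst_port={}  policy={}",
        connect_message(allowed, l7_config),
        action,
        dst_host,
        dst_port,
        policy
    )
}
```

Deferring the choice until after the L7 config lookup keeps a single emission point, so the field set cannot drift between the two message types.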

Desired Behavior

INFO CONNECT_L7   action=allow  dst_host=api.github.com  dst_port=443  policy=github_api  ...
INFO L7_REQUEST   l7_decision=allow  l7_action=GET     l7_target=/repos/org/repo  dst_host=api.github.com  ...
INFO L7_REQUEST   l7_decision=deny   l7_action=DELETE  l7_target=/repos/org/repo  dst_host=api.github.com  ...

For L4-only endpoints, behavior is unchanged:

INFO CONNECT      action=allow  dst_host=example.com  dst_port=443  policy=default  ...

Log consumers can trivially distinguish:

  • CONNECT = standalone L4 policy decision
  • CONNECT_L7 = tunnel lifecycle event (context for L7_REQUEST entries)
  • L7_REQUEST = L7 policy decision (the authoritative decision for this endpoint)
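
As an illustration of how a log consumer might act on this distinction, here is a minimal sketch. The LogKind enum and both functions are hypothetical names, not part of the existing codebase; only the three message strings come from this proposal.

```rust
/// Hypothetical classification of a log line by its message string.
#[derive(Debug, PartialEq, Eq)]
enum LogKind {
    /// "CONNECT": standalone L4 policy decision.
    L4Decision,
    /// "CONNECT_L7": tunnel lifecycle event, context for later L7_REQUEST entries.
    TunnelOpened,
    /// "L7_REQUEST": authoritative L7 policy decision.
    L7Decision,
    /// Any other message.
    Other,
}

fn classify(message: &str) -> LogKind {
    match message {
        "CONNECT" => LogKind::L4Decision,
        "CONNECT_L7" => LogKind::TunnelOpened,
        "L7_REQUEST" => LogKind::L7Decision,
        _ => LogKind::Other,
    }
}

/// True only for entries that represent a policy decision.
fn is_policy_decision(message: &str) -> bool {
    matches!(
        classify(message),
        LogKind::L4Decision | LogKind::L7Decision
    )
}
```

With this split, a dashboard counting policy decisions would skip CONNECT_L7 entries and avoid reporting two decisions for one connection.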

Patterns to Follow

  • Log field naming follows the existing snake_case convention with dst_host, dst_port, policy, etc.
  • The DenialEvent pattern with denial_stage discriminator is the established way to categorize denial types.
  • The TUI *_FIELD_ORDER arrays define display priority — new fields should follow the existing ordering convention.

Proposed Approach

Distinguish L4-only connections from L7-inspected tunnels at the log message level. When the proxy determines that an allowed CONNECT will proceed to L7 inspection, emit "CONNECT_L7" instead of "CONNECT". This preserves the tunnel lifecycle event (no logs are suppressed, no 1:N relationship is broken) while making it clear that CONNECT_L7 is context, not a policy decision. The L7_REQUEST entries remain the authoritative policy decisions for L7 endpoints. As secondary improvements, enrich L7_REQUEST logs with process identity fields and wire up DenialEvent emission from the L7 relay.

Scope Assessment

  • Complexity: Low
  • Confidence: High — minimal code change, clear semantics
  • Estimated files to change: 3-4 (proxy.rs, sandbox_logs.rs, optionally l7/relay.rs and denial_aggregator.rs for secondary improvements)
  • Issue type: refactor

Risks & Open Questions

  • Log consumer migration: Any tooling that filters on message == "CONNECT" must also match "CONNECT_L7" to see all connections. This is a small but breaking change to the log format.
  • Denial aggregator gap: The L7 relay not emitting DenialEvents is a pre-existing issue. Should this be fixed in the same pass or tracked separately?
  • Process identity in L7 logs (secondary): Adding binary, ancestors, cmdline fields to L7_REQUEST makes each L7 log self-contained but increases log volume per line. Worth doing?
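
For the log consumer migration risk above, the required change on the consumer side is small. A sketch, assuming a consumer that works on message strings (both function names are hypothetical):

```rust
/// Matches every connection-open event, whether or not L7 inspection follows.
/// Replaces an older filter of the form `message == "CONNECT"`.
fn is_connection_event(message: &str) -> bool {
    matches!(message, "CONNECT" | "CONNECT_L7")
}

/// Counts connection-open events in a stream of message strings.
fn count_connections<'a>(messages: impl IntoIterator<Item = &'a str>) -> usize {
    messages
        .into_iter()
        .filter(|m| is_connection_event(*m))
        .count()
}
```

A consumer that keeps the old equality filter would silently stop seeing L7-inspected connections, so this is worth calling out in release notes.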

Test Considerations

  • Unit test: L4-only endpoint emits "CONNECT" message
  • Unit test: L7-configured endpoint emits "CONNECT_L7" message
  • Integration test: L7_REQUEST entries still appear correctly after the change
  • TUI test: verify rendering handles both CONNECT and CONNECT_L7 log types
  • If denial aggregator changes are included: verify DenialEvent emission from L7 relay
  • Existing test patterns in crates/openshell-sandbox/src/ should be followed

Created by spike investigation. Use build-from-issue to plan and implement.

Labels

  • area:sandbox (Sandbox runtime and isolation work)
  • area:supervisor (Proxy and routing-path work)
  • spike
  • state:agent-ready (Approved for agent implementation)
  • state:pr-opened (PR has been opened for this issue)
  • topic:l7 (Application-layer policy and inspection work)
  • topic:observability (Logging, metrics, and observability work)
