infra(traefik+ops): XFF strip + OPERATOR_ACTIONS 2026-05-12#67

Merged
ahmetabdullahgultekin merged 2 commits into master from fix/2026-05-12-infra-hygiene on May 21, 2026

@ahmetabdullahgultekin (Contributor)

Summary

  • Traefik XFF hardening (P1 from infra/security reviews 2026-05-12). forwardedHeaders.trustedIPs: [] on :80 and :443 entryPoints. Traefik is directly internet-facing (no upstream proxy), and RateLimitInterceptor.getClientIP in identity-core-api consumes XFF.split(",")[0] — the prior config let an attacker bypass every per-IP bucket (login, MFA, biometric, qr-generate) by setting their own X-Forwarded-For. Empty trustedIPs causes Traefik to strip incoming XFF and write its own using the peer IP.
  • OPERATOR_ACTIONS_2026-05-12.md. Five items agents shouldn't autonomously execute, each with severity, blast radius, maintenance window, dependencies, and explicit commands.
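The bypass in the first bullet depends on two hops: what the proxy forwards as X-Forwarded-For, and how `getClientIP` reads it. The minimal Java sketch below models both under stated assumptions. `XffModel` and its methods are hypothetical (not Traefik or identity-core-api code). Trust is an exact-IP match rather than Traefik's CIDR matching. The "prior config" is modelled as a proxy that passes the client header through and appends the peer. All addresses are example values.

```java
import java.util.List;

// Hypothetical model (NOT Traefik or identity-core-api source) of how the
// client IP a rate limiter sees depends on forwardedHeaders.trustedIPs.
final class XffModel {

    // X-Forwarded-For value handed to the backend after one proxy hop.
    // Simplification: trust is an exact-IP match; real trustedIPs also takes CIDRs.
    static String forwardedFor(String incomingXff, String peerIp, List<String> trustedIps) {
        if (!trustedIps.contains(peerIp) || incomingXff == null || incomingXff.isBlank()) {
            // Untrusted peer (every peer when trustedIPs is []): discard the
            // client-supplied header and record only the TCP peer address.
            return peerIp;
        }
        // Trusted peer: keep the existing chain and append the peer.
        return incomingXff + ", " + peerIp;
    }

    // getClientIP-style parse: first XFF entry wins, else the socket address.
    static String clientIp(String xff, String remoteAddr) {
        if (xff == null || xff.isBlank()) {
            return remoteAddr;
        }
        return xff.split(",")[0].trim();
    }

    public static void main(String[] args) {
        String peer = "203.0.113.7";   // attacker's real address (documentation range)
        String traefik = "172.20.0.5"; // hypothetical Traefik container address
        // Header passed through with the peer appended: the spoofed value wins.
        System.out.println(clientIp("9.9.9.9, " + peer, traefik)); // prints 9.9.9.9
        // trustedIPs: [] rebuilds the header from the peer alone.
        System.out.println(clientIp(forwardedFor("9.9.9.9", peer, List.of()), traefik)); // prints 203.0.113.7
    }
}
```

Because `getClientIP` trusts the first entry unconditionally, the only safe place to fix this is the hop that writes the header, which is what the empty trustedIPs list does.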

Files

  • infra/traefik/config/traefik.yml — vendored copy of /opt/projects/infra/traefik/config/traefik.yml with the XFF hardening applied.
  • infra/traefik/config/dynamic.yml — vendored copy (no change; mirrored for parity).
  • infra/traefik/README.md — explains the vendored-vs-live split (live config lives at /opt/projects/infra/traefik/ in the /opt/projects/.git local repo) and the sync workflow.
  • OPERATOR_ACTIONS_2026-05-12.md — the five-item operator checklist.

OPERATOR_ACTIONS items

  1. audit_logs partman bootstrap (HIGH): V57 ran with success=t, but the live pgvector/pgvector:pg17 image lacks pg_partman, so the migration's first guard issued RAISE WARNING and RETURNed before doing any work. audit_logs is still relkind='r' with 1168 rows and no inheritance children. Custom image recipe: /opt/projects/infra/RUNBOOK_AUDIT_LOG_PARTMAN.md.
  2. RLS theatre (CRITICAL): V25 left FORCE ROW LEVEL SECURITY commented out; every policy fails open via OR current_tenant_id() IS NULL; the app connects as the postgres superuser, which bypasses RLS anyway; and all 9 RLS-enabled tables have relforcerowsecurity=f. The fix requires a non-superuser app role, a V62 migration, and a JDBC URL flip.
  3. web-app/.env.production (HIGH): still byte-identical to the leaked literal 6bdedd2. The live key has been rotated and the live bundle does NOT include the literal (audited), but a rebuild from this tree would regress. The operator chooses between a placeholder on disk and a history rewrite.
  4. Parent main fast-forward (HIGH): master is 220 commits ahead of main and main is 134 ahead, but those 134 are already on master via PR chore(merge): reconcile master into main (session 2026-05-11) #51. git push origin master:main --force-with-lease reconciles the two.
  5. HS512 kid revocation (MEDIUM): Team Auth-Java is adding revoked-kids: [hs-2026-04] to application-prod.yml. After their PR merges, rebuild the api container.
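Item 2's fail-open shape is easier to see in a toy model. This Java sketch is an in-memory illustration, not PostgreSQL. `RlsModel` is hypothetical, and the `tenant_id = current_tenant_id()` half of the policy is an assumed shape; the item only quotes the `OR current_tenant_id() IS NULL` clause. The sketch also encodes a documented PostgreSQL rule: superusers and roles with BYPASSRLS always bypass row security, while FORCE ROW LEVEL SECURITY only subjects the table owner to policies. That is why the non-superuser role is a prerequisite and not an optional extra.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the RLS behaviour described in item 2;
// it does not talk to PostgreSQL.
final class RlsModel {

    record Row(String tenantId, String payload) {}

    // Assumed policy shape: tenant_id = current_tenant_id() OR current_tenant_id() IS NULL.
    // A NULL tenant context makes the OR true, so every row is visible.
    static final BiPredicate<Row, String> FAIL_OPEN =
            (row, tenant) -> tenant == null || tenant.equals(row.tenantId());

    // Fail-closed variant: tenant_id = current_tenant_id(); a NULL context matches nothing.
    static final BiPredicate<Row, String> FAIL_CLOSED =
            (row, tenant) -> tenant != null && tenant.equals(row.tenantId());

    static List<Row> visible(List<Row> rows, String tenant,
                             BiPredicate<Row, String> policy, boolean bypassesRls) {
        if (bypassesRls) {
            // Superusers and BYPASSRLS roles never evaluate policies, FORCE or not.
            return rows;
        }
        return rows.stream().filter(r -> policy.test(r, tenant)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("t1", "a"), new Row("t2", "b"));
        System.out.println(visible(rows, null, FAIL_OPEN, false).size());   // 2: no tenant set, all rows leak
        System.out.println(visible(rows, null, FAIL_CLOSED, false).size()); // 0
        System.out.println(visible(rows, "t1", FAIL_CLOSED, true).size());  // 2: superuser ignores the policy
        System.out.println(visible(rows, "t1", FAIL_CLOSED, false).size()); // 1
    }
}
```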

Companion PR

identity-core-api#99 — V61 NOT NULL constraint on audit_logs.tenant_id.

Test plan

  • Traefik config validation. docker compose -f /opt/projects/infra/traefik/docker-compose.yml --env-file /opt/projects/infra/traefik/.env config exits zero with the new traefik.yml in place. This validates only the Compose file, not traefik.yml itself.
  • Traefik runtime XFF strip. From an external host, curl -sS -H "X-Forwarded-For: 9.9.9.9" https://api.fivucsas.com/actuator/health and confirm Traefik's /var/log/traefik/access.log records the real peer IP, not 9.9.9.9. Cross-check by tailing the identity-core-api container log for the same request — the clientIp= field should show the peer IP, not 9.9.9.9.
  • Negative path. Run curl -sS -H "X-Forwarded-For: 9.9.9.9" https://api.fivucsas.com/auth/login 10x in quick succession from one host and confirm the rate limit triggers on the peer IP (it should), not on the attacker-supplied 9.9.9.9. Without this fix, an attacker who varied XFF on every request would have each of the 10 requests counted against a different IP.
  • Restart required. traefik.yml changes are NOT picked up by the file watcher (watch: true applies to dynamic.yml only). After syncing the vendored copy to /opt/projects/infra/traefik/config/traefik.yml, the operator must run docker compose ... restart traefik. The README documents this.
  • OPERATOR_ACTIONS review. Tech lead skims each of the five items to confirm severity labels + dependency matrix are accurate before scheduling maintenance windows.
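The negative-path check can be reasoned about with a toy limiter. This Java sketch assumes a single fixed window, a hypothetical limit of 5, and an attacker who changes X-Forwarded-For on every request. `RateLimitDemo` is illustrative and is not how `RateLimitInterceptor` is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-window, per-key counter; the limit used below is
// illustrative, not identity-core-api's real login limit.
final class RateLimitDemo {

    private final int limit;
    private final Map<String, Integer> counts = new HashMap<>();

    RateLimitDemo(int limit) {
        this.limit = limit;
    }

    // Counts the request against `key`; returns false once the key exceeds the limit.
    boolean allow(String key) {
        return counts.merge(key, 1, Integer::sum) <= limit;
    }

    // Replays `requests` calls from one peer whose XFF changes every time and
    // returns how many were rejected under the chosen keying strategy.
    static int rejected(int requests, boolean keyOnClientXff, int limit) {
        RateLimitDemo limiter = new RateLimitDemo(limit);
        String peer = "203.0.113.7";
        int denied = 0;
        for (int i = 0; i < requests; i++) {
            String spoofedXff = "9.9.9." + i;
            String key = keyOnClientXff ? spoofedXff : peer;
            if (!limiter.allow(key)) {
                denied++;
            }
        }
        return denied;
    }

    public static void main(String[] args) {
        System.out.println(rejected(10, true, 5));  // 0: every request lands in a fresh bucket
        System.out.println(rejected(10, false, 5)); // 5: requests 6 to 10 are rejected
    }
}
```

Keying on the peer address is what trustedIPs: [] effectively restores, since the first XFF entry the backend sees is then always the peer.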

🤖 Generated with Claude Code

P1 hygiene from 2026-05-12 senior reviews (backend, DB, infra, security):

* infra/traefik: vendored copy of /opt/projects/infra/traefik/config/
  with forwardedHeaders.trustedIPs: [] on both :80 and :443 entryPoints.
  RateLimitInterceptor.getClientIP in identity-core-api consumes
  `XFF.split(",")[0]` so the prior config (no forwardedHeaders block)
  let an attacker bypass every per-IP bucket (login, MFA, biometric,
  qr-generate) by setting their own X-Forwarded-For. Empty trustedIPs
  causes Traefik to strip incoming XFF and write its own using the peer
  IP. Internal Docker bridge (172.20.0.0/24) is NOT trusted because
  external clients never connect from that range — only Docker-network
  containers, and those don't set XFF. README.md documents the
  vendored-vs-live split and the sync workflow.

* OPERATOR_ACTIONS_2026-05-12.md: 5 items agents shouldn't autonomously
  execute. Per-item severity, blast radius, maintenance window,
  dependencies, explicit commands:
    1. audit_logs partman bootstrap (V57 was a silent no-op; runbook
       at infra/RUNBOOK_AUDIT_LOG_PARTMAN.md prepped Option A image)
    2. RLS theatre (V25 left FORCE commented; 9 tables relforcerowsecurity=f;
       app role is postgres superuser → RLS bypassed)
    3. web-app/.env.production still byte-identical to leaked literal
       6bdedd2; live bundle is clean but rebuild-from-tree would regress
    4. parent main fast-forward: master 220 ahead, main 134 ahead but
       all already merged via PR #51 — `git push origin master:main
       --force-with-lease` reconciles
    5. HS512 kid hs-2026-04 revocation pending Team Auth-Java PR;
       rebuild api container after merge

Companion api PR fix/2026-05-12-infra-hygiene ships V61 NOT NULL for
audit_logs.tenant_id (locks down the V59 backfill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
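The bridge-range reasoning in the commit message comes down to CIDR membership: Traefik's trustedIPs accepts single IPs and CIDR ranges, and a peer is trusted only if it matches one. A minimal Java sketch of that membership test follows. `CidrCheck` is a hypothetical helper, not Traefik's implementation, and 203.0.113.7 is a documentation address standing in for an internet client.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not Traefik's implementation: does an IP literal fall
// inside a CIDR block such as the 172.20.0.0/24 Docker bridge?
final class CidrCheck {

    static boolean inCidr(String ip, String cidr) throws UnknownHostException {
        String[] parts = cidr.split("/");
        // For an IP literal, getByName only checks the address format (no DNS).
        byte[] network = InetAddress.getByName(parts[0]).getAddress();
        byte[] address = InetAddress.getByName(ip).getAddress();
        if (network.length != address.length) {
            return false; // IPv4 vs IPv6 never match
        }
        int prefix = Integer.parseInt(parts[1]);
        for (int i = 0; i < network.length; i++) {
            // Number of prefix bits that fall inside this byte (0..8).
            int bits = Math.max(0, Math.min(8, prefix - 8 * i));
            int mask = bits == 0 ? 0 : (0xFF << (8 - bits)) & 0xFF;
            if ((network[i] & mask) != (address[i] & mask)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(inCidr("172.20.0.5", "172.20.0.0/24"));  // true: a container on the bridge
        System.out.println(inCidr("203.0.113.7", "172.20.0.0/24")); // false: an internet client
    }
}
```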
Copilot AI review requested due to automatic review settings May 12, 2026 17:41

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

- **CRITICAL** — exposes a live, exploitable security or correctness gap.
- **HIGH** — drift between deployed config and committed config; reviewers
cannot reason about prod from code.
- **MEDIUM** — hygiene + cosmetic; safe to defer but easy to land.
Comment on lines +1 to +5
# OPERATOR ACTIONS — 2026-05-12

Items surfaced by the 2026-05-12 senior reviews (backend, DB, infra, security)
that agents should not autonomously execute. Each is a checklist with explicit
commands, a maintenance-window estimate, and explicit dependencies. Severity

**Blast radius.**
A SQL-injection (or a deliberately misuse of `JdbcTemplate.queryForList`)
git merge-base --is-ancestor origin/main origin/master \
&& echo "OK: main is an ancestor of master, fast-forward safe."
# Apply:
git push origin master:main --force-with-lease
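The `git merge-base --is-ancestor origin/main origin/master` guard in the excerpt above answers one question: is the current main reachable from master? Only then is moving main to master a fast-forward. A toy Java model of that reachability check is shown below. The commit graph and IDs are hypothetical, and this is not how git or JGit implement it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the `git merge-base --is-ancestor A B` question over a
// hypothetical commit graph (commit id -> parent ids). Not git, not JGit.
final class AncestorCheck {

    // True if `ancestor` is reachable from `descendant` by following parents
    // (a commit counts as its own ancestor, as with --is-ancestor).
    static boolean isAncestor(Map<String, List<String>> parents,
                              String ancestor, String descendant) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(descendant);
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String commit = stack.pop();
            if (commit.equals(ancestor)) {
                return true;
            }
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    stack.push(parent);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // c1 <- c2 <- c3 on one line; x1 branches off c2; m4 merges c3 and x1.
        Map<String, List<String>> g = Map.of(
                "c2", List.of("c1"),
                "c3", List.of("c2"),
                "x1", List.of("c2"),
                "m4", List.of("c3", "x1"));
        System.out.println(isAncestor(g, "c2", "c3")); // true: moving c2 -> c3 is a fast-forward
        System.out.println(isAncestor(g, "x1", "c3")); // false: x1 would be dropped, non-fast-forward
        System.out.println(isAncestor(g, "x1", "m4")); // true: after the merge, x1 is part of m4's history
    }
}
```

When the check passes, a plain push is sufficient; a forced push is only needed when the target branch holds commits the source cannot reach.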
Comment thread infra/traefik/README.md
Comment on lines +24 to +34
# 2. Validate (Traefik watches dynamic.yml live; traefik.yml requires restart)
docker compose -f /opt/projects/infra/traefik/docker-compose.yml \
  --env-file /opt/projects/infra/traefik/.env config

# 3. Apply
# dynamic.yml changes: zero-restart, picked up via inotify (`watch: true`)
# traefik.yml changes: require container restart
docker compose -f /opt/projects/infra/traefik/docker-compose.yml \
  --env-file /opt/projects/infra/traefik/.env restart traefik

# 4. Verify access log writes peer IP, not client-supplied XFF
Comment thread infra/traefik/README.md
# 4. Verify access log writes peer IP, not client-supplied XFF
docker logs traefik 2>&1 | tail -20
Comment on lines +199 to +200
is now `API_KEY_SECRET=fcb06b7…` (verified by the 2026-05-12 security
review). However the on-disk template at
ahmetabdullahgultekin merged commit b605579 into master May 21, 2026
5 checks passed
ahmetabdullahgultekin added a commit that referenced this pull request May 28, 2026
Low-risk doc/config polish for items Copilot flagged on PR #67 (and PR #69
where those files reached master). No behavior change to running services;
the only executable change is a more-robust docs-site healthcheck path.

- archive/.../OPERATOR_ACTIONS_2026-05-12.md:
  - redact partial live secret (API_KEY_SECRET=fcb06b7… → <redacted>)
  - main update: normal fast-forward `git push origin master:main`,
    reserve --force-with-lease for documented recovery only
  - add LOW to the severity legend (items 9-11 use it)
  - make item-count self-reference consistent (states 11; notes five→11 growth)
  - grammar: "a deliberately misuse" → "a deliberate misuse"
- docs-site/html/identity/index.html: fallback copy now says the OpenAPI
  spec is publicly available at /identity/openapi.json (it ships public)
- landing-website/src/index.css: comment now accurately describes the
  locale-aware :lang(en) uppercasing; drop the false belt-and-braces /
  codepoint-forcing claim and the duplicate text-transform line
- docs-site/docker-compose.prod.yml: healthcheck probes /health (the
  dedicated nginx endpoint) instead of /
- infra/traefik/README.md: add a Traefik-config dry-run validate step
  (compose config only validates the Compose file) and note access logs
  go to /var/log/traefik/access.log per accessLog.filePath, not stdout

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>