From 6831c77df1016b5e2eaf9091c51d8ae5046ec5ce Mon Sep 17 00:00:00 2001 From: Carson Davis Date: Fri, 15 May 2026 15:12:45 -0500 Subject: [PATCH 1/4] add initial draft overview --- docs/adr/deployment/overview.md | 121 ++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 docs/adr/deployment/overview.md diff --git a/docs/adr/deployment/overview.md b/docs/adr/deployment/overview.md new file mode 100644 index 000000000..86bb9b691 --- /dev/null +++ b/docs/adr/deployment/overview.md @@ -0,0 +1,121 @@ +# MMGIS split plan + +## What we're doing + +- Split MMGIS into two deployables: a **config admin** (close to today's app) and **static builds** (frozen, read-only frontends). +- Deploy the config admin to AWS — ECS Fargate + RDS + ALB, preserving the current Express / Postgres / sidecar shape. +- From inside the admin, **publish static builds** to AWS — S3 + CloudFront per build, gated by a shared password. +- Back static builds with **shared AWS-hosted services** (TiTiler, STAC, tipg, optionally RDS read endpoints) where baking doesn't scale. +- Decide per feature whether it survives in static: drop, bake at build time, or point at a shared service. +- Keep the admin codebase as close to current as possible; refactor only what the split forces. + +## Fixed assumptions + +- One config admin instance, many static builds. +- Static-build access = single shared password (CloudFront Function basic auth by default). +- Bake-vs-shared decided per data feature; default toward baking, shared service when bake doesn't scale. +- Same row set across all sections; numbers are stable identifiers. + +## Legend + +One table per section. Each row = one feature with a deployment decision and AWS option. Open questions are referenced as `Q#` and listed at the bottom. + +- **Admin** / **Static**: presence in each deployment. `yes` / `no` / `open` / `N/A`. +- **AWS**: brief option(s). First option = recommended default. 
+- **Notes**: `Q#` references open questions; otherwise `—`. + +## Frontend capabilities (in browser bundle) + +| # | Feature | Admin | Static | AWS | Notes | +| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | +| 1 | Map viewports (2D Leaflet/deck.gl, 3D Cesium, image/model viewer)| yes | yes | bundle on S3+CloudFront | — | +| 2 | Pure-client tools (Animation, Sites, Kinds, Legend, Layers, Info)| yes | yes | bundle | — | +| 3 | DEM-reading tools (Measure, Curtain, Viewshed, Shade) | yes | yes | bundle + DEM tiles from S3 (#22) | — | +| 4 | Heavy-compute tools (Isochrone) | yes | open | bundle; or backend compute via Lambda | Q1 | +| 5 | Data-querying tools (Identifier, Chemistry) | yes | open | bundle; data per #20/#21 | Q2 | +| 6 | Drawing tool | yes | open | admin: ECS+RDS; static: drop / read-only / local-only | Q3 | +| 7 | Real-time collaboration (WebSocket: Draw sync, presence) | yes | no | ALB WebSocket on admin ECS task | — | +| 8 | Time control (temporal windowing + UI) | yes | yes | bundle; needs time-aware data baked or shared | — | +| 9 | URL state (shareable links) | yes | yes | bundle | — | +| 10 | mmgisAPI (window.mmgisAPI surface) | yes | yes | bundle; some methods no-op in static | — | +| 11 | Plugin tools (`*Plugin-Tools*` build-time inclusion) | yes | yes | bundle (built by same codegen in both pipelines) | — | +| 12 | Landing page / mission picker | yes | open | bundle; picker moot if 1 mission per static deploy | Q4 | +| 13 | Search UI (autocomplete + lookup widget) | yes | open | bundle; fate tied to #27 | Q5 | + +## Data sources + +| # | Feature | Admin | Static | AWS | Notes | +| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | +| 14 | Mission configuration (JSON: layers, 
tools, view, CRS) | yes (RDS) | yes (baked JSON) | admin: RDS row served by Express; static: S3 JSON fetched at boot | — | +| 15 | Pre-tiled raster imagery (tile pyramids) | yes | yes | S3+CloudFront | — | +| 16 | Dynamic raster tile rendering (TiTiler against COGs) | yes | open | shared TiTiler on Fargate; or pre-bake tile pyramid | Q6 | +| 17 | STAC catalog (`stac-fastapi-pgstac`) | yes | open | shared Fargate + Aurora (pgstac); or baked STAC JSON | Q6 | +| 18 | STAC-driven mosaics (TiTiler-pgSTAC) | yes | open | shared Fargate, same Aurora as #17 | Q6 | +| 19 | Vector tiles from PostGIS (tipg) | yes | open | shared tipg + RDS; or baked MVT in S3 | Q6 | +| 20 | Tabular datasets (CSV/JSON, query by column) | yes (RDS) | open | static: S3 JSON; or shared RDS read endpoint | Q7 | +| 21 | Spatial vector datasets (PostGIS → GeoJSON or MVT) | yes (RDS) | open | static: S3 GeoJSON/MVT; or shared PostGIS + tipg | Q7 | +| 22 | DEM tiles (RGBA-encoded elevation) | yes | yes | S3+CloudFront | — | +| 23 | Feature-attached media (images/models/PDFs) | yes | yes | S3+CloudFront | — | +| 24 | Velocity grid data (wind/current layers) | yes | open | shared veloserver Fargate (#41); or omit | Q8 | + +## Server-only capabilities + +| # | Feature | Admin | Static | AWS | Notes | +| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | +| 25 | Configure admin SPA (mission/layer/dataset/user CRUD UI) | yes | no | same ECS task as Express | — | +| 26 | Mission-asset serving (path-traversal middleware + `sharp` time-compositing) | yes | no | Express in ECS | time-compositing has no static replacement | +| 27 | Server-side search (geodataset search backing #13) | yes | open | admin: Express+RDS; static: client index, shared service, or omit | Q5 | +| 28 | Auth (local accounts, bcrypt, `MMGISSession` sessions) | yes | shared password | admin: RDS-backed 
sessions; static: CloudFront Function basic auth | — | +| 29 | Long-term API tokens (Bearer tokens) | yes | no | RDS on admin | — | +| 30 | SSO integration (CSSO header-based) | open | no | only if NASA-internal deployment requires | Q9 | +| 31 | Permissions (`111`/`110`/`001`, first-user-becomes-admin) | yes | no | RDS on admin | — | +| 32 | File uploads (Busboy ingestion) | yes | no | direct browser-to-S3 via presigned POST | — | +| 33 | Webhooks (admin-defined HTTP callbacks) | yes | no | RDS + outbound HTTP from ECS | — | +| 34 | Link shortener (`(short, full, creator)` redirects) | open | no | RDS on admin; or drop entirely | Q10 | +| 35 | Adjacent-services proxy (`/stac`, `/titiler`, etc.) | yes | no | ALB target groups per service; static frontend hits shared URLs | — | +| 36 | Custom adjacent-server registry (`ADJACENT_SERVER_CUSTOM_`) | yes | no | env-driven on admin ECS | — | +| 37 | Pug-rendered shells (login, error, SPA HTML) | yes | no | Express in ECS | static ships plain HTML | +| 38 | Swagger UI / OpenAPI (`/api/docs`) | yes | no | Express in ECS | — | +| 39 | Healthcheck endpoint (`/api/utils/healthcheck`) | yes | no | Express; used by ALB target health | — | +| 40 | Jekyll docs site (`/docs`) | yes | open | S3+CloudFront subpath, or separate bucket | Q11 | +| 41 | veloserver sidecar (velocity-grid Python service) | yes | open | shared Fargate; fate tied to #24 | Q8 | + +## Persistence + +| # | Feature | Admin | Static | AWS | Notes | +| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | +| 42 | Main MMGIS DB (Postgres + PostGIS) | yes | no | RDS Postgres; or Aurora Serverless v2 | — | +| 43 | STAC DB `mmgis-stac` (Postgres + pgstac) | open | open | shared Aurora cluster with #42; or separate | Q6, Q12 | + +## Build / ops + +| # | Feature | Admin | Static | AWS | Notes | +| --- | 
---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | +| 44 | Plugin-drop codegen (`updateTools`/`updateComponents`) | yes | yes | runs in CodeBuild / GH Actions for both pipelines | — | +| 45 | Auxiliary GDAL toolbox (offline data prep) | yes | yes | workstation or one-shot CodeBuild job; output → S3 | — | +| 46 | DB init / migrations (`init-db.js`) | yes | no | one-shot ECS task before service starts | — | +| 47 | Logging / observability (Winston) | yes | yes | admin: CloudWatch Logs; static: CloudFront standard logs to S3 | — | + +## Open questions + +- **Q1 (#4):** Is Isochrone viable pure-client (web workers, small areas), or does it need a backend even in admin? If backend, Lambda or shared compute service. +- **Q2 (#5):** Identifier/Chemistry static fate hinges on #20/#21 — if data is baked, these work; if shared, they call shared service. +- **Q3 (#6):** Drawing in static — drop entirely, read-only display of baked features, or invest in a "local browser-storage only" editing mode? +- **Q4 (#12):** Does any static deployment ever host multiple frozen missions, or is it always one-mission-per-deploy? +- **Q5 (#13/#27):** Server-side search in static — replace with client-side index over baked data, point at a shared search service, or omit search entirely? +- **Q6 (#16–#19, #43):** For each tile/feature service, bake threshold vs shared service. Likely needs a per-mission default + override mechanism. +- **Q7 (#20/#21):** Tabular and spatial dataset thresholds: at what size do we stop baking JSON and start hitting a shared RDS read endpoint? +- **Q8 (#24/#41):** Is velocity data actually used by current missions? If not, drop the whole row + sidecar. +- **Q9 (#30):** Does the config admin ever deploy in an environment where NASA CSSO is required? +- **Q10 (#34):** Is the link shortener used? If not, drop everywhere. 
+- **Q11 (#40):** Does the static deployment need to ship the `/docs` site, or does docs live only on the admin? +- **Q12 (#43):** One shared Aurora cluster for both #42 and #43, or separate RDS instances? Cost vs blast radius. + +## Open questions for AWS architecture (not per-feature) + +- One Aurora cluster shared across admin + shared services, or separate? +- VPC layout: shared VPC for admin + shared services, or separate? +- Per-build CloudFront distribution vs one distribution with path-based routing — isolation vs cost. +- Do shared services (TiTiler, STAC, tipg, veloserver) need their own auth layer (API keys, IAM-signed) beyond "shared password gates the static frontend"? +- Secrets: Secrets Manager vs SSM Parameter Store. +- CI: GitHub Actions vs CodePipeline + CodeBuild. From 9485574bb999ae99afecbb209986a12d788c7184 Mon Sep 17 00:00:00 2001 From: Carson Davis Date: Tue, 19 May 2026 23:33:11 -0500 Subject: [PATCH 2/4] add draft deployment adr --- docs/adr/deployment/adr.md | 483 +++++++++++++++++++++++++++++++++++++ 1 file changed, 483 insertions(+) create mode 100644 docs/adr/deployment/adr.md diff --git a/docs/adr/deployment/adr.md b/docs/adr/deployment/adr.md new file mode 100644 index 000000000..f4aeca6ed --- /dev/null +++ b/docs/adr/deployment/adr.md @@ -0,0 +1,483 @@ +# ADR: AWS deployment with admin/dashboard split + +**Status:** Proposed — Under Review + +**Date:** 2026-05-19 + +## 1. Intent + +Today MMGIS runs as one Docker-compose stack: a single server process serves the admin tool, the main map app, and proxies the optional Python sidecars; one Postgres holds users, sessions, mission configs, datasets, geodatasets, and drawings. + +We want a **dual deployment**: one AWS-hosted **admin stack** (multi-user, authenticated, full-feature — close to today's app) from which an admin can **publish many independent, read-only dashboards** to S3 + CloudFront. 
The admin stack is the source of truth; dashboards are frozen artifacts that point back at shared AWS services when their data is too big to bake. + +## 2. Initial Guidelines + +These drive every decision downstream. If any item is challenged, downstream sections need re-discussion. + +1. **One admin instance, many dashboard deployments.** +2. **Dashboards are S3 + CloudFront.** No per-dashboard compute. +3. **Preserve MMGIS features by default.** A feature drops only when it genuinely cannot work in its target deployable, and the drop is called out with a reason. +4. **Shared infrastructure beats per-dashboard infrastructure** unless a hard requirement says otherwise. One Postgres serving dashboard-scoped tables, not one Postgres per dashboard. One sidecar deployment serving many dashboards, not one set per dashboard. +5. **Adjacent services deploy as part of the admin stack.** TiTiler, STAC, tipg, veloserver, etc. are reachable by dashboards over the network. +6. **Admin auth mirrors today's MMGIS** — multi-user accounts, Postgres-backed sessions, the existing permission codes, optional CSSO. **Dashboard auth is one shared password** checked at the edge, with per-dashboard passwords as a nice-to-have. + +## 3. Implementation overview + +Five moving parts. Each is sketched here; sections below carry the detail. + +**Admin stack (one deployment)** + +- Containerized service running today's MMGIS application image: one Node process serving the admin tool and main map app. +- The four Python sidecars (TiTiler, STAC, tipg, veloserver) as sibling services in the same cluster. +- One managed Postgres holding everything Postgres holds today (users, sessions, configs, datasets, geodatasets, drawings, STAC catalog). +- One load balancer terminating TLS and routing today's same-origin paths (`/api`, `/configure`, `/stac`, `/titiler`, etc.). +- S3 replacing the local `Missions/` directory for mission assets. 
+ +**Dashboard builds (one per published dashboard)** + +- One S3 bucket holding the JS bundle and the baked mission config in JSON. +- One CloudFront distribution in front of it. +- One CloudFront Function as a shared-password gate. +- One DNS record pointing the dashboard's subdomain at the distribution. +- **No backend, no database, no sidecar** — individual dashboards call the shared sidecars hosted in the admin stack. + +**Code refactor** + +Five seams in the frontend code; most of the codebase isn't touched. + +- **Freezing the mission configuration into the bundle.** The frontend currently boots by asking the server for its mission configuration. A dashboard has no server to ask, so the configuration must already be inside the bundle. MMGIS already runs a small pre-bundle script that writes out generated JavaScript files (today it lists the installed tools and components); we add one more generated file — the frozen mission configuration — that the frontend imports like any other source. + +- **Replacing the frontend's calls to the server.** Every named call the frontend makes to MMGIS's backend flows through one dispatcher function. That dispatcher already has an unused if-branch for "what if there's no server?" — wired but never triggered today because a flag is hard-coded to server-mode. Dashboard mode flips the flag and fills the branch with a per-call lookup table: + - *Bake.* Answer known at build time, written into the bundle. Just return it. + - *Reroute.* Call one of the shared Python services directly instead. + - *Compute.* Answer in the browser using baked-in data. + - *Drop.* This call doesn't make sense in a dashboard (e.g. drawing-write, login). Return an error gracefully. + + Because every call goes through one dispatcher, this is one function and one table — not a sweeping edit. 
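The per-call lookup table above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MMGIS's actual dispatcher: the call names (`get_config`, `stac_search`, `sites_by_type`, `draw_save`), the service URL, and the baked data are all hypothetical.

```javascript
// Hypothetical sketch: call names, URLs and data are illustrative, not MMGIS identifiers.
const BAKED = {
  missionConfig: { mission: 'demo', layers: [] },
  sites: [
    { name: 'A', type: 'rock' },
    { name: 'B', type: 'soil' },
  ],
};

// Per-call disposition table: bake / reroute / compute / drop.
const DISPOSITIONS = {
  get_config: { kind: 'bake', value: () => BAKED.missionConfig },
  stac_search: { kind: 'reroute', url: 'https://services.example.org/stac/search' },
  sites_by_type: { kind: 'compute', fn: (p) => BAKED.sites.filter((s) => s.type === p.type) },
  draw_save: { kind: 'drop' },
};

// Stand-in for the dispatcher's no-server branch.
async function dashboardDispatch(call, params = {}, fetchImpl = globalThis.fetch) {
  const entry = DISPOSITIONS[call];
  if (!entry || entry.kind === 'drop') {
    // Unknown and dropped calls fail gracefully instead of throwing.
    return { ok: false, error: `${call} is unavailable in dashboard mode` };
  }
  if (entry.kind === 'bake') return { ok: true, body: entry.value() };
  if (entry.kind === 'compute') return { ok: true, body: entry.fn(params) };
  // Reroute: call the shared service directly by absolute URL.
  const res = await fetchImpl(entry.url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(params),
  });
  return { ok: res.ok, body: await res.json() };
}
```

Passing `fetchImpl` as a parameter is only a convenience for exercising the reroute path without a network; call sites would keep calling the dispatcher exactly as they do today.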
+ +- **Telling the frontend where the Python services live.** The frontend currently builds URLs to the Python services as same-origin paths like `/titiler/...`, relying on MMGIS's server to forward them behind the scenes. A dashboard has no server to forward through; it needs the services' real public addresses. A small helper returns the right URL base for the build mode — same-origin paths in admin mode (no behavior change), absolute URLs in dashboard mode. Only four places in the frontend build such URLs. + +- **Handling backend-only computations.** MMGIS's backend has a few small utility endpoints that do work for the frontend (elevation profiles, projection conversions, image-band metadata). A dashboard has no backend, so each one is handled individually: drop the feature, redirect to a Python service, or move the math into the browser. Per-feature product decisions, not a mechanical rewrite. + +- **Disabling server-dependent features.** Two features have nowhere to go in a dashboard and just turn off: the login form (no accounts) and the live-update WebSocket (nothing to connect to). + +- **Two features need a real design decision, not a quiet drop.** Saving drawn shapes (no database to save to) and server-side search (no Postgres to query) both have plausible preservation paths: bake-and-display-only mode, local-storage editing, a baked search index, routing through a shared sidecar, or a small shared endpoint in the admin stack. Tracked as Q-DRAW and Q-SEARCH. + + +**Adjacent services (one deployment, shared by everyone)** + +The four Python services (TiTiler, STAC, tipg, veloserver) run as sibling containers in the same cluster as the admin, using today's docker-compose images. The services themselves don't change. Two consumers reach them differently: + +- **The admin** reaches them as today: the browser asks for `/titiler/...` on the admin's domain; the admin server forwards behind the scenes. No code change. 
+- **Dashboards** reach them directly by absolute URL. That's a cross-origin request from the dashboard's domain to the sidecar's, so each sidecar needs a CORS allowlist for dashboard origins. + +**Provisioning flow (new code in the admin)** + +When an admin clicks **Publish** in the admin tool, the admin's backend kicks off a separate task that: + +1. Reads the mission's current configuration from the database. +2. Builds the dashboard's frontend bundle with the configuration frozen in. +3. Provisions the dashboard's AWS resources: an S3 bucket, a CloudFront distribution, the password-gate function, and the DNS entry pointing the dashboard's subdomain at the distribution. +4. Uploads the bundle to S3 and tells CloudFront to refresh its cache. +5. Returns the new dashboard's URL to the admin tool, which displays it and records it in a dashboards registry. + +A matching **Delete Dashboard** path reverses every step. + +*Implementation: see `detailed-implementation-plan.md` for the full phase breakdown.* + +## 4. Admin stack + +Today's server composes the main app, the admin tool, the sidecar proxy, and the WebSocket server into one process. We keep that shape and put it in a containerized service. Adjacent services run as their own sibling services in the same cluster. + +### 4.1 Compute + +**Decision:** Which container platform runs the admin stack? + +**Options:** + +- *Full ECS Fargate.* One load balancer, multiple target groups, path-based listener rules; native WebSocket support; full control over networking, health checks, and deployment. +- *ECS Express Mode.* AWS's newer "simpler ECS"; provisions an ALB and auto-scaling with one API call — but provisions one ALB per service, while we need one ALB with path-based rules routing to the admin task and every sidecar. +- *AWS App Runner.* Closed to new customers in 2026; AWS redirects new users to ECS Express Mode. 
+- *Self-managed EC2 with Docker.* Extra infrastructure to manage compared to Fargate, but potentially cheaper, and it saves some of the baking refactor. + +**Recommended:** Full ECS Fargate. + +**Why:** The one-ALB-many-services routing pattern is load-bearing and rules out Express Mode. + +Sidecars run as their own services in the same cluster, with private service-discovery DNS names that the admin task resolves. + +**Decision:** How does the browser reach the sidecars? + +Today the browser never talks to the Python sidecars directly. It hits `/titiler/...` or `/stac/...` on MMGIS's own domain; the Express server forwards (proxies) the request to the right Python service behind the scenes. The browser sees one website. The question for AWS is whether to keep that proxy shape or let the load balancer route to the sidecars directly. + +**Options:** + +- *Server proxy preserved (today's shape).* The load balancer sends every request to the admin container; the admin forwards sidecar requests to the Python containers. Zero code change. Single domain survives, so cookies follow and the frontend's hardcoded paths still work. The existing admin-write gate — public GETs allowed, admin login required for writes — keeps doing real security work. Cost: one extra hop per sidecar request, a few milliseconds inside an AWS region. +- *Load balancer routes directly.* The load balancer recognizes sidecar paths and sends them straight to the Python containers. Lower latency, independent health checks per sidecar. But the load balancer routes by URL only — it doesn't know who's calling — so the admin write gate is gone. That matters: the admin tool actually issues write calls to STAC (creating, updating, deleting catalog items). Restoring the gate means a Lambda authorizer, service-side basic auth, or a hybrid that proxies only the writes — new code in every case. + +**Recommended:** Server proxy preserved. 
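The admin-write gate that the preserved proxy keeps enforcing can be pictured as a single predicate. This is a minimal sketch, assuming a hypothetical `session.isAdmin` flag and treating GET, HEAD and OPTIONS as reads; the real MMGIS middleware may decide differently.

```javascript
// Hypothetical sketch of the proxy's admin-write gate; not MMGIS's actual middleware.
const READ_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

function sidecarGate(method, session) {
  // Reads pass for everyone, including anonymous callers.
  if (READ_METHODS.has(String(method).toUpperCase())) return { allow: true };
  // Writes need a logged-in admin.
  if (!session) return { allow: false, status: 401 };
  if (session.isAdmin !== true) return { allow: false, status: 403 };
  return { allow: true };
}
```

A load balancer that routes by path alone has no equivalent of the `session` argument, which is why the direct-routing option loses this check.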
+ +**Why:** Zero code change, the admin gate keeps working, and the extra hop is cheap compared to the actual sidecar work. + +### 4.2 Database + +**Decision:** Host both databases (the main MMGIS database and the `mmgis-stac` catalog) on one Postgres instance, or split them across two? + +**Options:** + +- *One instance, two logical databases.* Mirrors today's docker-compose. Cheaper, simpler to operate. +- *Two instances.* Independent scaling and a smaller blast radius if the STAC workload misbehaves. More operational surface. + +**Recommended:** One instance. + +**Why:** They coexist fine today on one Postgres; no signal STAC will outgrow that. Easy to split later if it does. + +**Open:** Q-DB-1. + +Sessions stay Postgres-backed (no code change). + +### 4.3 Networking and TLS + +CloudFront is AWS's CDN — it caches static assets at edge locations close to users and gives you a place to attach WAF rules or request-level logic. The dashboards already live behind their own CloudFront distributions. The question is whether the admin should sit behind one too. + +**Decision:** CloudFront in front of the admin load balancer? + +**Options:** + +- *Add CloudFront.* CDN-cached static assets, single domain shape, optional WAF integration. Cache rules need to whitelist the API and WebSocket paths so they bypass the cache. +- *Skip CloudFront.* Admin hits the load balancer directly. Fewer resources. + +**Recommended:** Add CloudFront. + +**Why:** Small cost; gives the admin and dashboards a consistent shape (everything fronted by CloudFront). + +### 4.4 Mission asset storage + +Mission assets move from the admin's local disk to S3 (covered in §3). The remaining decision is how *uploads* get there. + +A "presigned URL" is an S3 feature where the server hands the browser a temporary URL that includes a signature granting upload permission for one specific object. 
The browser then PUTs the file straight to S3 — the server is involved only in handing out the URL, not in moving bytes. + +**Decision:** How do file uploads land in S3? + +**Options:** + +- *Presigned upload, direct browser-to-S3.* The admin server hands back a presigned URL; the browser uploads to S3 directly. +- *Through-server upload.* The admin server receives the bytes and writes them to S3. Pins all upload bandwidth to the admin service; risks timeouts on multi-GB files. + +**Recommended:** Presigned upload. + +**Why:** Lifts upload bandwidth off the admin service and removes timeout risk. + +**Open:** Q-UPLOAD-CEILING (how large should browser uploads be allowed to be?) + +### 4.5 Authentication + +The auth model doesn't change: local accounts, hashed passwords, Postgres-backed sessions, the existing first-user-becomes-superadmin gate, the three `AUTH` modes (`local`, `off`, `csso`). The one real concern is the bootstrap window. + +A fresh admin deploy with no users has an exposed first-signup endpoint that silently grants superadmin to whoever hits it first — no rate limit, no IP allowlist, no token gating. On the public internet that's a race the legitimate admin can lose. We have to close that window somehow. + +**Decision:** How do we close the first-user-becomes-superadmin gap? + +**Options:** + +- *Block public ingress until the first user is created.* Manual runbook step; deploy with a tight security-group rule, log in, create the superadmin, then open ingress. +- *Seed a superadmin via the init task.* The init task that already creates the database also creates a superadmin from credentials in a secret, removing the gap entirely. +- *Gate the endpoint behind a config flag.* `ALLOW_FIRST_SIGNUP=true` has to be set explicitly, defaulting to off. Operator flips it on for the first signup, then off. + +**Recommended:** Seed a superadmin via the init task. + +**Why:** Removes the gap rather than relying on the operator to remember a runbook step. 
The credentials live in a secret manager either way. + +**Open:** Q-DEPLOY-1. + +*Implementation: see `detailed-implementation-plan.md` Phases A and J.* + +## 5. Dashboard stack + +A dashboard is "the main map app with the admin removed and the mission config frozen." + +### 5.1 Per-dashboard resources + +- **One S3 bucket.** The JS bundle, the baked mission config, and any per-dashboard baked data (small GeoJSON, small CSV, etc.). +- **One CloudFront distribution** in front of the bucket. Default behavior: serve the SPA shell for unknown paths. Static assets cache aggressively; the baked config is fingerprinted and immutable. +- **One CloudFront Function** as the password gate, attached to the viewer-request event. Browser basic auth, checked at the CDN edge. +- **One DNS record** pointing the chosen subdomain at the distribution. + +No backend, no database, no sidecar — only shared services from the admin stack. + +### 5.2 What dashboards read at runtime + +- **Their own baked mission config** for the mission (this replaces the boot-time mission-config fetch). +- **The shared sidecars** in the admin stack for any layer whose data was too large to bake. +- **The shared Postgres**, indirectly via tipg or other read-path services — never directly. Dashboards do not have database credentials. +- **Their own baked data files** for any data that fits the bake threshold. + +### 5.3 Per-feature drop list + +Features that **drop in dashboard mode**, with reasons: + +- **Drawing tool writes** — no Postgres, no WebSocket. *Could* be partially preserved as read-only display of baked features or local-browser-storage editing. *Open: Q-DRAW.* +- **All three WebSocket consumers** — real-time Draw collaboration, layer-update notifications from the admin tool to open map sessions, and admin-tool-to-admin-tool multi-admin coordination. All three drop in dashboards; the admin stack keeps all three. +- **The admin tool** — by design, no admin in dashboards. 
+- **Long-term API tokens, accounts, permissions, webhooks, link shortener** — no backend. +- **File uploads** — read-only. +- **Sidecar proxy** — dashboards talk to the shared services directly. +- **Backend-only utility routes** (elevation profile, band metadata, projection conversion, server-side dataset search, link expansion) — each needs a per-feature disposition (drop, call a sidecar directly, or replace with a baked computation). These are *backend route disappearances*, not the same shape as the frontend URL helper. + +Features that **survive in dashboard mode**: + +- Map viewports (2D, 3D, image/model/PDF viewer). +- Pure-client tools: Animation, Sites, Kinds, Legend, Layers, Info. +- DEM-reading tools: Measure, Curtain, Viewshed, Shade — they consume DEM tiles, which bake fine to S3. +- Time control, URL state, the embed API, plugin components. + +Features whose dashboard fate is **conditional**: see `features.md` and the open-questions list. + +### 5.4 Authentication + +The gate itself is a CloudFront Function — a tiny piece of JavaScript that runs at the CDN edge before any request reaches S3, checks an `Authorization` header against a known password, and returns 401 if it doesn't match. The browser handles the password prompt as standard basic auth. What's left to decide is whether all dashboards share one password or each gets its own. + +**Decision:** One shared password across all dashboards, or per-dashboard passwords? + +**Options:** + +- *Single shared password.* One value baked into every dashboard's Function. Trivial to manage; one secret to rotate. But revoking access to a single dashboard means rotating the password for *all* dashboards. +- *Per-dashboard password.* Each distribution's Function is configured with its own password. Per-dashboard revocation is cheap. Comes essentially free since we provision a Function per dashboard anyway — the only cost is one more secret per dashboard to track. + +**Recommended:** Per-dashboard password. 
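The password gate described at the top of this section could look roughly like the following viewer-request function. This is a sketch under assumptions: the credentials (`user:pass`) and realm are placeholders, and baking the expected header value into the function source at publish time is one possible approach rather than a settled design.

```javascript
// Hypothetical placeholder: 'Basic ' + base64('user:pass'), precomputed at publish time.
var EXPECTED_AUTH = 'Basic dXNlcjpwYXNz';

// CloudFront Functions viewer-request handler.
function handler(event) {
  var request = event.request;
  // Header names arrive lowercased; each entry carries its text in `.value`.
  var auth = request.headers.authorization;
  if (auth && auth.value === EXPECTED_AUTH) {
    return request; // Credentials match: let the request continue to the S3 origin.
  }
  // Missing or wrong credentials: ask the browser to show its basic-auth prompt.
  return {
    statusCode: 401,
    statusDescription: 'Unauthorized',
    headers: { 'www-authenticate': { value: 'Basic realm="dashboard"' } }
  };
}
```

With a per-dashboard password, only the `EXPECTED_AUTH` value differs between distributions; the handler body stays identical.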
+ +**Why:** Independent revocation is the operational property that matters as soon as you publish more than a handful of dashboards. The added management cost is low because the Function is already per-dashboard. + +**Open:** Q-AUTH-1. + +*Implementation: see `detailed-implementation-plan.md` Phases D, E, and J.* + +## 6. Code refactor decisions + +The conceptual plan for the refactor is in §3. This section captures the architectural decisions inside that plan that aren't yet settled. + +### 6.1 Stubbing the API-call dispatcher + +When dashboard mode fills the dispatcher's dormant non-server branch (the mechanism in §3), it can do so in two shapes. + +**Decision:** Stub the single dispatcher with a per-call lookup table, or branch each call site individually? + +**Options:** + +- *Stub the dispatcher.* One function gets a per-call disposition table (bake / reroute / compute / drop). Every call site keeps calling `api('whatever')` unchanged. One place to edit; one place to break. +- *Branch each call site.* At each place the frontend calls the dispatcher, wrap the call in `if (dashboardMode)` and handle the case there. More invasive (many call sites); per-site behavior is more explicit. + +**Recommended:** Stub the dispatcher. + +**Why:** Concentrates the dashboard-mode logic in one place, matches the existing chokepoint shape, and leaves every call site unchanged. + +**Open:** Q-CALLS-API. + +### 6.2 Time-compositing layers in dashboards + +Some mission configs use a URL convention that triggers server-side compositing of time-windowed map tiles — the server reads several tiles at different timestamps, blends them, and returns one tile. A dashboard has no server to do that compositing, and the compositing step isn't free. + +**Decision:** What happens to time-composited layers in dashboards? 
+ +**Options:** + +- *Pre-bake every time slice at publish time.* The publish step composites every possible time window in advance and stores the results as static tiles in S3. Works, but storage cost scales with how many time windows the layer supports. +- *Hide the layer in the dashboard.* The layer simply doesn't appear in dashboards that don't pre-bake it. Cheapest; loses the feature for that layer. + +**Recommended:** Per-layer decision rather than a global default. + +**Why:** Some layers are critical to the mission and worth the bake cost; others are decorative and can be hidden. Marking the disposition per layer in the mission config is cheaper than picking one global rule. + +**Open:** Q-TIME. + +### 6.3 Cross-origin sidecar auth gate + +In today's stack, the admin server's sidecar proxy wraps each Python service in an admin-write gate — anonymous reads pass, writes require admin login. Dashboards reach the sidecars cross-origin, bypassing that proxy. The gate has to come from somewhere. + +**Decision:** How do we gate dashboard access to the shared sidecars? + +**Options:** + +- *Password gate alone.* Only authorized users load the dashboard; once loaded, sidecar requests are unauthenticated but reachable. Simple, but assumes nothing else on the internet stumbles onto the sidecar URLs. +- *CORS allow-list only.* Restricts in-browser access to dashboard and admin origins. Does not stop direct `curl`. +- *Signed requests.* CloudFront signs requests to the sidecars (Lambda@Edge or a similar mechanism). More work; properly secures the services against any direct access. + +**Recommended:** CORS allow-list plus the password gate. + +**Why:** Defense-in-depth at low cost; the residual risk (a direct unauthenticated `curl` against read-only services) is acceptable until security review demands stronger. + +**Open:** Q-AUTH-2. + +*Implementation: see `detailed-implementation-plan.md` Phases A through F.* + +## 7. 
Provisioning flow + +The new code path: an admin clicks **Publish** in the admin tool. What happens: + +1. **Admin tool → admin server.** The publish request, with mission, dashboard name, and settings. +2. **Admin server → bundling task.** Reads the mission's current config from Postgres, builds the dashboard's frontend bundle with the configuration frozen in, and emits a directory. +3. **Admin server → provisioning.** Creates the per-dashboard S3 bucket, CloudFront distribution, password-gate Function, and DNS record. +4. **Admin server → upload + invalidate.** Uploads the bundle to the new bucket and issues a CloudFront invalidation so users see the new build immediately. +5. **Admin server → admin tool.** Returns the dashboard URL; the admin tool surfaces it and records it in the dashboards registry table. + +The bundling task in step 2 is a real compute job — it reads from the database, runs Webpack, and produces a directory tree. Where that work runs is a real choice. + +**Decision:** How does the bundling task run? + +**Options:** + +- *In-process in the admin task.* Simplest; ties up the admin's compute during a build; bundle size bounded by the admin container's filesystem and memory. +- *Spawned ECS task per publish.* A fresh container per build, isolated from the admin. Clean lifecycle, predictable footprint. Cold-start latency (a few seconds to start the task). +- *CodeBuild job triggered by the admin.* AWS-native CI primitive; gives free logging and build artifacts. Adds an external surface to manage. + +**Recommended:** Spawned ECS task per publish. + +**Why:** Clean lifecycle, predictable resource footprint, no contention with the admin's serving load. + +**Decision:** How do we provision the per-dashboard resources? + +**Options:** + +- *CDK or CloudFormation template, deployed from the admin task.* Declarative, idempotent, easy to tear down. Requires a large IAM surface on the admin's role. 
+- *Direct SDK calls.* Imperative, simpler IAM (scoped to exactly what the calls touch). Teardown is custom code. +- *Step Functions orchestration.* Overengineered for this. Defer. + +**Recommended:** Direct SDK calls from the spawned bundling task. + +**Why:** Tight IAM scope; teardown is straightforward when paired call-for-call with creation. + +**Teardown.** Admin → Delete Dashboard. Reverse of provisioning: invalidate CloudFront, delete distribution, delete Function, delete bucket, remove DNS record, remove registry row. + +*Implementation: see `detailed-implementation-plan.md` Phases H and I.* + +## 8. Shared vs. per-instance + +The defining tension of this design. The default position is **shared** — one resource serving many dashboards — and we deviate only when isolation is a hard requirement. + +### 8.1 Database + +The one-Postgres-vs-many decision is in §4.2 (Q-DB-1, recommendation: one instance). Per *dashboard* there's a separate question — one Postgres per dashboard — which we reject: the operational cost (N instances to patch, monitor, back up) and the security surface (each dashboard now has database credentials) aren't justified for any need we've identified. Tables get a dashboard-scoped slice on the shared instance only when they need persistence beyond a baked file, which is the rare case. + +### 8.2 Adjacent services + +- **One deployment of each sidecar**, shared across the admin and every dashboard. +- **Rejected alternative: per-dashboard sidecars.** Cost (N copies of each Python service running) and management (N deployments to upgrade) are unjustified given the services are stateless or read from shared databases. +- **Veloserver is the exception worth flagging.** Its requirements are under-documented, and no frontend code references it today. So the live question for AWS is narrower than "deploy it or not": *does any production mission config still reference veloserver-backed layers?* If yes, document what the service needs; if no, drop. 
Tracked as Q-VELO. + +### 8.3 Dashboard registry + +The admin tracks every dashboard it has published — at minimum URL, name, owner, and provisioning metadata — in a registry table on the shared Postgres. Used to list dashboards in the admin UI, gate Delete Dashboard, and know which CloudFront distributions to invalidate on republish. + +*Implementation: see `detailed-implementation-plan.md` Phases G and I.* + +## 9. Data flow + +A recurring question this ADR explicitly does **not** solve, but commits to a default. + +### 9.1 The local-files heritage + +Original MMGIS was a local-machine app: big imagery and elevation files lived on disk under a mission directory, mission configs referenced them with relative paths, and tabular/vector data uploaded through the admin landed in Postgres. The AWS deployed world has no shared local disk, so two things change: + +- **Files that used to be on local disk → S3** under the same prefix layout. The relative-path resolver in mission configs points at the S3 prefix instead of the filesystem. +- **No "point at a local path" workflow survives.** Mission configs may not reference absolute filesystem paths; relative paths under the mission folder remain supported. + +Data uploaded through the admin (datasets, geodatasets) continues to land in Postgres as today, since Postgres is still part of the admin stack. + +### 9.2 Bake-vs-database default + +For data a dashboard needs to read, in priority order: + +1. **Default: bake to JSON / MVT / GeoJSON in S3** and have the dashboard fetch it as a static asset. +2. **Fallback: a shared admin-stack service** (a sidecar for raster, vector, or catalog; a thin custom query endpoint for tabular). +3. **Last resort: a dashboard-scoped table in the shared Postgres** plus the thin query endpoint to read it. + +We do not pre-commit a numeric bake threshold. 
Per-feature in `features.md` we note which mode each data source uses; the threshold is selected case-by-case until we have enough cases to formalize a rule. + +### 9.3 The open part + +**Open:** Q-BAKE-CEILING — for genuinely large baked artifacts (multi-GB on S3 with streaming parsing), is the static-asset path viable, or does the bake-vs-shared cutover happen much earlier in practice? Investigation, not an ADR-time decision. + +*Implementation: see `detailed-implementation-plan.md` Phase F.* + +## 10. URL topology + +Two real choices interact: how the admin stack exposes its services, and how dashboards reach those services. + +### 10.1 Admin + +All admin paths on one CloudFront distribution in front of the admin load balancer, same shape as today. The sidecar proxy continues to forward under the same paths. + +### 10.2 Dashboards reaching shared services + +Dashboards live on their own domain; the sidecars live in the admin stack. A dashboard needs a URL it can put in fetch calls. Two ways to arrange that. + +**Decision:** How do dashboards reach the sidecars? + +**Options:** + +- *Per-service subdomain.* Each shared service gets its own public URL (e.g. `titiler.`, `stac.`); dashboards hit those URLs directly. CORS configured per service. Several subdomains and TLS certs to manage. +- *One CloudFront fronts everything.* A single CloudFront distribution sits in front of the admin S3 bucket, all dashboard buckets, and all sidecar targets — path-based routing decides which origin serves a given request. Fewer resources; the routing complexity moves into CloudFront's behavior rules. + +**Recommended:** Per-service subdomain. + +**Why:** Lines up with the existing path-prefix discipline — today's `/titiler`, `/stac`, etc. just become subdomains, no routing rewrite needed in CloudFront. + +**Open:** Q-URL-1. + +### 10.3 Per-dashboard CloudFront vs. 
shared + +A CloudFront distribution is the AWS resource that fronts an origin (an S3 bucket, in our case) with a CDN, TLS, and (for us) the password-gate Function. We can either give each dashboard its own distribution, or run one shared distribution that path-routes to many dashboard buckets. + +**Decision:** One CloudFront per dashboard, or one CloudFront serving many? + +**Options:** + +- *Per-dashboard distribution.* Each dashboard has its own distribution, its own Function (so its own password), and clean isolation. Drawback: N distributions to monitor, and each carries a small per-distribution cost floor. +- *Shared distribution, path-routed per dashboard.* One distribution serves `//...` for many dashboards. Cheaper; harder to give one dashboard its own password; harder to revoke access to a single dashboard. + +**Recommended:** Per-dashboard distribution. + +**Why:** Isolation and per-dashboard password come for free; cost is acceptable until N gets large. + +**Open:** Q-URL-2 (revisit if N grows). + +*Implementation: see `detailed-implementation-plan.md` Phase H.* + +## 11. Open questions (consolidated) + +Questions with a home section in this ADR are pointer entries. Questions tracked only here (mostly feature-level scope decisions) carry their description. + +### Architecture-level (has a home section) + +- **Q-DB-1** — One Postgres instance for both databases, or separate? → §4.2. +- **Q-URL-1** — Per-service subdomain for each sidecar, or one CloudFront fronting everything? → §10.2. +- **Q-URL-2** — Per-dashboard CloudFront distribution, or one distribution with path routing? → §10.3. +- **Q-AUTH-1** — Per-dashboard password, or one shared password? → §5.4. +- **Q-AUTH-2** — Cross-origin sidecar gate: password-only, CORS allow-list, or signed requests? → §6.3. +- **Q-DEPLOY-1** — How do we close the first-user-becomes-superadmin gap? → §4.5. +- **Q-CALLS-API** — Stub the API-call dispatcher, or branch each call site individually? → §6.1. 
+- **Q-TIME** — Per-layer disposition for time-composited layers in dashboards. → §6.2. +- **Q-VELO** — Is veloserver referenced by any current mission config? → §8.2. + +### Feature-level (tracked in `features.md`) + +- **Q-DRAW** — Drawing in dashboards: drop, read-only display of baked features, or local-storage edit mode? +- **Q-LANDING** — Does any dashboard host multiple frozen missions, or is it strictly one-mission-per-deploy? +- **Q-SEARCH** — Dashboard search: client-side baked index, routed through tipg, or a shared search endpoint in the admin stack? Per-dashboard scoping (one dashboard can't discover another's data) is part of the answer either way. +- **Q-BAKE-CEILING** — How large can a baked artifact realistically get before the static-asset path falls over? +- **Q-UPLOAD-CEILING** — Today's UI uploads are bounded by the server's body-parser limit; presigned S3 lifts the ceiling by orders of magnitude. Should the UI grow into that capacity, or stay modest? +- **Q-SSO** — Does the admin ever deploy where CSSO is mandatory? If not, the CSSO middleware is dead code in AWS. +- **Q-SHORTENER** — Is the link shortener used? If not, drop everywhere. +- **Q-DOCS** — Does the dashboard ever need to ship the docs site, or does it live only on the admin? + +### Implementation-level + +The detailed plan carries these: + +- The exhaustive list of call sites that need rewriting. +- The exact shape of the baked config module and the API-call dispatch table. +- The IAM policy template for the per-publish provisioning task. + +--- + +**Cross-reference:** See `working-plan.md` for the structure and workflow that produced this ADR. See `features.md` for the per-feature inventory. See the personal review checklist for the human-facing review steps. See `detailed-implementation-plan.md` for file/function-level refactor instructions. 
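To make the API-call dispatch table listed among the implementation-level items (and recommended in §6.1) more concrete, here is one possible shape. It is a sketch under assumptions, not the planned implementation: the call names (`example_layer_list` and so on), the `Disposition` and `Result` types, and the `example.org` URL are all invented for illustration and are not MMGIS's real API calls.

```typescript
// Hypothetical sketch of a per-call disposition table. The call names and
// handlers are invented for illustration; they are not MMGIS's real API calls.

type Disposition =
  | { kind: "bake"; value: unknown }                       // answer frozen at build time
  | { kind: "reroute"; url: string }                       // hand off to a shared sidecar
  | { kind: "compute"; run: (params: unknown) => unknown } // answer in the browser
  | { kind: "drop"; reason: string };                      // not meaningful in a dashboard

type Result =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

const dispositions: Record<string, Disposition> = {
  example_layer_list: { kind: "bake", value: ["basemap", "sites"] },
  example_tile_info: { kind: "reroute", url: "https://titiler.example.org/info" },
  example_distance: {
    kind: "compute",
    run: (p) => {
      const [a, b] = p as [number, number];
      return Math.abs(a - b);
    },
  },
  example_draw_save: { kind: "drop", reason: "no write path in a dashboard" },
};

// Dashboard-mode dispatcher: call sites keep calling api(name, params) unchanged.
function api(name: string, params?: unknown): Result {
  const d = dispositions[name];
  if (d === undefined) return { ok: false, error: `unknown call: ${name}` };
  switch (d.kind) {
    case "bake":
      return { ok: true, value: d.value };
    case "reroute":
      // A real dispatcher would fetch this URL; the sketch only reports it.
      return { ok: true, value: { fetchFrom: d.url } };
    case "compute":
      return { ok: true, value: d.run(params) };
    case "drop":
      return { ok: false, error: d.reason };
  }
}
```

Because every disposition lives in one table, changing a call's fate in dashboard mode is a one-line edit, which is the property §6.1 cites in favour of stubbing the dispatcher.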
From 867d098df3278e7d5be36078a3a5f95a937cb795 Mon Sep 17 00:00:00 2001 From: Carson Davis Date: Wed, 20 May 2026 10:03:56 -0500 Subject: [PATCH 3/4] continue clarifying adr points --- docs/adr/deployment/adr.md | 67 ++++++++++++++++++++++++-------------- 1 file changed, 43 insertions(+), 24 deletions(-) diff --git a/docs/adr/deployment/adr.md b/docs/adr/deployment/adr.md index f4aeca6ed..44a17c95b 100644 --- a/docs/adr/deployment/adr.md +++ b/docs/adr/deployment/adr.md @@ -29,9 +29,9 @@ Five moving parts. Each is sketched here; sections below carry the detail. - Containerized service running today's MMGIS application image: one Node process serving the admin tool and main map app - The four Python sidecars (TiTiler, STAC, tipg, veloserver) as sibling services in the same cluster. -- One managed Postgres holding everything Postgres holds today (users, sessions, configs, datasets, geodatasets, drawings, STAC catalog). +- One managed Postgres holding the same data it holds today: user accounts and sessions, mission configs, tabular **datasets** and PostGIS-backed **geodatasets** (both uploaded as rows through the admin API), drawn features, and the STAC catalog. - One load balancer terminating TLS and routing today's same-origin paths (`/api`, `/configure`, `/stac`, `/titiler`, etc). -- S3 replacing the local `Missions/` directory for mission assets. +- S3 replaces the local `Missions/` directory for **raster mission assets** (tile pyramids, DEMs, basemap imagery). AWS containers have no shared local disk, so S3 is the obvious cloud equivalent — nothing about *what* MMGIS stores changes, only where the on-disk part physically lives. **Dashboard builds (one per published dashboard)** @@ -153,7 +153,7 @@ CloudFront is AWS's CDN — it caches static assets at edge locations close to u ### 4.4 Mission asset storage -Mission assets move from the admin's local disk to S3 (covered in §3). The remaining decision is how *uploads* get there. 
+*Raster* mission assets — the tile pyramids, DEMs, and basemap imagery that lived in the `Missions/` folder — move to S3 (covered in §3). Postgres-backed data (datasets, geodatasets, configs, drawings) stays in Postgres; only the on-disk slice of MMGIS moves. The remaining decision is how uploads get there. A "presigned URL" is an S3 feature where the server hands the browser a temporary URL that includes a signature granting upload permission for one specific object. The browser then PUTs the file straight to S3 — the server is involved only in handing out the URL, not in moving bytes. @@ -168,9 +168,23 @@ A "presigned URL" is an S3 feature where the server hands the browser a temporar **Why:** Lifts upload bandwidth off the admin service and removes timeout risk. -**Open:** Q-UPLOAD-CEILING (how large should browser uploads be allowed to be?) +### 4.5 Big-file upload workflow -### 4.5 Authentication +Today, mission operators handle big raw imagery on their workstation: run a GDAL script, get a **tile pyramid** (a folder of thousands of small tile images), then `scp` the folder into MMGIS's `Missions/` directory. The UI upload path is capped at 500 MB and isn't used for the big stuff. + +In AWS there's no shared filesystem to `scp` to, and admin users won't have direct AWS credentials — everything has to go through the admin UI. + +The question: how does a tile pyramid (thousands of files, many GB) get from a workstation into S3 via the admin UI? Presigned uploads handle one big file fine, but a pyramid is many files. + +**Options:** + +- *Upload as a single archive.* The operator zips the pyramid and uploads the archive via a presigned URL; a backend task then extracts it into S3. One operator action; reintroduces a backend step in the upload path. +- *Bulk multi-file upload.* The browser fires off many presigned uploads in parallel. Works for small pyramids; brittle for big ones (browser memory, dropped connections, no resumability). 
+- *Shift the production format to COGs.* A Cloud-Optimized GeoTIFF is one file containing the whole pyramid; TiTiler (already in our sidecars) serves tiles from it on demand. Operators run `tifs2cogs` (already in `auxiliary/stac/`) instead of `gdal2customtiles`. One file, standard upload. Requires migrating existing tile-pyramid layers in mission configs. + +**Open:** Q-BIG-UPLOAD. Once the workflow is settled, the per-file size cap follows from it and is a deploy-time config value. + +### 4.6 Authentication The auth model doesn't change: local accounts, hashed passwords, Postgres-backed sessions, the existing first-user-becomes-superadmin gate, the three `AUTH` modes (`local`, `off`, `csso`). The one real concern is the bootstrap window. @@ -207,10 +221,14 @@ No backend, no database, no sidecar — only shared services from the admin stac ### 5.2 What dashboards read at runtime -- **Their own baked mission config** for the mission (this replaces the boot-time mission-config fetch). -- **The shared sidecars** in the admin stack for any layer whose data was too large to bake. -- **The shared Postgres**, indirectly via tipg or other read-path services — never directly. Dashboards do not have database credentials. -- **Their own baked data files** for any data that fits the bake threshold. +For each kind of data a dashboard needs: + +- *Mission configuration.* Baked into the JS bundle. No request. +- *Raster tiles, DEMs, basemap imagery.* Fetched from S3 via CloudFront — usually from the admin's shared S3 bucket (the data already lives there from when admins uploaded it; no per-dashboard copy needed). +- *Small per-mission tabular or vector data.* Baked into the dashboard's own S3 bucket at publish time as JSON or GeoJSON, fetched as a static asset. +- *Larger tabular or vector data.* Queried dynamically from a shared sidecar (TiTiler for raster mosaics, tipg for PostGIS vector tiles, a custom endpoint for tabular search). Dashboards never connect to Postgres directly. 
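As a rough illustration of the four cases above, the sketch below maps each kind of data to the origin that would serve it. Everything named here (`DataKind`, `Origins`, `resolveUrl`, and the `example.org` hostnames) is hypothetical and not taken from the MMGIS codebase. It only shows that the choice reduces to picking a URL base per kind of data.

```typescript
// Hypothetical sketch: none of these names exist in MMGIS today.

// The four kinds of data a dashboard reads, per the list above.
type DataKind = "config" | "raster" | "smallData" | "largeData";

// Assumed URL bases; real values would come from deploy-time settings.
interface Origins {
  adminAssets: string; // CloudFront URL in front of the admin's shared S3 bucket
  sidecar: string;     // public base URL of a shared sidecar (for example tipg)
}

// Returns the URL the dashboard would fetch, or null when no request is made.
function resolveUrl(kind: DataKind, path: string, origins: Origins): string | null {
  const clean = path.replace(/^\/+/, ""); // tolerate leading slashes
  switch (kind) {
    case "config":
      return null; // baked into the JS bundle, nothing to fetch
    case "raster":
      return `${origins.adminAssets}/${clean}`; // absolute URL into the admin's S3
    case "smallData":
      return `./${clean}`; // relative URL into the dashboard's own bucket
    case "largeData":
      return `${origins.sidecar}/${clean}`; // absolute URL of a shared sidecar
  }
}
```

In this sketch the function would run at publish time rather than in the browser, so the dashboard only ever sees finished URLs in its baked config.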
+ +The dashboard doesn't have to figure out where any of this lives at runtime — every URL it needs is already in the baked mission config. As part of the publish step, each layer's URL is rewritten to point wherever its data actually ended up: an absolute URL into admin's S3, a relative URL into the dashboard's own bucket, or an absolute sidecar URL. At runtime, the dashboard just reads each URL out of its config and fetches on demand, the same way today's MMGIS fetches tiles on demand from its local server. The static-vs-dynamic choice only affects *which origin* serves the bytes, not *when* they load. ### 5.3 Per-feature drop list @@ -315,7 +333,7 @@ In today's stack, the admin server's sidecar proxy wraps each Python service in The new code path: an admin clicks **Publish** in the admin tool. What happens: 1. **Admin tool → admin server.** The publish request, with mission, dashboard name, and settings. -2. **Admin server → bundling task.** Reads the mission's current config from Postgres, builds the dashboard's frontend bundle with the configuration frozen in, and emits a directory. +2. **Admin server → bundling task.** Reads the mission's current config from Postgres. For each layer the mission references, decides where the data will live (baked into the dashboard's bucket, left in admin's S3, or served by a sidecar) and rewrites the layer's URL in the baked config accordingly. Builds the dashboard's frontend bundle with the rewritten configuration frozen in. Emits a directory of bundle plus baked static assets. 3. **Admin server → provisioning.** Creates the per-dashboard S3 bucket, CloudFront distribution, password-gate Function, and DNS record. 4. **Admin server → upload + invalidate.** Uploads the bundle to the new bucket and issues a CloudFront invalidation so users see the new build immediately. 5. **Admin server → admin tool.** Returns the dashboard URL; the admin tool surfaces it and records it in the dashboards registry table. 
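The five steps above could be wired together roughly as follows. This is a sketch under assumptions: the `PublishDeps` interface and every function on it are hypothetical stand-ins for the bundling task, the AWS provisioning calls, and the registry write. They are injected so the ordering can be read and exercised without touching AWS. Real implementations would be asynchronous; they are synchronous here only to keep the example short.

```typescript
// Hypothetical sketch of the publish flow; all names are illustrative.

interface PublishRequest {
  mission: string;
  dashboardName: string;
}

interface Provisioned {
  bucket: string;         // per-dashboard S3 bucket
  distributionId: string; // per-dashboard CloudFront distribution
  url: string;            // public dashboard URL behind the DNS record
}

// Each dependency stands in for one step of the flow above.
interface PublishDeps {
  buildBundle(req: PublishRequest): string;                 // step 2: returns the bundle directory
  provision(req: PublishRequest): Provisioned;              // step 3: bucket, distribution, gate, DNS
  upload(bundleDir: string, bucket: string): void;          // step 4, first half
  invalidate(distributionId: string): void;                 // step 4, second half
  recordInRegistry(req: PublishRequest, url: string): void; // step 5: registry row
}

// Step 1 is the incoming request itself; steps 2 to 5 run in order.
function publishDashboard(req: PublishRequest, deps: PublishDeps): string {
  const bundleDir = deps.buildBundle(req);
  const resources = deps.provision(req);
  deps.upload(bundleDir, resources.bucket);
  deps.invalidate(resources.distributionId);
  deps.recordInRegistry(req, resources.url);
  return resources.url; // surfaced by the admin tool
}
```

Keeping the steps behind an interface like this would also let the same sequence be driven from whichever runtime the bundling decision settles on, since the caller only supplies the five functions.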
@@ -372,30 +390,31 @@ The admin tracks every dashboard it has published — at minimum URL, name, owne ## 9. Data flow -A recurring question this ADR explicitly does **not** solve, but commits to a default. - ### 9.1 The local-files heritage -Original MMGIS was a local-machine app: big imagery and elevation files lived on disk under a mission directory, mission configs referenced them with relative paths, and tabular/vector data uploaded through the admin landed in Postgres. The AWS deployed world has no shared local disk, so two things change: +MMGIS's storage was always split: **raster files on local disk** under the mission directory; **structured data in Postgres** (tabular datasets, PostGIS geodatasets, mission configs, drawings, sessions). The AWS deployed world has no shared local disk, so: -- **Files that used to be on local disk → S3** under the same prefix layout. The relative-path resolver in mission configs points at the S3 prefix instead of the filesystem. +- **Raster files → S3**, same prefix layout. The relative-path resolver in mission configs points at the S3 prefix instead of the filesystem. +- **Structured data → still Postgres**, now on RDS instead of in a container. - **No "point at a local path" workflow survives.** Mission configs may not reference absolute filesystem paths; relative paths under the mission folder remain supported. -Data uploaded through the admin (datasets, geodatasets) continues to land in Postgres as today, since Postgres is still part of the admin stack. +### 9.2 Where dashboard data comes from + +A dashboard pulls data from one of three places. The choice isn't really about *size* — S3 can hold anything — it's about **access pattern** (static fetch vs. dynamic query) and **which bucket** holds it. -### 9.2 Bake-vs-database default +- **Static fetch from the admin's S3 bucket.** No copy needed; the data already lives there from when admins uploaded it. The baked mission config points at the existing CloudFront-fronted URL. 
Right for raster tiles, DEMs, basemap imagery — the big files that already live in admin S3 and would only duplicate if copied per dashboard. +- **Static fetch from the dashboard's own S3 bucket.** Baked at publish time. The publish step reads from admin storage (Postgres rows or admin S3 files), serializes to JSON or GeoJSON, and writes a static file into the dashboard's bucket alongside the JS bundle. Right for *mission-specific* small data — the mission config itself, small lookup tables, baked search indices. Clean deletion lifecycle: drop the dashboard's bucket and its data is gone with it. +- **Dynamic query against a shared sidecar.** The dashboard makes HTTP requests to TiTiler (raster mosaics over big COGs), tipg (PostGIS as vector tiles or OGC Features), or a thin custom endpoint for tabular search. Right when the access pattern is "compute this on demand," not "fetch this file." -For data a dashboard needs to read, in priority order: +The default position is to push as much as possible into the first two categories (static fetches, no service hop) and use sidecars only for data that genuinely needs dynamic querying. -1. **Default: bake to JSON / MVT / GeoJSON in S3** and have the dashboard fetch it as a static asset. -2. **Fallback: a shared admin-stack service** (a sidecar for raster, vector, or catalog; a thin custom query endpoint for tabular). -3. **Last resort: a dashboard-scoped table in the shared Postgres** plus the thin query endpoint to read it. +**The publish step is therefore a selective data-copying operation.** For each piece of data the mission references, it decides: leave it where it is (admin's S3 or a sidecar) and write the URL into the baked config; or read from admin storage, serialize, and write into the dashboard's bucket. Most missions end up with a mix of all three. -We do not pre-commit a numeric bake threshold. 
Per-feature in `features.md` we note which mode each data source uses; the threshold is selected case-by-case until we have enough cases to formalize a rule. +**Last resort: a dashboard-scoped table in the shared Postgres** plus a thin query endpoint to read it. Only when the dashboard genuinely needs writeable per-dashboard persistence — rare enough that we don't pre-commit a design. ### 9.3 The open part -**Open:** Q-BAKE-CEILING — for genuinely large baked artifacts (multi-GB on S3 with streaming parsing), is the static-asset path viable, or does the bake-vs-shared cutover happen much earlier in practice? Investigation, not an ADR-time decision. +**Open:** Q-BAKE-CEILING — how much data can a dashboard load at boot before it feels slow? This is the UX ceiling that decides which data lands in the first two categories (static fetch) vs. the third (sidecar query). Investigation needed; not an ADR-time decision. *Implementation: see `detailed-implementation-plan.md` Phase F.* @@ -458,14 +477,14 @@ Questions with a home section in this ADR are pointer entries. Questions tracked - **Q-CALLS-API** — Stub the API-call dispatcher, or branch each call site individually? → §6.1. - **Q-TIME** — Per-layer disposition for time-composited layers in dashboards. → §6.2. - **Q-VELO** — Is veloserver referenced by any current mission config? → §8.2. +- **Q-BIG-UPLOAD** — How do tile pyramids (thousands of files, many GB) reach S3 via the admin UI? → §4.5. ### Feature-level (tracked in `features.md`) - **Q-DRAW** — Drawing in dashboards: drop, read-only display of baked features, or local-storage edit mode? - **Q-LANDING** — Does any dashboard host multiple frozen missions, or is it strictly one-mission-per-deploy? - **Q-SEARCH** — Dashboard search: client-side baked index, routed through tipg, or a shared search endpoint in the admin stack? Per-dashboard scoping (one dashboard can't discover another's data) is part of the answer either way. 
-- **Q-BAKE-CEILING** — How large can a baked artifact realistically get before the static-asset path falls over? -- **Q-UPLOAD-CEILING** — Today's UI uploads are bounded by the server's body-parser limit; presigned S3 lifts the ceiling by orders of magnitude. Should the UI grow into that capacity, or stay modest? +- **Q-BAKE-CEILING** — How much data can a dashboard reasonably load at boot before it feels slow? This is a bandwidth/UX ceiling on the static-fetch path (S3 can store anything; the question is what's tolerable for a user). The answer sets the line between "bake as a static file" and "route through a sidecar." - **Q-SSO** — Does the admin ever deploy where CSSO is mandatory? If not, the CSSO middleware is dead code in AWS. - **Q-SHORTENER** — Is the link shortener used? If not, drop everywhere. - **Q-DOCS** — Does the dashboard ever need to ship the docs site, or does it live only on the admin? From 71b9ab3384645430eb13925fc1c23f59fdcc9074 Mon Sep 17 00:00:00 2001 From: Carson Davis Date: Thu, 21 May 2026 09:36:57 -0500 Subject: [PATCH 4/4] add draft preserve adrs --- docs/adr/deployment/adr.md | 502 ------- docs/adr/deployment/features.md | 116 ++ docs/adr/deployment/overview.md | 121 -- .../preserve/adr-a-aws-deployment.md | 362 +++++ .../preserve/adr-b-frontend-refactor.md | 115 ++ .../preserve/detailed-implementation-plan.md | 1297 +++++++++++++++++ docs/adr/deployment/preserve/overview.md | 88 ++ 7 files changed, 1978 insertions(+), 623 deletions(-) delete mode 100644 docs/adr/deployment/adr.md create mode 100644 docs/adr/deployment/features.md delete mode 100644 docs/adr/deployment/overview.md create mode 100644 docs/adr/deployment/preserve/adr-a-aws-deployment.md create mode 100644 docs/adr/deployment/preserve/adr-b-frontend-refactor.md create mode 100644 docs/adr/deployment/preserve/detailed-implementation-plan.md create mode 100644 docs/adr/deployment/preserve/overview.md diff --git a/docs/adr/deployment/adr.md b/docs/adr/deployment/adr.md 
deleted file mode 100644 index 44a17c95b..000000000 --- a/docs/adr/deployment/adr.md +++ /dev/null @@ -1,502 +0,0 @@ -# ADR: AWS deployment with admin/dashboard split - -**Status:** Proposed — Under Review - -**Date:** 2026-05-19 - -## 1. Intent - -Today MMGIS runs as one Docker-compose stack: a single server process serves the admin tool, the main map app, and proxies the optional Python sidecars; one Postgres holds users, sessions, mission configs, datasets, geodatasets, and drawings. - -We want a **dual deployment**: one AWS-hosted **admin stack** (multi-user, authenticated, full-feature — close to today's app) from which an admin can **publish many independent, read-only dashboards** to S3 + CloudFront. The admin stack is the source of truth; dashboards are frozen artifacts that point back at shared AWS services when their data is too big to bake. - -## 2. Initial Guidelines - -These drive every decision downstream. If any item is challenged, downstream sections need re-discussion. - -1. **One admin instance, many dashboard deployments.** -2. **Dashboards are S3 + CloudFront.** No per-dashboard compute. -3. **Preserve MMGIS features by default.** A feature drops only when it genuinely cannot work in its target deployable, and the drop is called out with a reason. -4. **Shared infrastructure beats per-dashboard infrastructure** unless a hard requirement says otherwise. One Postgres serving dashboard-scoped tables, not one Postgres per dashboard. One sidecar deployment serving many dashboards, not one set per dashboard. -5. **Adjacent services deploy as part of the admin stack** TiTiler, STAC, tipg, veloserver, etc are reachable by dashboards over the network. -6. **Admin auth mirrors today's MMGIS** — multi-user accounts, Postgres-backed sessions, the existing permission codes, optional CSSO. **Dashboard auth is one shared password** checked at the edge, with per-dashboard passwords as a nice-to-have. - -## 3. Implementation overview - -Five moving parts. 
Each is sketched here; sections below carry the detail. - -**Admin stack (one deployment)** - -- Containerized service running today's MMGIS application image: one Node process serving the admin tool and main map app -- The four Python sidecars (TiTiler, STAC, tipg, veloserver) as sibling services in the same cluster. -- One managed Postgres holding the same data it holds today: user accounts and sessions, mission configs, tabular **datasets** and PostGIS-backed **geodatasets** (both uploaded as rows through the admin API), drawn features, and the STAC catalog. -- One load balancer terminating TLS and routing today's same-origin paths (`/api`, `/configure`, `/stac`, `/titiler`, etc). -- S3 replaces the local `Missions/` directory for **raster mission assets** (tile pyramids, DEMs, basemap imagery). AWS containers have no shared local disk, so S3 is the obvious cloud equivalent — nothing about *what* MMGIS stores changes, only where the on-disk part physically lives. - -**Dashboard builds (one per published dashboard)** - -- One S3 bucket holding the JS bundle and the baked mission config in JSON. -- One CloudFront distribution in front of it. -- One CloudFront Function as a shared-password gate. -- One DNS record pointing the dashboard's subdomain at the distribution. -- **No backend, no database, no sidecar** — individual dashboards will call the shared sidecars hosted in the admin stack - -**Code refactor** - -Five seams in the frontend code; most of the codebase isn't touched. - -- **Freezing the mission configuration into the bundle.** The frontend currently boots by asking the server for its mission configuration. A dashboard has no server to ask, so the configuration must already be inside the bundle. MMGIS already runs a small pre-bundle script that writes out generated JavaScript files (today it lists the installed tools and components); we add one more generated file — the frozen mission configuration — that the frontend imports like any other source. 
- -- **Replacing the frontend's calls to the server.** Every named call the frontend makes to MMGIS's backend flows through one dispatcher function. That dispatcher already has an unused if-branch for "what if there's no server?" — wired but never triggered today because a flag is hard-coded to server-mode. Dashboard mode flips the flag and fills the branch with a per-call lookup table: - - *Bake.* Answer known at build time, written into the bundle. Just return it. - - *Reroute.* Call one of the shared Python services directly instead. - - *Compute.* Answer in the browser using baked-in data. - - *Drop.* This call doesn't make sense in a dashboard (e.g. drawing-write, login). Return an error gracefully. - - Because every call goes through one dispatcher, this is one function and one table — not a sweeping edit. - -- **Telling the frontend where the Python services live.** The frontend currently builds URLs to the Python services as same-origin paths like `/titiler/...`, relying on MMGIS's server to forward them behind the scenes. A dashboard has no server to forward through; it needs the services' real public addresses. A small helper returns the right URL base for the build mode — same-origin paths in admin mode (no behavior change), absolute URLs in dashboard mode. Only four places in the frontend build such URLs. - -- **Handling backend-only computations.** MMGIS's backend has a few small utility endpoints that do work for the frontend (elevation profiles, projection conversions, image-band metadata). A dashboard has no backend, so each one is handled individually: drop the feature, redirect to a Python service, or move the math into the browser. Per-feature product decisions, not a mechanical rewrite. - -- **Disabling server-dependent features.** Two features have nowhere to go in a dashboard and just turn off: the login form (no accounts) and the live-update WebSocket (nothing to connect to). 
- -- **Two features need a real design decision, not a quiet drop.** Saving drawn shapes (no database to save to) and server-side search (no Postgres to query) both have plausible preservation paths: bake-and-display-only mode, local-storage editing, a baked search index, routing through a shared sidecar, or a small shared endpoint in the admin stack. Tracked as Q-DRAW and Q-SEARCH. - - -**Adjacent services (one deployment, shared by everyone)** - -The four Python services (TiTiler, STAC, tipg, veloserver) run as sibling containers in the same cluster as the admin, using today's docker-compose images. The services themselves don't change. Two consumers reach them differently: - -- **The admin** reaches them as today: the browser asks for `/titiler/...` on the admin's domain; the admin server forwards behind the scenes. No code change. -- **Dashboards** reach them directly by absolute URL. That's a cross-origin request from the dashboard's domain to the sidecar's, so each sidecar needs a CORS allowlist for dashboard origins. - -**Provisioning flow (new code in the admin)** - -When an admin clicks **Publish** in the admin tool, the admin's backend kicks off a separate task that: - -1. Reads the mission's current configuration from the database. -2. Builds the dashboard's frontend bundle with the configuration frozen in. -3. Provisions the dashboard's AWS resources: an S3 bucket, a CloudFront distribution, the password-gate function, and the DNS entry pointing the dashboard's subdomain at the distribution. -4. Uploads the bundle to S3 and tells CloudFront to refresh its cache. -5. Returns the new dashboard's URL to the admin tool, which displays it and records it in a dashboards registry. - -A matching **Delete Dashboard** path reverses every step. - -*Implementation: see `detailed-implementation-plan.md` for the full phase breakdown.* - -## 4. 
Admin stack - -Today's server composes the main app, the admin tool, the sidecar proxy, and the WebSocket server into one process. We keep that shape and put it in a containerized service. Adjacent services run as their own sibling services in the same cluster. - -### 4.1 Compute - -**Decision:** Which container platform runs the admin stack? - -**Options:** - -- *Full ECS Fargate.* One load balancer, multiple target groups, path-based listener rules; native WebSocket support; full control over networking, health checks, and deployment. -- *ECS Express Mode.* AWS's newer "simpler ECS"; provisions an ALB and auto-scaling with one API call — but provisions one ALB per service, while we need one ALB with path-based rules routing to the admin task and every sidecar. -- *AWS App Runner.* Closed to new customers in 2026; AWS redirects new users to ECS Express Mode. -- *Self-managed EC2 with Docker.* More infrastructure to manage than Fargate, but potentially cheaper, and it avoids some of the baking refactor. - -**Recommended:** Full ECS Fargate. - -**Why:** The one-ALB-many-services routing pattern is load-bearing and rules out Express Mode. - -Sidecars run as their own services in the same cluster, with private service-discovery DNS the admin task resolves. - -**Decision:** How does the browser reach the sidecars? - -Today the browser never talks to the Python sidecars directly. It hits `/titiler/...` or `/stac/...` on MMGIS's own domain; the Express server forwards (proxies) the request to the right Python service behind the scenes. The browser sees one website. The question for AWS is whether to keep that proxy shape or let the load balancer route to the sidecars directly. - -**Options:** - -- *Server proxy preserved (today's shape).* The load balancer sends every request to the admin container; the admin forwards sidecar requests to the Python containers. Zero code change. Single domain survives, so cookies follow and the frontend's hardcoded paths still work.
The existing admin-write gate — public GETs allowed, admin login required for writes — keeps doing real security work. Cost: one extra hop per sidecar request, a few ms inside an AWS region. -- *Load balancer routes directly.* The load balancer recognizes sidecar paths and sends them straight to the Python containers. Lower latency, independent health checks per sidecar. But the load balancer routes by URL only — it doesn't know who's calling — so the admin write gate is gone. That matters: the admin tool actually issues write calls to STAC (creating, updating, deleting catalog items). Restoring the gate means a Lambda authorizer, service-side basic auth, or a hybrid that proxies only the writes — new code in every case. - -**Recommended:** Server proxy preserved. - -**Why:** Zero code change, the admin gate keeps working, and the extra hop is cheap compared to the actual sidecar work. - -### 4.2 Database - -**Decision:** Host both databases (the main MMGIS database and the `mmgis-stac` catalog) on one Postgres instance, or split them across two? - -**Options:** - -- *One instance, two logical databases.* Mirrors today's docker-compose. Cheaper, simpler to operate. -- *Two instances.* Independent scaling and a smaller blast radius if the STAC workload misbehaves. More operational surface. - -**Recommended:** One instance. - -**Why:** They coexist fine today on one Postgres; no signal STAC will outgrow that. Easy to split later if it does. - -**Open:** Q-DB-1. - -Sessions stay Postgres-backed (no code change). - -### 4.3 Networking and TLS - -CloudFront is AWS's CDN — it caches static assets at edge locations close to users and gives you a place to attach WAF rules or request-level logic. The dashboards already live behind their own CloudFront distributions. The question is whether the admin should sit behind one too. - -**Decision:** CloudFront in front of the admin load balancer? 
- -**Options:** - -- *Add CloudFront.* CDN-cached static assets, single domain shape, optional WAF integration. Cache rules need to whitelist the API and WebSocket paths so they bypass the cache. -- *Skip CloudFront.* Admin hits the load balancer directly. Fewer resources. - -**Recommended:** Add CloudFront. - -**Why:** Small cost; gives the admin and dashboards a consistent shape (everything fronted by CloudFront). - -### 4.4 Mission asset storage - -*Raster* mission assets — the tile pyramids, DEMs, and basemap imagery that lived in the `Missions/` folder — move to S3 (covered in §3). Postgres-backed data (datasets, geodatasets, configs, drawings) stays in Postgres; only the on-disk slice of MMGIS moves. The remaining decision is how uploads get there. - -A "presigned URL" is an S3 feature where the server hands the browser a temporary URL that includes a signature granting upload permission for one specific object. The browser then PUTs the file straight to S3 — the server is involved only in handing out the URL, not in moving bytes. - -**Decision:** How do file uploads land in S3? - -**Options:** - -- *Presigned upload, direct browser-to-S3.* The admin server hands back a presigned URL; the browser uploads to S3 directly. -- *Through-server upload.* The admin server receives the bytes and writes them to S3. Pins all upload bandwidth to the admin service; risks timeouts on multi-GB files. - -**Recommended:** Presigned upload. - -**Why:** Lifts upload bandwidth off the admin service and removes timeout risk. - -### 4.5 Big-file upload workflow - -Today, mission operators handle big raw imagery on their workstation: run a GDAL script, get a **tile pyramid** (a folder of thousands of small tile images), then `scp` the folder into MMGIS's `Missions/` directory. The UI upload path is capped at 500MB and isn't used for the big stuff. 
- -In AWS there's no shared filesystem to `scp` to, and admin users won't have direct AWS credentials — everything has to go through the admin UI. - -The question: how does a tile pyramid (thousands of files, many GB) get from a workstation into S3 via the admin UI? Presigned uploads handle one big file fine (a single presigned PUT is limited to 5 GB; anything larger needs a multipart upload), but a pyramid is many files. - -**Options:** - -- *Upload as a single archive.* Operator zips the pyramid and uploads the archive via a presigned URL; a backend task then extracts it back into S3. One operator action; reintroduces a backend step in the upload path. -- *Bulk multi-file upload.* Browser fires off many presigned uploads in parallel. Works for small pyramids; brittle for big ones (browser memory, dropped connections, no resumability). -- *Shift the production format to COGs.* A Cloud-Optimized GeoTIFF is one file containing the whole pyramid; TiTiler (already in our sidecars) serves tiles from it on demand. Operators run `tifs2cogs` (already in `auxiliary/stac/`) instead of `gdal2customtiles`. One file, standard upload. Requires migrating existing tile-pyramid layers in mission configs. - -**Open:** Q-BIG-UPLOAD. Once the workflow is settled, the per-file size cap follows from it and is a deploy-time config value. - -### 4.6 Authentication - -The auth model doesn't change: local accounts, hashed passwords, Postgres-backed sessions, the existing first-user-becomes-superadmin gate, the three `AUTH` modes (`local`, `off`, `csso`). The one real concern is the bootstrap window. - -A fresh admin deploy with no users has an exposed first-signup endpoint that silently grants superadmin to whoever hits it first — no rate limit, no IP allowlist, no token gating. On the public internet that's a race the legitimate admin can lose. We have to close that window somehow. - -**Decision:** How do we close the first-user-becomes-superadmin gap?
- -**Options:** - -- *Block public ingress until the first user is created.* Manual runbook step; deploy with a tight security-group rule, log in, create the superadmin, then open ingress. -- *Seed a superadmin via the init task.* The init task that already creates the database also creates a superadmin from credentials in a secret, removing the gap entirely. -- *Gate the endpoint behind a config flag.* `ALLOW_FIRST_SIGNUP=true` has to be set explicitly, defaulting to off. Operator flips it on for the first signup, then off. - -**Recommended:** Seed a superadmin via the init task. - -**Why:** Removes the gap rather than relying on the operator to remember a runbook step. The credentials live in a secret manager either way. - -**Open:** Q-DEPLOY-1. - -*Implementation: see `detailed-implementation-plan.md` Phases A and J.* - -## 5. Dashboard stack - -A dashboard is "the main map app with the admin removed and the mission config frozen." - -### 5.1 Per-dashboard resources - -- **One S3 bucket.** The JS bundle, the baked mission config, and any per-dashboard baked data (small GeoJSON, small CSV, etc.). -- **One CloudFront distribution** in front of the bucket. Default behavior: serve the SPA shell for unknown paths. Static assets cache aggressively; the baked config is fingerprinted and immutable. -- **One CloudFront Function** as the password gate, attached to the viewer-request event. Browser basic auth, checked at the CDN edge. -- **One DNS record** pointing the chosen subdomain at the distribution. - -No backend, no database, no sidecar — only shared services from the admin stack. - -### 5.2 What dashboards read at runtime - -For each kind of data a dashboard needs: - -- *Mission configuration.* Baked into the JS bundle. No request. -- *Raster tiles, DEMs, basemap imagery.* Fetched from S3 via CloudFront — usually from the admin's shared S3 bucket (the data already lives there from when admins uploaded it; no per-dashboard copy needed). 
-- *Small per-mission tabular or vector data.* Baked into the dashboard's own S3 bucket at publish time as JSON or GeoJSON, fetched as a static asset. -- *Larger tabular or vector data.* Queried dynamically from a shared sidecar (TiTiler for raster mosaics, tipg for PostGIS vector tiles, a custom endpoint for tabular search). Dashboards never connect to Postgres directly. - -The dashboard doesn't have to figure out where any of this lives at runtime — every URL it needs is already in the baked mission config. As part of the publish step, each layer's URL is rewritten to point wherever its data actually ended up: an absolute URL into admin's S3, a relative URL into the dashboard's own bucket, or an absolute sidecar URL. At runtime, the dashboard just reads each URL out of its config and fetches on demand, the same way today's MMGIS fetches tiles on demand from its local server. The static-vs-dynamic choice only affects *which origin* serves the bytes, not *when* they load. - -### 5.3 Per-feature drop list - -Features that **drop in dashboard mode**, with reasons: - -- **Drawing tool writes** — no Postgres, no WebSocket. *Could* be partially preserved as read-only display of baked features or local-browser-storage editing. *Open: Q-DRAW.* -- **All three WebSocket consumers** — real-time Draw collaboration, layer-update notifications from the admin tool to open map sessions, and admin-tool-to-admin-tool multi-admin coordination. All three drop in dashboards; the admin stack keeps all three. -- **The admin tool** — by design, no admin in dashboards. -- **Long-term API tokens, accounts, permissions, webhooks, link shortener** — no backend. -- **File uploads** — read-only. -- **Sidecar proxy** — dashboards talk to the shared services directly. 
-- **Backend-only utility routes** (elevation profile, band metadata, projection conversion, server-side dataset search, link expansion) — each needs a per-feature disposition (drop, call a sidecar directly, or replace with a baked computation). These are *backend route disappearances*, not the same shape as the frontend URL helper. - -Features that **survive in dashboard mode**: - -- Map viewports (2D, 3D, image/model/PDF viewer). -- Pure-client tools: Animation, Sites, Kinds, Legend, Layers, Info. -- DEM-reading tools: Measure, Curtain, Viewshed, Shade — they consume DEM tiles, which bake fine to S3. -- Time control, URL state, the embed API, plugin components. - -Features whose dashboard fate is **conditional**: see `features.md` and the open-questions list. - -### 5.4 Authentication - -The gate itself is a CloudFront Function — a tiny piece of JavaScript that runs at the CDN edge before any request reaches S3, checks an `Authorization` header against a known password, and returns 401 if it doesn't match. The browser handles the password prompt as standard basic auth. What's left to decide is whether all dashboards share one password or each gets its own. - -**Decision:** One shared password across all dashboards, or per-dashboard passwords? - -**Options:** - -- *Single shared password.* One value baked into every dashboard's Function. Trivial to manage; one secret to rotate. But revoking access to a single dashboard means rotating the password for *all* dashboards. -- *Per-dashboard password.* Each distribution's Function is configured with its own password. Per-dashboard revocation is cheap. Comes essentially free since we provision a Function per dashboard anyway — the only cost is one more secret per dashboard to track. - -**Recommended:** Per-dashboard password. - -**Why:** Independent revocation is the operational property that matters as soon as you publish more than a handful of dashboards. 
The added management cost is low because the Function is already per-dashboard. - -**Open:** Q-AUTH-1. - -*Implementation: see `detailed-implementation-plan.md` Phases D, E, and J.* - -## 6. Code refactor decisions - -The conceptual plan for the refactor is in §3. This section captures the architectural decisions inside that plan that aren't yet settled. - -### 6.1 Stubbing the API-call dispatcher - -When dashboard mode fills the dispatcher's dormant non-server branch (the mechanism in §3), it can do so in two shapes. - -**Decision:** Stub the single dispatcher with a per-call lookup table, or branch each call site individually? - -**Options:** - -- *Stub the dispatcher.* One function gets a per-call disposition table (bake / reroute / compute / drop). Every call site keeps calling `api('whatever')` unchanged. One place to edit; one place to break. -- *Branch each call site.* At each place the frontend calls the dispatcher, wrap the call in `if (dashboardMode)` and handle the case there. More invasive (many call sites); per-site behavior is more explicit. - -**Recommended:** Stub the dispatcher. - -**Why:** Concentrates the dashboard-mode logic in one place, matches the existing chokepoint shape, and leaves every call site unchanged. - -**Open:** Q-CALLS-API. - -### 6.2 Time-compositing layers in dashboards - -Some mission configs use a URL convention that triggers server-side compositing of time-windowed map tiles — the server reads several tiles at different timestamps, blends them, and returns one tile. A dashboard has no server to do that compositing, and the compositing step isn't free. - -**Decision:** What happens to time-composited layers in dashboards? - -**Options:** - -- *Pre-bake every time slice at publish time.* The publish step composites every possible time window in advance and stores the results as static tiles in S3. Works, but storage cost scales with how many time windows the layer supports. 
-- *Hide the layer in the dashboard.* The layer simply doesn't appear in dashboards that don't pre-bake it. Cheapest; loses the feature for that layer. - -**Recommended:** Per-layer decision rather than a global default. - -**Why:** Some layers are critical to the mission and worth the bake cost; others are decorative and can be hidden. Marking the disposition per layer in the mission config is cheaper than picking one global rule. - -**Open:** Q-TIME. - -### 6.3 Cross-origin sidecar auth gate - -In today's stack, the admin server's sidecar proxy wraps each Python service in an admin-write gate — anonymous reads pass, writes require admin login. Dashboards reach the sidecars cross-origin, bypassing that proxy. The gate has to come from somewhere. - -**Decision:** How do we gate dashboard access to the shared sidecars? - -**Options:** - -- *Password gate alone.* Only authorized users load the dashboard; once loaded, sidecar requests are unauthenticated but reachable. Simple, but assumes nothing else on the internet stumbles onto the sidecar URLs. -- *CORS allow-list only.* Restricts in-browser access to dashboard and admin origins. Does not stop direct `curl`. -- *Signed requests.* CloudFront signs requests to the sidecars (Lambda@Edge or a similar mechanism). More work; properly secures the services against any direct access. - -**Recommended:** CORS allow-list plus the password gate. - -**Why:** Defense-in-depth at low cost; the residual risk (a direct unauthenticated `curl` against read-only services) is acceptable until security review demands stronger. - -**Open:** Q-AUTH-2. - -*Implementation: see `detailed-implementation-plan.md` Phases A through F.* - -## 7. Provisioning flow - -The new code path: an admin clicks **Publish** in the admin tool. What happens: - -1. **Admin tool → admin server.** The publish request, with mission, dashboard name, and settings. -2. **Admin server → bundling task.** Reads the mission's current config from Postgres. 
For each layer the mission references, decides where the data will live (baked into the dashboard's bucket, left in admin's S3, or served by a sidecar) and rewrites the layer's URL in the baked config accordingly. Builds the dashboard's frontend bundle with the rewritten configuration frozen in. Emits a directory of bundle plus baked static assets. -3. **Admin server → provisioning.** Creates the per-dashboard S3 bucket, CloudFront distribution, password-gate Function, and DNS record. -4. **Admin server → upload + invalidate.** Uploads the bundle to the new bucket and issues a CloudFront invalidation so users see the new build immediately. -5. **Admin server → admin tool.** Returns the dashboard URL; the admin tool surfaces it and records it in the dashboards registry table. - -The bundling task in step 2 is a real compute job — it reads from the database, runs Webpack, and produces a directory tree. Where that work runs is a real choice. - -**Decision:** How does the bundling task run? - -**Options:** - -- *In-process in the admin task.* Simplest; ties up the admin's compute during a build; bundle size bounded by the admin container's filesystem and memory. -- *Spawned ECS task per publish.* A fresh container per build, isolated from the admin. Clean lifecycle, predictable footprint. Cold-start latency (a few seconds to start the task). -- *CodeBuild job triggered by the admin.* AWS-native CI primitive; gives free logging and build artifacts. Adds an external surface to manage. - -**Recommended:** Spawned ECS task per publish. - -**Why:** Clean lifecycle, predictable resource footprint, no contention with the admin's serving load. - -**Decision:** How do we provision the per-dashboard resources? - -**Options:** - -- *CDK or CloudFormation template, deployed from the admin task.* Declarative, idempotent, easy to tear down. Requires a large IAM surface on the admin's role. -- *Direct SDK calls.* Imperative, simpler IAM (scoped to exactly what the calls touch). 
Teardown is custom code. -- *Step Functions orchestration.* Overengineered for this. Defer. - -**Recommended:** Direct SDK calls from the spawned bundling task. - -**Why:** Tight IAM scope; teardown is straightforward when paired call-for-call with creation. - -**Teardown.** Admin → Delete Dashboard. Reverse of provisioning: invalidate CloudFront, delete distribution, delete Function, delete bucket, remove DNS record, remove registry row. - -*Implementation: see `detailed-implementation-plan.md` Phases H and I.* - -## 8. Shared vs. per-instance - -The defining tension of this design. The default position is **shared** — one resource serving many dashboards — and we deviate only when isolation is a hard requirement. - -### 8.1 Database - -The one-Postgres-vs-many decision is in §4.2 (Q-DB-1, recommendation: one instance). Per *dashboard* there's a separate question — one Postgres per dashboard — which we reject: the operational cost (N instances to patch, monitor, back up) and the security surface (each dashboard now has database credentials) aren't justified for any need we've identified. Tables get a dashboard-scoped slice on the shared instance only when they need persistence beyond a baked file, which is the rare case. - -### 8.2 Adjacent services - -- **One deployment of each sidecar**, shared across the admin and every dashboard. -- **Rejected alternative: per-dashboard sidecars.** Cost (N copies of each Python service running) and management (N deployments to upgrade) are unjustified given the services are stateless or read from shared databases. -- **Veloserver is the exception worth flagging.** Its requirements are under-documented, and no frontend code references it today. So the live question for AWS is narrower than "deploy it or not": *does any production mission config still reference veloserver-backed layers?* If yes, document what the service needs; if no, drop. Tracked as Q-VELO. 
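
Q-VELO can be checked mechanically before anyone decides whether to deploy veloserver. The sketch below is a hypothetical helper, not existing MMGIS code: it assumes each mission config is available as plain JSON and that a veloserver-backed layer can be spotted by the substring `veloserver` in one of its string values. Both the function name and that marker are assumptions; adjust the marker to whatever the real layer URLs look like.

```typescript
// Hypothetical helper for Q-VELO: list every string value in a mission
// config that mentions veloserver. The "veloserver" marker is an assumption.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface Hit {
  path: string;  // where the value sits inside the config, e.g. $.layers[1].url
  value: string; // the matching string
}

function findReferences(node: Json, marker: string = "veloserver", path: string = "$"): Hit[] {
  const hits: Hit[] = [];
  if (typeof node === "string") {
    // Case-insensitive substring match on leaf strings only.
    if (node.toLowerCase().indexOf(marker) !== -1) hits.push({ path, value: node });
  } else if (Array.isArray(node)) {
    node.forEach((child, i) => {
      hits.push(...findReferences(child, marker, `${path}[${i}]`));
    });
  } else if (node !== null && typeof node === "object") {
    for (const key of Object.keys(node)) {
      hits.push(...findReferences(node[key], marker, `${path}.${key}`));
    }
  }
  return hits; // numbers, booleans, and null never match
}

// Made-up config used only to exercise the helper.
const sampleConfig: Json = {
  msv: { mission: "Demo" },
  layers: [
    { name: "Basemap", url: "Layers/basemap/{z}/{x}/{y}.png" },
    { name: "Winds", url: "/veloserver/winds/latest" },
  ],
};
const veloHits = findReferences(sampleConfig);
```

An empty result for every production config would support the "drop" branch of Q-VELO; any hit names a concrete layer whose requirements need documenting.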
- -### 8.3 Dashboard registry - -The admin tracks every dashboard it has published — at minimum URL, name, owner, and provisioning metadata — in a registry table on the shared Postgres. It is used to list dashboards in the admin UI, gate Delete Dashboard, and identify which CloudFront distributions to invalidate on republish. - -*Implementation: see `detailed-implementation-plan.md` Phases G and I.* - -## 9. Data flow - -### 9.1 The local-files heritage - -MMGIS's storage was always split: **raster files on local disk** under the mission directory; **structured data in Postgres** (tabular datasets, PostGIS geodatasets, mission configs, drawings, sessions). An AWS deployment has no shared local disk, so: - -- **Raster files → S3**, same prefix layout. The relative-path resolver in mission configs points at the S3 prefix instead of the filesystem. -- **Structured data → still Postgres**, now on RDS instead of in a container. -- **No "point at a local path" workflow survives.** Mission configs may not reference absolute filesystem paths; relative paths under the mission folder remain supported. - -### 9.2 Where dashboard data comes from - -A dashboard pulls data from one of three places. The choice isn't really about *size* — S3 can hold anything — it's about **access pattern** (static fetch vs. dynamic query) and **which bucket** holds it. - -- **Static fetch from the admin's S3 bucket.** No copy needed; the data already lives there from when admins uploaded it. The baked mission config points at the existing CloudFront-fronted URL. Right for raster tiles, DEMs, basemap imagery — the big files that already live in admin S3 and would only duplicate if copied per dashboard. -- **Static fetch from the dashboard's own S3 bucket.** Baked at publish time. The publish step reads from admin storage (Postgres rows or admin S3 files), serializes to JSON or GeoJSON, and writes a static file into the dashboard's bucket alongside the JS bundle.
Right for *mission-specific* small data — the mission config itself, small lookup tables, baked search indices. Clean deletion lifecycle: drop the dashboard's bucket and its data is gone with it. -- **Dynamic query against a shared sidecar.** The dashboard makes HTTP requests to TiTiler (raster mosaics over big COGs), tipg (PostGIS as vector tiles or OGC Features), or a thin custom endpoint for tabular search. Right when the access pattern is "compute this on demand," not "fetch this file." - -The default position is to push as much as possible into the first two categories (static fetches, no service hop) and use sidecars only for data that genuinely needs dynamic querying. - -**The publish step is therefore a selective data-copying operation.** For each piece of data the mission references, it decides: leave it where it is (admin's S3 or a sidecar) and write the URL into the baked config; or read from admin storage, serialize, and write into the dashboard's bucket. Most missions end up with a mix of all three. - -**Last resort: a dashboard-scoped table in the shared Postgres** plus a thin query endpoint to read it. Only when the dashboard genuinely needs writeable per-dashboard persistence — rare enough that we don't pre-commit a design. - -### 9.3 The open part - -**Open:** Q-BAKE-CEILING — how much data can a dashboard load at boot before it feels slow? This is the UX ceiling that decides which data lands in the first two categories (static fetch) vs. the third (sidecar query). Investigation needed; not an ADR-time decision. - -*Implementation: see `detailed-implementation-plan.md` Phase F.* - -## 10. URL topology - -Two real choices interact: how the admin stack exposes its services, and how dashboards reach those services. - -### 10.1 Admin - -All admin paths on one CloudFront distribution in front of the admin load balancer, same shape as today. The sidecar proxy continues to forward under the same paths. 
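
The publish-time URL rewriting described in §5.2 and §9.2 comes down to mapping each layer's storage decision onto one of three origins. A minimal sketch follows, assuming a hypothetical `LayerPlan` shape and `PublishTargets` settings object; neither name exists in MMGIS today, and the hostnames and paths are placeholders.

```typescript
// Hypothetical sketch of publish-time URL rewriting; none of these names
// come from the MMGIS codebase.
type LayerPlan =
  | { source: "admin-s3"; key: string }                   // data stays in the admin's bucket
  | { source: "dashboard-bucket"; key: string }           // data baked into the dashboard's bucket
  | { source: "sidecar"; service: string; path: string }; // data queried from a shared service

interface PublishTargets {
  adminAssetsBase: string;                      // public URL fronting the admin's S3 bucket
  sidecarBases: { [service: string]: string };  // public base URL per shared service
}

// Join a base URL and a path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

function resolveLayerUrl(plan: LayerPlan, targets: PublishTargets): string {
  switch (plan.source) {
    case "admin-s3":
      return joinUrl(targets.adminAssetsBase, plan.key); // absolute URL into admin storage
    case "dashboard-bucket":
      return plan.key.replace(/^\/+/, "");               // relative to the dashboard's own origin
    case "sidecar": {
      const base = targets.sidecarBases[plan.service];
      if (!base) throw new Error(`No public URL configured for sidecar "${plan.service}"`);
      return joinUrl(base, plan.path);                   // absolute URL to the shared service
    }
  }
}

// Placeholder settings used only to exercise the function.
const exampleTargets: PublishTargets = {
  adminAssetsBase: "https://assets.example.org/",
  sidecarBases: { titiler: "https://titiler.example.org" },
};
const exampleUrl = resolveLayerUrl(
  { source: "admin-s3", key: "/Missions/Demo/tiles/{z}/{x}/{y}.png" },
  exampleTargets
);
```

Keeping the decision (`LayerPlan`) separate from the environment (`PublishTargets`) means the same plan can be re-resolved if the topology choice in Q-URL-1 changes.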
- -### 10.2 Dashboards reaching shared services - -Dashboards live on their own domain; the sidecars live in the admin stack. A dashboard needs a URL it can put in fetch calls. Two ways to arrange that. - -**Decision:** How do dashboards reach the sidecars? - -**Options:** - -- *Per-service subdomain.* Each shared service gets its own public URL (e.g. `titiler.`, `stac.`); dashboards hit those URLs directly. CORS configured per service. Several subdomains and TLS certs to manage. -- *One CloudFront fronts everything.* A single CloudFront distribution sits in front of the admin S3 bucket, all dashboard buckets, and all sidecar targets — path-based routing decides which origin serves a given request. Fewer resources; the routing complexity moves into CloudFront's behavior rules. - -**Recommended:** Per-service subdomain. - -**Why:** Lines up with the existing path-prefix discipline — today's `/titiler`, `/stac`, etc. just become subdomains, no routing rewrite needed in CloudFront. - -**Open:** Q-URL-1. - -### 10.3 Per-dashboard CloudFront vs. shared - -A CloudFront distribution is the AWS resource that fronts an origin (an S3 bucket, in our case) with a CDN, TLS, and (for us) the password-gate Function. We can either give each dashboard its own distribution, or run one shared distribution that path-routes to many dashboard buckets. - -**Decision:** One CloudFront per dashboard, or one CloudFront serving many? - -**Options:** - -- *Per-dashboard distribution.* Each dashboard has its own distribution, its own Function (so its own password), and clean isolation. Drawback: N distributions to monitor, and each carries a small per-distribution cost floor. -- *Shared distribution, path-routed per dashboard.* One distribution serves `//...` for many dashboards. Cheaper; harder to give one dashboard its own password; harder to revoke access to a single dashboard. - -**Recommended:** Per-dashboard distribution. 
- -**Why:** Isolation and per-dashboard password come for free; cost is acceptable until N gets large. - -**Open:** Q-URL-2 (revisit if N grows). - -*Implementation: see `detailed-implementation-plan.md` Phase H.* - -## 11. Open questions (consolidated) - -Questions with a home section in this ADR are pointer entries. Questions tracked only here (mostly feature-level scope decisions) carry their description. - -### Architecture-level (has a home section) - -- **Q-DB-1** — One Postgres instance for both databases, or separate? → §4.2. -- **Q-URL-1** — Per-service subdomain for each sidecar, or one CloudFront fronting everything? → §10.2. -- **Q-URL-2** — Per-dashboard CloudFront distribution, or one distribution with path routing? → §10.3. -- **Q-AUTH-1** — Per-dashboard password, or one shared password? → §5.4. -- **Q-AUTH-2** — Cross-origin sidecar gate: password-only, CORS allow-list, or signed requests? → §6.3. -- **Q-DEPLOY-1** — How do we close the first-user-becomes-superadmin gap? → §4.6. -- **Q-CALLS-API** — Stub the API-call dispatcher, or branch each call site individually? → §6.1. -- **Q-TIME** — Per-layer disposition for time-composited layers in dashboards. → §6.2. -- **Q-VELO** — Is veloserver referenced by any current mission config? → §8.2. -- **Q-BIG-UPLOAD** — How do tile pyramids (thousands of files, many GB) reach S3 via the admin UI? → §4.5. - -### Feature-level (tracked in `features.md`) - -- **Q-DRAW** — Drawing in dashboards: drop, read-only display of baked features, or local-storage edit mode? -- **Q-LANDING** — Does any dashboard host multiple frozen missions, or is it strictly one-mission-per-deploy? -- **Q-SEARCH** — Dashboard search: client-side baked index, routed through tipg, or a shared search endpoint in the admin stack? Per-dashboard scoping (one dashboard can't discover another's data) is part of the answer either way. -- **Q-BAKE-CEILING** — How much data can a dashboard reasonably load at boot before it feels slow?
This is a bandwidth/UX ceiling on the static-fetch path (S3 can store anything; the question is what's tolerable for a user). The answer sets the line between "bake as a static file" and "route through a sidecar." -- **Q-SSO** — Does the admin ever deploy where CSSO is mandatory? If not, the CSSO middleware is dead code in AWS. -- **Q-SHORTENER** — Is the link shortener used? If not, drop everywhere. -- **Q-DOCS** — Does the dashboard ever need to ship the docs site, or does it live only on the admin? - -### Implementation-level - -The detailed plan carries these: - -- The exhaustive list of call sites that need rewriting. -- The exact shape of the baked config module and the API-call dispatch table. -- The IAM policy template for the per-publish provisioning task. - ---- - -**Cross-reference:** See `working-plan.md` for the structure and workflow that produced this ADR. See `features.md` for the per-feature inventory. See the personal review checklist for the human-facing review steps. See `detailed-implementation-plan.md` for file/function-level refactor instructions. diff --git a/docs/adr/deployment/features.md b/docs/adr/deployment/features.md new file mode 100644 index 000000000..966a4c051 --- /dev/null +++ b/docs/adr/deployment/features.md @@ -0,0 +1,116 @@ +# MMGIS deployment-relevant features + +Capability-grouped inventory of MMGIS features whose deployment story is decided independently. Each row carries description, runtime dependencies, presence in each deployable (admin / dashboard), and the AWS implementation strategy. Row numbers are stable identifiers — cite as `#NN`. + +**Columns:** + +- **Description** — what the feature does. +- **Depends on** — what the feature currently needs at runtime. +- **Admin** — presence in the admin stack: `yes` / `no` / `open` / `N/A`. +- **Dashboard** — presence in dashboards: `yes` / `no` / `open` / `N/A`. `open` means the disposition is gated by an open question tracked in the ADRs. 
+- **AWS** — implementation strategy. First option is the recommended default; alternatives follow. + +Open questions affecting these dispositions live in **ADR-A §8** (AWS-infra-scope) and **ADR-B §5** (frontend-scope). Cross-cutting infra questions not yet owned by an ADR are in `overview-new.md`. + +## Frontend capabilities (in browser bundle) + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 1 | Map viewports | 2D map (Leaflet + partial deck.gl), 3D globe (Cesium), image/model/PDF viewer | Mission config, layer data | yes | yes | Bundle on S3+CloudFront | +| 2 | Pure-client tools | Animation, Sites, Kinds, Legend, Layers, Info — operate on already-loaded data | None beyond loaded layers | yes | yes | Bundle | +| 3 | DEM-reading tools | Measure, Curtain, Viewshed, Shade — read elevation tiles client-side | DEM tiles | yes | yes | Bundle + DEM tiles from S3 (see #22) | +| 4 | Heavy-compute tools | Isochrone — travel-time polygons over DEM pixels; today uses server help | DEM tiles, server compute | yes | open | Bundle; backend compute via Lambda or shared service if pure-client isn't viable | +| 5 | Data-querying tools | Identifier, Chemistry — fetch features/values on demand | Geodataset / dataset query | yes | open | Bundle; data path per #20/#21 | +| 6 | Drawing tool | Interactive create/edit/history/publish of user features | Postgres (read+write), WebSocket | yes | open | Admin: ECS+Postgres; dashboard: drop, read-only display of baked features, or local-browser-storage editing | +| 7 | Real-time collaboration | WebSocket broadcast: Draw sync, layer-update notifications (Configure → Essence), Configure multi-admin coordination | WebSocket | yes | no | ALB WebSocket on admin ECS task | +| 8 | Time control | Temporal layer windowing + time bar UI | Time-aware data sources | yes | yes | Bundle; needs time-aware data baked or shared | +| 9 | URL state | Shareable `?mapLat=…&on=…&tools=…` links 
| None | yes | yes | Bundle | +| 10 | mmgisAPI | Public embed/plugin surface on `window.mmgisAPI` | None (delegates to other features) | yes | yes | Bundle; some methods no-op in dashboard | +| 11 | Plugin tools | `*Plugin-Tools*` / `*Private-Tools*` build-time inclusion | Build-time only | yes | yes | Bundle (same codegen in both pipelines) | +| 12 | Landing page | Pre-boot mission selection UI | Mission list (from config) | yes | no | Code ships in bundle but never renders in dashboards (one-mission-per-deploy) | +| 13 | Search UI | Autocomplete + geodataset lookup widget | Server-side search (#27) | yes | open | Bundle; dashboard fate tied to #27 | + +## Data sources (frontend reads these) + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 14 | Mission configuration | JSON blob describing layers, tools, view, CRS | Postgres (admin) / baked JSON (dashboard) | yes (Postgres) | yes (baked JSON) | Admin: Postgres row served by Express; dashboard: baked JSON in S3, fetched at boot | +| 15 | Pre-tiled raster imagery | On-disk tile pyramids under `Missions//Layers/` | File system or S3 | yes | yes | S3+CloudFront | +| 16 | Dynamic raster tile rendering | TiTiler against COGs | TiTiler service + COG storage | yes | open | Shared TiTiler in admin cluster; or pre-bake tile pyramid | +| 17 | STAC catalog | `stac-fastapi-pgstac` browse/search | STAC service + STAC Postgres | yes | open | Shared STAC in admin cluster; or baked STAC JSON | +| 18 | STAC-driven mosaics | TiTiler-pgSTAC dynamic mosaicking | TiTiler-pgSTAC + STAC Postgres | yes | open | Shared TiTiler-pgSTAC in admin cluster, same Postgres as #17 | +| 19 | Vector tiles from PostGIS | tipg | tipg service + Postgres | yes | open | Shared tipg + Postgres; or baked MVT in S3 | +| 20 | Tabular datasets | Datasets module (CSV/JSON tables, query by column) | Postgres | yes (Postgres) | open | Dashboard: baked S3 JSON; or shared admin Postgres read 
endpoint | +| 21 | Spatial vector datasets | Geodatasets module (PostGIS tables → GeoJSON or MVT) | Postgres (PostGIS) | yes (Postgres) | open | Dashboard: baked S3 GeoJSON/MVT; or shared PostGIS + tipg | +| 22 | DEM tiles | RGBA-encoded elevation tiles | File system or S3 | yes | yes | S3+CloudFront | +| 23 | Feature-attached media | Images/models/PDFs referenced by features | File system or S3 | yes | yes | S3+CloudFront | +| 24 | Velocity grid data | Wind/current/velocity layers — **no current frontend code constructs `/veloserver` URLs**; verify per-mission before provisioning | veloserver service (#41) | open | open | Shared veloserver in admin cluster; or omit (fate tied to #41) | + +## Server-only capabilities + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 25 | Configure admin SPA | Mission/layer/dataset/user CRUD UI at `/configure` | Express + Postgres | yes | no | Same ECS task as Express | +| 26 | Mission-asset serving | Path-traversal-hardened static middleware for `/Missions/...` **plus** `_time_` URL convention that composites time-windowed tiles via `sharp` at request time | Express + `sharp` + file system | yes | no | Express in ECS; time-compositing has no dashboard replacement (per-layer pre-bake decision) | +| 27 | Server-side search | Backend search across geodatasets (called by #13) | Express + Postgres (PostGIS) | yes | open | Admin: Express+Postgres; dashboard: client-side index, shared endpoint, or omit | +| 28 | Auth | Local accounts, bcrypt, `MMGISSession` cookie sessions in Postgres | Express + Postgres | yes | shared password | Admin: Postgres-backed sessions; dashboard: CloudFront Function basic auth | +| 29 | Long-term API tokens | Bearer tokens for programmatic access | Postgres | yes | no | Postgres on admin | +| 30 | SSO integration | CSSO header-based identity (off by default) | Upstream proxy headers | open | no | Only if deployment requires; otherwise dormant 
| +| 31 | Permissions | Active set: `111`/`110`/`001`/`000` (guest); ENUM reserves all 8 values. First-user-becomes-superadmin via `first_signup` | Postgres (`users.permission`) | yes | no | Postgres on admin | +| 32 | File uploads | Busboy ingestion for datasets, geodatasets, mission assets | Express + file system | yes | no | Presigned browser-to-S3 (Busboy still serves small payloads through Express) | +| 33 | Webhooks | Admin-defined HTTP callbacks fired on Draw/Config changes | Postgres + outbound HTTP | yes | no | Postgres + outbound HTTP from admin ECS | +| 34 | Link shortener | `(short, full, creator)` redirects | Postgres | open | no | Postgres on admin; or drop entirely | +| 35 | Adjacent-services proxy | Reverse proxy for `/stac`, `/tipg`, `/titiler`, `/titilerpgstac`, `/veloserver` with admin gating | Express + http-proxy-middleware | yes | no | ALB target groups per service; dashboard frontend hits shared URLs directly | +| 36 | Custom adjacent-server registry | `ADJACENT_SERVER_CUSTOM_` env-driven proxy slots | Env vars + Express proxy | yes | no | Env-driven on admin ECS | +| 37 | Pug-rendered shells | Login page, admin login, error page, SPA HTML | Express + Pug | yes | no | Express in ECS; dashboard ships plain static HTML | +| 38 | Swagger UI / OpenAPI | API docs surface at `/api/docs` | Express | yes | no | Express in ECS | +| 39 | Healthcheck endpoint | `/api/utils/healthcheck` (shallow — no DB check) — used by Playwright and ALB target health | Express | yes | N/A | Express in ECS; consumed by ALB target health | +| 40 | Jekyll docs site | The `/docs` static documentation site (third browser app) | Jekyll build, static file serving | yes | open | Admin: S3+CloudFront subpath; dashboard ship-with-or-not TBD | +| 41 | veloserver sidecar | Python service for velocity/weather grid data, proxied at `/veloserver` | NASA-AMMOS Python service | open | open | Shared sidecar in admin cluster; deployment gated on mission-config audit | +| 42 | Backend 
utility routes | `/api/utils/getprofile` (Measure elevation profile), `/api/utils/getbands` (Identifier band list), `/api/utils/proj42wkt` (Layers tool projection) — Express helpers called via `calls.api`, **not** sidecar URLs | Express + Python helpers | yes | open | Each call needs a per-feature disposition: drop, redirect to sidecar, or compute client-side | + +## Persistence + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 43 | Main MMGIS database | Postgres 16 + PostGIS — users, sessions, datasets, geodatasets, drawings, configs, tokens | Postgres + PostGIS extension | yes | no | Managed Postgres (engine choice TBD per overview open questions) | +| 44 | STAC database (`mmgis-stac`) | Separate Postgres for STAC + TiTiler-pgSTAC, uses pgstac extension | Postgres + pgstac extension | yes | no | Managed Postgres + pgstac; shared with #43 or separate (per overview open questions) | + +## Build / ops + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 45 | Plugin-drop codegen | `updateTools()` / `updateComponents()` writing `src/pre/*.js` before Webpack | Node script, build time | yes | yes | Runs in GitHub Actions for both pipelines | +| 46 | Auxiliary GDAL toolbox | Offline data-prep (tiles, DEMs, STAC items, legends, ndGeoJSON) | Python + GDAL, workstation | yes | yes | Workstation; or one-shot GitHub Actions job with output → S3 | +| 47 | Playwright test suite | Single runner for unit + e2e | CI; e2e needs ephemeral server | yes | yes | GitHub Actions; ephemeral admin + Postgres for e2e | +| 48 | Docker Compose stack | MMGIS app, Postgres, optional sidecars via `--profile stac` / `--profile veloserver` | Docker | yes (local dev) | yes (local dev) | Local-dev only; production replaces with ECS / managed Postgres | +| 49 | DB init / migrations | `init-db.js` — creates DBs, installs PostGIS / btree_gist / pgstac, creates session 
table + indexes | Node script, gates server boot | yes | N/A | One-shot ECS task before admin service starts | + +## Cross-cutting + +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 50 | Single-origin routing | Express owns `/`, `/api/*`, `/configure`, `/stac`, `/tipg`, `/titiler`, `/veloserver`, `/docs` | Express | yes | N/A (single-origin via CloudFront) | ALB routing on admin; CloudFront on dashboard | +| 51 | CORS / iframe embedding | `FRAME_ANCESTORS` env, embedder reaches in via `iframe.contentWindow.mmgisAPI` | Browser-level + helmet CSP | yes | yes | CloudFront/ALB response headers + helmet CSP | +| 52 | Logging / observability | Winston (pretty in dev, JSON-per-line in prod); password redaction; body/query cropping | stdout / log destination | yes | partial | Admin: CloudWatch Logs; dashboard: CloudFront standard logs to S3 | + +## To be built (new in AWS) + +Net-new surface introduced by the AWS deployment refactor. Doesn't exist in the current codebase; tracked here so the inventory covers both lift-and-shift and additions. 
+ +| # | Feature | Description | Depends on | Admin | Dashboard | AWS | +| --- | --- | --- | --- | --- | --- | --- | +| 53 | Dashboard publishing pipeline | Publish/teardown Express handler + spawned bake-and-provision task + IAM-scoped SDK calls | ECS RunTask + AWS SDK + admin Postgres | yes (net-new) | N/A | Spawned ECS task per publish (ADR-A §5.1) | +| 54 | Per-dashboard runtime resources | S3 bucket + CloudFront distribution + CloudFront Function (password gate) + DNS record, one set per dashboard | AWS S3, CloudFront, Route 53 | N/A | yes (net-new) | One set per dashboard, provisioned at publish time (ADR-A §3.1) | +| 55 | Dashboards admin UI + registry | New `dashboards` table on admin Postgres + a Dashboards page in Configure with async-job status polling | Configure SPA + admin Postgres | yes (net-new) | N/A | New Configure page + new Postgres table (ADR-A §5.3) | + +## Conventions + +- **Rows with `Depends on: None`** are pure client and trivially survive in dashboards. +- **Rows ending in a `#NN` reference** point to another row this row depends on — typically the server-only feature backing a frontend capability. +- **Data-source rows** (#14–24) are decision-heavy: each one's dashboard form is either baked into S3 at publish time, or served by a shared sidecar. The bake-vs-shared threshold is an ADR-A open question. +- **Server-only rows** (#25–42) are mostly "admin yes / dashboard no," with exceptions called out per row. +- **Persistence rows** (#43–44) are shared-resource candidates in AWS — see overview cross-cutting open questions for the shared-vs-separate decision. +- **Build/ops rows** affect CI pipeline design, not runtime topology. +- **Cross-cutting rows** apply to both deployables in some form. +- **To-be-built rows** are the net-new surface introduced by the deployment refactor. 
diff --git a/docs/adr/deployment/overview.md b/docs/adr/deployment/overview.md deleted file mode 100644 index 86bb9b691..000000000 --- a/docs/adr/deployment/overview.md +++ /dev/null @@ -1,121 +0,0 @@ -# MMGIS split plan - -## What we're doing - -- Split MMGIS into two deployables: a **config admin** (close to today's app) and **static builds** (frozen, read-only frontends). -- Deploy the config admin to AWS — ECS Fargate + RDS + ALB, preserving the current Express / Postgres / sidecar shape. -- From inside the admin, **publish static builds** to AWS — S3 + CloudFront per build, gated by a shared password. -- Back static builds with **shared AWS-hosted services** (TiTiler, STAC, tipg, optionally RDS read endpoints) where baking doesn't scale. -- Decide per feature whether it survives in static: drop, bake at build time, or point at a shared service. -- Keep the admin codebase as close to current as possible; refactor only what the split forces. - -## Fixed assumptions - -- One config admin instance, many static builds. -- Static-build access = single shared password (CloudFront Function basic auth by default). -- Bake-vs-shared decided per data feature; default toward baking, shared service when bake doesn't scale. -- Same row set across all sections; numbers are stable identifiers. - -## Legend - -One table per section. Each row = one feature with a deployment decision and AWS option. Open questions are referenced as `Q#` and listed at the bottom. - -- **Admin** / **Static**: presence in each deployment. `yes` / `no` / `open` / `N/A`. -- **AWS**: brief option(s). First option = recommended default. -- **Notes**: `Q#` references open questions; otherwise `—`. 
- -## Frontend capabilities (in browser bundle) - -| # | Feature | Admin | Static | AWS | Notes | -| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | -| 1 | Map viewports (2D Leaflet/deck.gl, 3D Cesium, image/model viewer)| yes | yes | bundle on S3+CloudFront | — | -| 2 | Pure-client tools (Animation, Sites, Kinds, Legend, Layers, Info)| yes | yes | bundle | — | -| 3 | DEM-reading tools (Measure, Curtain, Viewshed, Shade) | yes | yes | bundle + DEM tiles from S3 (#22) | — | -| 4 | Heavy-compute tools (Isochrone) | yes | open | bundle; or backend compute via Lambda | Q1 | -| 5 | Data-querying tools (Identifier, Chemistry) | yes | open | bundle; data per #20/#21 | Q2 | -| 6 | Drawing tool | yes | open | admin: ECS+RDS; static: drop / read-only / local-only | Q3 | -| 7 | Real-time collaboration (WebSocket: Draw sync, presence) | yes | no | ALB WebSocket on admin ECS task | — | -| 8 | Time control (temporal windowing + UI) | yes | yes | bundle; needs time-aware data baked or shared | — | -| 9 | URL state (shareable links) | yes | yes | bundle | — | -| 10 | mmgisAPI (window.mmgisAPI surface) | yes | yes | bundle; some methods no-op in static | — | -| 11 | Plugin tools (`*Plugin-Tools*` build-time inclusion) | yes | yes | bundle (built by same codegen in both pipelines) | — | -| 12 | Landing page / mission picker | yes | open | bundle; picker moot if 1 mission per static deploy | Q4 | -| 13 | Search UI (autocomplete + lookup widget) | yes | open | bundle; fate tied to #27 | Q5 | - -## Data sources - -| # | Feature | Admin | Static | AWS | Notes | -| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | -| 14 | Mission configuration (JSON: layers, tools, view, CRS) | yes (RDS) | yes (baked JSON) | admin: RDS row 
served by Express; static: S3 JSON fetched at boot | — | -| 15 | Pre-tiled raster imagery (tile pyramids) | yes | yes | S3+CloudFront | — | -| 16 | Dynamic raster tile rendering (TiTiler against COGs) | yes | open | shared TiTiler on Fargate; or pre-bake tile pyramid | Q6 | -| 17 | STAC catalog (`stac-fastapi-pgstac`) | yes | open | shared Fargate + Aurora (pgstac); or baked STAC JSON | Q6 | -| 18 | STAC-driven mosaics (TiTiler-pgSTAC) | yes | open | shared Fargate, same Aurora as #17 | Q6 | -| 19 | Vector tiles from PostGIS (tipg) | yes | open | shared tipg + RDS; or baked MVT in S3 | Q6 | -| 20 | Tabular datasets (CSV/JSON, query by column) | yes (RDS) | open | static: S3 JSON; or shared RDS read endpoint | Q7 | -| 21 | Spatial vector datasets (PostGIS → GeoJSON or MVT) | yes (RDS) | open | static: S3 GeoJSON/MVT; or shared PostGIS + tipg | Q7 | -| 22 | DEM tiles (RGBA-encoded elevation) | yes | yes | S3+CloudFront | — | -| 23 | Feature-attached media (images/models/PDFs) | yes | yes | S3+CloudFront | — | -| 24 | Velocity grid data (wind/current layers) | yes | open | shared veloserver Fargate (#41); or omit | Q8 | - -## Server-only capabilities - -| # | Feature | Admin | Static | AWS | Notes | -| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | -| 25 | Configure admin SPA (mission/layer/dataset/user CRUD UI) | yes | no | same ECS task as Express | — | -| 26 | Mission-asset serving (path-traversal middleware + `sharp` time-compositing) | yes | no | Express in ECS | time-compositing has no static replacement | -| 27 | Server-side search (geodataset search backing #13) | yes | open | admin: Express+RDS; static: client index, shared service, or omit | Q5 | -| 28 | Auth (local accounts, bcrypt, `MMGISSession` sessions) | yes | shared password | admin: RDS-backed sessions; static: CloudFront Function basic auth | — | -| 29 | Long-term 
API tokens (Bearer tokens) | yes | no | RDS on admin | — | -| 30 | SSO integration (CSSO header-based) | open | no | only if NASA-internal deployment requires | Q9 | -| 31 | Permissions (`111`/`110`/`001`, first-user-becomes-admin) | yes | no | RDS on admin | — | -| 32 | File uploads (Busboy ingestion) | yes | no | direct browser-to-S3 via presigned POST | — | -| 33 | Webhooks (admin-defined HTTP callbacks) | yes | no | RDS + outbound HTTP from ECS | — | -| 34 | Link shortener (`(short, full, creator)` redirects) | open | no | RDS on admin; or drop entirely | Q10 | -| 35 | Adjacent-services proxy (`/stac`, `/titiler`, etc.) | yes | no | ALB target groups per service; static frontend hits shared URLs | — | -| 36 | Custom adjacent-server registry (`ADJACENT_SERVER_CUSTOM_`) | yes | no | env-driven on admin ECS | — | -| 37 | Pug-rendered shells (login, error, SPA HTML) | yes | no | Express in ECS | static ships plain HTML | -| 38 | Swagger UI / OpenAPI (`/api/docs`) | yes | no | Express in ECS | — | -| 39 | Healthcheck endpoint (`/api/utils/healthcheck`) | yes | no | Express; used by ALB target health | — | -| 40 | Jekyll docs site (`/docs`) | yes | open | S3+CloudFront subpath, or separate bucket | Q11 | -| 41 | veloserver sidecar (velocity-grid Python service) | yes | open | shared Fargate; fate tied to #24 | Q8 | - -## Persistence - -| # | Feature | Admin | Static | AWS | Notes | -| --- | ---------------------------------------------------------------- | ----------- | ----------------- | ------------------------------------------------------------------ | ----- | -| 42 | Main MMGIS DB (Postgres + PostGIS) | yes | no | RDS Postgres; or Aurora Serverless v2 | — | -| 43 | STAC DB `mmgis-stac` (Postgres + pgstac) | open | open | shared Aurora cluster with #42; or separate | Q6, Q12 | - -## Build / ops - -| # | Feature | Admin | Static | AWS | Notes | -| --- | ---------------------------------------------------------------- | ----------- | ----------------- | 
------------------------------------------------------------------ | ----- | -| 44 | Plugin-drop codegen (`updateTools`/`updateComponents`) | yes | yes | runs in CodeBuild / GH Actions for both pipelines | — | -| 45 | Auxiliary GDAL toolbox (offline data prep) | yes | yes | workstation or one-shot CodeBuild job; output → S3 | — | -| 46 | DB init / migrations (`init-db.js`) | yes | no | one-shot ECS task before service starts | — | -| 47 | Logging / observability (Winston) | yes | yes | admin: CloudWatch Logs; static: CloudFront standard logs to S3 | — | - -## Open questions - -- **Q1 (#4):** Is Isochrone viable pure-client (web workers, small areas), or does it need a backend even in admin? If backend, Lambda or shared compute service. -- **Q2 (#5):** Identifier/Chemistry static fate hinges on #20/#21 — if data is baked, these work; if shared, they call shared service. -- **Q3 (#6):** Drawing in static — drop entirely, read-only display of baked features, or invest in a "local browser-storage only" editing mode? -- **Q4 (#12):** Does any static deployment ever host multiple frozen missions, or is it always one-mission-per-deploy? -- **Q5 (#13/#27):** Server-side search in static — replace with client-side index over baked data, point at a shared search service, or omit search entirely? -- **Q6 (#16–#19, #43):** For each tile/feature service, bake threshold vs shared service. Likely needs a per-mission default + override mechanism. -- **Q7 (#20/#21):** Tabular and spatial dataset thresholds: at what size do we stop baking JSON and start hitting a shared RDS read endpoint? -- **Q8 (#24/#41):** Is velocity data actually used by current missions? If not, drop the whole row + sidecar. -- **Q9 (#30):** Does the config admin ever deploy in an environment where NASA CSSO is required? -- **Q10 (#34):** Is the link shortener used? If not, drop everywhere. -- **Q11 (#40):** Does the static deployment need to ship the `/docs` site, or does docs live only on the admin? 
-- **Q12 (#43):** One shared Aurora cluster for both #42 and #43, or separate RDS instances? Cost vs blast radius. - -## Open questions for AWS architecture (not per-feature) - -- One Aurora cluster shared across admin + shared services, or separate? -- VPC layout: shared VPC for admin + shared services, or separate? -- Per-build CloudFront distribution vs one distribution with path-based routing — isolation vs cost. -- Do shared services (TiTiler, STAC, tipg, veloserver) need their own auth layer (API keys, IAM-signed) beyond "shared password gates the static frontend"? -- Secrets: Secrets Manager vs SSM Parameter Store. -- CI: GitHub Actions vs CodePipeline + CodeBuild. diff --git a/docs/adr/deployment/preserve/adr-a-aws-deployment.md b/docs/adr/deployment/preserve/adr-a-aws-deployment.md new file mode 100644 index 000000000..5672618eb --- /dev/null +++ b/docs/adr/deployment/preserve/adr-a-aws-deployment.md @@ -0,0 +1,362 @@ +# ADR-A: AWS deployment + +**Status:** Proposed — Under Review +**Date:** 2026-05-19 + +## 1. Scope + +This ADR covers the AWS infrastructure for both deployables introduced in the [overview](./overview-new.md): the admin stack, the dashboard infrastructure, and the cross-cutting concerns that connect them (URL topology, publish flow, shared sidecars, data layout). + +Frontend code changes that support dashboard mode are covered in [ADR-B](./adr-b-frontend-refactor.md). Per-feature drop/survive disposition is in [`features.md`](./features.md). Implementation phasing is in [`detailed-implementation-plan.md`](./detailed-implementation-plan.md). + +The stakeholder-given intent and requirements are in the overview and are treated as constraints here. + +## 2. Admin stack + +The admin stack runs today's MMGIS application image on AWS as a containerized service alongside its data and sidecars. 
The shape mirrors today's Docker-compose stack: one Node process serving the admin tool, the main map app, and the sidecar proxy; managed Postgres holding the same data it holds today; S3 in place of the local `Missions/` filesystem for raster assets. + +### 2.1 Compute, sidecar routing, and the admin write gate + +Compute platform, how the browser reaches sidecars, and how admin → sidecar writes are gated are one coupled decision. + +The load-bearing question: does the admin's gate on sidecar writes — specifically STAC, the only sidecar with a real write surface (TiTiler and tipg are read-only; veloserver TBD per Q-VELO) — stay in today's server proxy, or get rebuilt at the edge? + +**Three coherent bundles:** + +- **Bundle 1: Today's shape, ported.** Full ECS Fargate. Sidecars run internal-only on the same cluster, reachable only from the admin container via service discovery. Admin's Express server proxies `/titiler`, `/stac`, etc. internally; today's `ensureAdmin` middleware on those routes is the write gate, unchanged. Zero net-new auth code, zero frontend code change. We manage the ALB, listener rules, target groups, canary configs, and scaling policies. + +- **Bundle 2: Express Mode for admin.** ECS Express Mode for the admin task; sidecars still internal-only on service discovery. Server proxy and admin gate preserved (zero auth or frontend code change). AWS manages ALB, canary, and scaling. Cost: Express Mode is six months old (announced November 2025) and locks deployment strategy to canary and load-balancer config to ECS-managed; migrating out later is a real migration, not a config change. + +- **Bundle 3: Subdomain per service.** Each sidecar gets its own subdomain; the ALB routes directly; the server proxy goes away. TiTiler and tipg become directly reachable (read-only anyway). STAC writes need a net-new gate — Lambda@Edge JWT authorizer, hybrid proxy for writes only (reads direct, writes via admin), or service-side auth on STAC. 
The frontend's inline sidecar URL builders (see [ADR-B §2.3](./adr-b-frontend-refactor.md#23-telling-the-frontend-where-the-sidecars-live)) get centralized into a helper and pointed at subdomains. + +Self-managed EC2 stays viable as a sidetrack to all three bundles (writable local disk would let `Missions/` stay on disk instead of moving to S3) but trades that against AMI / host-patching ops; we don't pursue it here. + + +**Open:** Q-COMPUTE. + +### 2.2 Database + +**Decision:** One Postgres instance, two logical databases (the main MMGIS database and `mmgis-stac`). + +This mirrors today's `docker-compose.db.yml` — they already coexist on one Postgres with no signal STAC will outgrow it. A two-instance split is straightforward to add later if the workload diverges; doing it now is operational overhead without a payoff. + +Sessions stay Postgres-backed (no code change). The dashboards registry (§5.3) is a new table on the same instance. + +### 2.3 Networking and TLS + +CloudFront is AWS's CDN — it caches static assets at edge locations and gives a place to attach WAF rules or request-level logic. The dashboards already sit behind CloudFront. The question is whether the admin should too. + +**Decision:** CloudFront in front of the admin load balancer? + +**Options:** + +- *Add CloudFront.* CDN-cached static assets, single domain shape, optional WAF integration. Cache rules need to whitelist API and WebSocket paths so they bypass the cache. +- *Skip CloudFront.* Admin hits the load balancer directly. Fewer resources. + +**Recommended:** Add CloudFront. + +**Why:** Small cost; gives the admin and dashboards a consistent shape (everything fronted by CloudFront). + +### 2.4 Mission asset storage and uploads + +Raster mission assets — the tile pyramids, DEMs, and basemap imagery that lived in the `Missions/` folder — move to S3. Postgres-backed data (datasets, geodatasets, configs, drawings) stays in Postgres. 
There are two upload paths to settle: the existing admin UI flow, and the workstation workflow that today bypasses the UI entirely. + +A "presigned URL" is an S3 feature where the server hands the browser a temporary URL granting upload permission for one specific object. The browser then PUTs the file straight to S3 — the server is involved only in handing out the URL. + +#### (a) UI upload path + +Today's admin UI accepts uploads through Busboy (dataset CSVs, geodataset GeoJSON/MVT, individual mission asset files), capped at 500MB and written to the local filesystem. In AWS the byte path has to end at S3 instead of disk. + +**Decision:** Switch the UI upload path to presigned browser-to-S3. The admin server hands back a presigned URL; the browser PUTs the file directly to S3. + +**Why:** Through-server upload (admin receives bytes, writes them to S3) pins upload bandwidth to the admin service and risks timeouts on multi-GB files. Presigned lifts the bytes off the admin entirely and is the standard pattern for browser-to-S3 in AWS. + +#### (b) Tile pyramid workflow + +Today, mission operators handle big raw imagery on their workstation: run a GDAL script, get a tile pyramid (a folder of thousands of small tile images), then `scp` the folder into MMGIS's `Missions/` directory. The UI is not used. In AWS there's no shared filesystem to `scp` to, and admin users won't have direct AWS credentials — everything goes through the admin UI. + +The question: how does a tile pyramid (thousands of files, many GB) get from a workstation into S3 via the admin UI? Presigned handles one big file fine, but a pyramid is many files. + +**Options:** + +- *Upload as a single archive.* Operator zips the pyramid, uploads the archive via presigned, a backend task extracts it back into S3. One operator action; reintroduces a backend step in the upload path. +- *Bulk multi-file upload.* Browser fires off many presigned uploads in parallel. 
Works for small pyramids; brittle for big ones (browser memory, dropped connections, no resumability). +- *Shift the production format to COGs.* A Cloud-Optimized GeoTIFF is one file containing the whole pyramid; TiTiler serves tiles from it on demand. Operators run `tifs2cogs` (already in `auxiliary/stac/`) instead of `gdal2customtiles`. One file, standard upload. Requires migrating existing tile-pyramid layers in mission configs. + +**Open:** Q-BIG-UPLOAD. Once the workflow is settled, the per-file size cap follows from it and is a deploy-time config value. + +### 2.5 Authentication + +The auth model is unchanged from today: local accounts with Postgres-backed sessions, optional CSSO. Production runs `local` by default, or `csso` when upstream SSO is required. + +One bootstrap concern under `local`: a fresh deploy with no users exposes a first-signup endpoint that grants superadmin to whoever hits it first — a public-internet race. Doesn't apply to `csso` (identity comes from upstream). + +**Decision:** Seed the first superadmin in the init task using credentials injected as env vars at task launch from AWS Secrets Manager, GitHub Actions secrets, etc. + +## 3. Dashboard infrastructure + +Each published dashboard is one mission's frozen frontend, deployed to its own AWS footprint. Dashboards have no backend of their own — only what the admin stack and sidecars offer over the network. + +Dashboards are strictly one-mission-per-deploy: a single published dashboard always loads exactly one frozen mission, with no mission-picker UI and no `?mission=` switching. If a use case calls for "the same map app showing several missions," that's several dashboards, each published independently. This matches how the publish flow (§5) is described (one mission read, one bundle built, one set of AWS resources provisioned) and removes a class of cross-mission state questions from the dashboard codebase. 
+ +### 3.1 Per-dashboard resources + +Each dashboard gets: + +- **One S3 bucket.** The JS bundle, the baked mission config, and any per-dashboard baked data (small GeoJSON, small CSV, etc.). +- **One CloudFront distribution** in front of the bucket. Default behavior: serve the SPA shell for unknown paths. Static assets cache aggressively; the baked config is fingerprinted and immutable. +- **One CloudFront Function** as the password gate, attached to the viewer-request event. Browser basic auth, checked at the CDN edge. +- **One DNS record** pointing the chosen subdomain at the distribution. + +No backend, no database, no per-dashboard sidecar. + +### 3.2 What dashboards read at runtime + +For each kind of data a dashboard needs: + +- **Mission configuration.** Baked into the JS bundle at publish time. No request. +- **Raster tiles, DEMs, basemap imagery.** Fetched from S3 via CloudFront — usually from the admin's shared S3 bucket. The data already lives there from when admins uploaded it; no per-dashboard copy needed. +- **Small per-mission tabular or vector data.** Baked into the dashboard's own S3 bucket at publish time as JSON or GeoJSON, fetched as a static asset. +- **Larger tabular or vector data.** Queried dynamically from a shared sidecar (TiTiler for raster mosaics, tipg for PostGIS vector tiles, a custom endpoint for tabular search). Dashboards never connect to Postgres directly. + +The dashboard doesn't have to resolve any of this at runtime — every URL it needs is already in the baked mission config. Each layer's URL is rewritten at publish time to point wherever its data actually ended up: an absolute URL into admin's S3, a relative URL into the dashboard's own bucket, or an absolute sidecar URL. The static-vs-dynamic choice only affects which origin serves the bytes, not when they load. 
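
The publish-time URL rewrite described above could look roughly like the sketch below. It is illustrative only and is not the actual bake step: the placement labels, the `rewriteLayerUrl` and `bakeConfig` functions, the example hosts, and the `baked/` prefix are hypothetical names introduced here. It also assumes a flat `layers` array, whereas real mission configs may nest layers.

```javascript
// Hypothetical sketch of the publish-time URL rewrite. None of these names
// exist in MMGIS today; hosts and placement labels are illustrative only.
const ORIGINS = {
  adminAssets: "https://admin-assets.example.com", // assumed CloudFront in front of admin's S3
  sidecars: "https://services.example.com", // assumed sidecar host (depends on Q-URL-SIDECAR)
};

// placement is one of: "admin-s3" | "dashboard-bucket" | "sidecar"
function rewriteLayerUrl(layer, placement, missionFolder) {
  const rel = layer.url.replace(/^\/+/, ""); // strip any leading slashes
  switch (placement) {
    case "admin-s3": // absolute URL into admin's S3
      return `${ORIGINS.adminAssets}/Missions/${missionFolder}/${rel}`;
    case "dashboard-bucket": // relative URL into the dashboard's own bucket
      return `./baked/${rel}`;
    case "sidecar": // absolute sidecar URL
      return `${ORIGINS.sidecars}/${rel}`;
    default:
      throw new Error(`Unknown placement: ${placement}`);
  }
}

// Returns a copy of the config with every layer URL rewritten.
function bakeConfig(config, placements, missionFolder) {
  return {
    ...config,
    layers: config.layers.map((layer) => ({
      ...layer,
      url: rewriteLayerUrl(layer, placements[layer.name], missionFolder),
    })),
  };
}
```

Because the rewrite happens once at publish time, the dashboard bundle itself needs no knowledge of where each layer lives.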
+ +### 3.3 Authentication + +The gate is a CloudFront Function — a tiny piece of JavaScript that runs at the CDN edge before any request reaches S3, checks an `Authorization` header against a known password, and returns 401 if it doesn't match. The browser handles the password prompt as standard basic auth. What's left to decide is whether all dashboards share one password or each gets its own. + +**Decision:** One shared password across all dashboards, or per-dashboard passwords? + +**Options:** + +- *Single shared password.* One value baked into every dashboard's Function. Trivial to manage; one secret to rotate. But revoking access to a single dashboard means rotating the password for *all* dashboards. +- *Per-dashboard password.* Each distribution's Function is configured with its own password. The main cost is actually managing the passwords for each dashboard. + +**Open:** Q-AUTH-1. + +## 4. URL topology + +Two stakeholder-facing URL choices, independent of each other: + +- What URL shape does each dashboard expose to users? → §4.1. +- What URL shape do dashboards use to reach the sidecars? → §4.2. + +Each is presented as a set of options with the infra each requires. The admin's own URL shape (`/api`, `/configure`, `/Missions/*`, etc.) stays as today regardless of either choice and is not a stakeholder question. + +**Today's URL discipline, for context.** Current MMGIS is single-origin, everything path-prefixed under optional `ROOT_PATH`: `/` (map app, mission via `?mission=` query param), `/api/*`, `/configure`, `/stac`, `/tipg`, `/titiler`, `/titilerpgstac`, `/veloserver`, `/Missions/*`, `/docs/*`. Missions today are *application state*, not URL routing — the `?mission=` query param picks which mission's config is loaded against the same host and paths. Dashboards are the first time mission identity would land in the URL structure itself. + +### 4.1 Dashboard URL shape + +How does each published dashboard look in a browser address bar? 
+ +#### Per-dashboard subdomain + +``` +dash-a.example.com/ +dash-b.example.com/ +``` + +Infra: + +- One CloudFront distribution per dashboard. +- One Route 53 record per dashboard. +- TLS: one wildcard `*.example.com` in ACM, or per-subdomain certs. +- Per-dashboard auth (§3.3): one CloudFront Function per distribution with the dashboard's password baked in. +- Cache invalidations, access logs, behaviors: independent per dashboard. +- Publish flow (§5) creates a fresh distribution per publish; Delete tears one down. + +#### Path-routed under one host + +``` +dashboards.example.com/dash-a/ +dashboards.example.com/dash-b/ +``` + +Infra: + +- One CloudFront distribution for all dashboards. +- One DNS record, one TLS cert. +- Behaviors: one per dashboard, routing `//*` to that dashboard's S3 origin. +- Per-dashboard auth: one CloudFront Function on the shared distribution that dispatches on path prefix to look up the right password. +- Cache invalidations share one quota; access logs mix dashboards (filter by path). +- CloudFront behavior limit: 25 default, raisable; every dashboard adds at least one behavior. +- Publish flow (§5) mutates the shared distribution's behaviors and origins rather than creating a new distribution. + +#### Paths under the admin host + +``` +admin.example.com/dashboards/dash-a/ +admin.example.com/dashboards/dash-b/ +``` + +Infra: + +- Reuses the admin CloudFront. Behaviors added per dashboard. +- Auth posture mixes: admin requires session login, dashboards require a different password gate. Both on the same hostname. +- Mentioned for completeness; awkward in practice because admin (session-gated, admin-only) and dashboards (password-gated, end-user-facing) have different security postures. + +**Open:** Q-URL-DASHBOARD. + +### 4.2 Sidecar URL shape + +When a dashboard's frontend fetches a tile, a STAC search, or a vector layer, what URL does it call? 
+ +#### Per-sidecar subdomain + +``` +titiler.example.com +stac.example.com +tipg.example.com +veloserver.example.com (if deployed) +``` + +Infra: + +- One DNS record per sidecar (3–4 records). +- TLS: one wildcard `*.example.com`, or per-subdomain certs. +- ALB listener rules routing by Host header to existing per-sidecar target groups. +- CORS configured per sidecar (response policy on the ALB or on the sidecar service), allowing dashboard origins. +- Optionally one CloudFront distribution per sidecar for tile caching. + +#### Path-routed on the admin host + +``` +admin.example.com/titiler/ +admin.example.com/stac/ +admin.example.com/tipg/ +admin.example.com/veloserver/ +``` + +Infra: + +- Admin's CloudFront/ALB gains behaviors for `/titiler/*`, `/stac/*`, etc. Sidecar target groups already exist. +- No new DNS records, no new TLS items. +- CORS on the admin CF/ALB allowing dashboard origins. +- Most continuous with today's shape — today's Express proxy already maps these paths to the same sidecars. +- All sidecar traffic flows through admin's CF; if admin is sized for low traffic, sidecars push it harder. + +#### Path-routed on a dedicated sidecar host + +``` +services.example.com/titiler/ +services.example.com/stac/ +services.example.com/tipg/ +``` + +Infra: + +- One new DNS record + one TLS cert. +- One new CloudFront distribution (or just ALB) in front of the sidecar target groups. +- CORS on the dedicated host allowing dashboard origins. +- Isolates sidecars from admin's CF/ALB; consolidates them under one host. + +**Open:** Q-URL-SIDECAR. + +### 4.3 Cross-origin sidecar auth gate + +Dashboards reach sidecars cross-origin in every §4.2 option (admin reaches them internally; no change there). Today's admin server proxy wraps each sidecar in `ensureAdmin` for writes; dashboards bypass that proxy, so the gate has to come from somewhere. 
+ +Options: + +- *Password gate alone.* Once the dashboard is loaded (past the CloudFront Function password), sidecar requests are unauthenticated but reachable. Simple; assumes nothing else on the internet stumbles onto the sidecar URLs. +- *CORS allow-list.* Restricts in-browser access to dashboard and admin origins. Browser enforces. Does not stop direct `curl`. +- *Signed requests.* CloudFront signs requests to the sidecars (Lambda@Edge, or short-lived credentials baked at publish time). Properly secures against direct access; more frontend work (see [ADR-B §3.3](./adr-b-frontend-refactor.md#33-cross-origin-sidecar-auth-from-dashboards-frontend-side)). + +These combine — CORS allow-list + password gate, or CORS + signed requests, etc. The right combination depends on the security posture stakeholders accept on the sidecar services themselves. + +**Open:** Q-AUTH-2. + +## 5. Publish flow + +The new code path: an admin clicks **Publish** in the admin tool. What happens: + +1. **Admin tool → admin server.** The publish request, with mission, dashboard name, and settings. +2. **Admin server → bundling task.** Reads the mission's current config from Postgres. For each layer the mission references, decides where the data will live (baked into the dashboard's bucket, left in admin's S3, or served by a sidecar) and rewrites the layer's URL in the baked config accordingly. Builds the dashboard's frontend bundle with the rewritten config frozen in. Emits a directory of bundle plus baked static assets. +3. **Admin server → provisioning.** Creates the per-dashboard S3 bucket, CloudFront distribution, password-gate Function, and DNS record. +4. **Admin server → upload + invalidate.** Uploads the bundle to the new bucket and issues a CloudFront invalidation so users see the new build immediately. +5. **Admin server → admin tool.** Returns the dashboard URL; the admin tool surfaces it and records it in the dashboards registry table. 
+ +A matching **Delete Dashboard** path reverses every step: invalidate CloudFront, delete distribution, delete Function, delete bucket, remove DNS record, remove registry row. + +### 5.1 Where the bundling task runs + +The bundling task in step 2 is a real compute job — it reads from the database, runs Webpack, and produces a directory tree. + +**Decision:** How does the bundling task run? + +**Options:** + +- *In-process in the admin task.* Simplest; ties up the admin's compute during a build; bundle size bounded by the admin container's filesystem and memory. +- *Spawned ECS task per publish.* A fresh container per build, isolated from the admin. Clean lifecycle, predictable footprint. Cold-start latency is **tens of seconds** dominated by image pull (20–60s without optimization; sub-5s with SOCI lazy loading) — fine for a publish flow whose total time is dominated by Webpack anyway. +- *CodeBuild job triggered by the admin.* AWS-native CI primitive with free logging and build artifacts. Both are negligible wins here — artifacts go to S3 either way and logs go to CloudWatch either way. Adds an external surface to manage. + +**Recommended:** Spawned ECS task per publish. + +**Why:** Clean lifecycle, predictable resource footprint, no contention with the admin's serving load. + +### 5.2 How per-dashboard resources are provisioned + +**Decision:** How do we provision the per-dashboard resources? + +**Options:** + +- *CDK or CloudFormation template, deployed from the admin task.* Declarative, idempotent, easy to tear down. Requires a large IAM surface on the admin's role. +- *Direct SDK calls.* Imperative, simpler IAM (scoped to exactly what the calls touch). Teardown is custom code. +- *Step Functions orchestration.* Overengineered for this. Defer. + +**Recommended:** Direct SDK calls from the spawned bundling task. + +**Why:** Tight IAM scope; teardown is straightforward when paired call-for-call with creation. 
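
One way to keep teardown paired call-for-call with creation is to describe each resource as a step with both a `create` and a `destroy`. The sketch below is a minimal illustration under that assumption. The `provision` and `teardown` functions and the step shape are hypothetical, and the actual AWS SDK calls are deliberately left out: they would be injected as the step bodies.

```javascript
// Hypothetical sketch: each provisioning step pairs a create() with a destroy().
// Real steps would wrap AWS SDK calls; they are injected here so the ordering
// logic can be shown without assuming any SDK signatures.

// Runs destroy() for the given steps, newest first.
async function teardown(steps, ctx) {
  for (const step of [...steps].reverse()) {
    await step.destroy(ctx);
  }
}

// Runs create() for each step in order. If one fails, everything already
// created is rolled back in reverse order and the error is rethrown.
async function provision(steps, ctx) {
  const done = [];
  try {
    for (const step of steps) {
      await step.create(ctx);
      done.push(step);
    }
    return done.map((s) => s.name);
  } catch (err) {
    await teardown(done, ctx);
    throw err;
  }
}
```

With this shape, Delete Dashboard is `teardown` over the same step list used by Publish, so the two paths are less likely to drift apart.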
+ +### 5.3 Dashboards registry + +The admin tracks every dashboard it has published — at minimum URL, name, owner, and provisioning metadata — in a registry table on the shared Postgres (§2.2). Used to list dashboards in the admin UI, gate Delete Dashboard, and know which CloudFront distributions to invalidate on republish. + +## 6. Shared services and isolation + +The default position is **shared** — one resource serving many dashboards — and we deviate only when isolation is a hard requirement. + +**Sidecars.** One deployment of each sidecar, shared across the admin and every dashboard. Per-dashboard deployments are rejected: cost (N copies of each Python service running) and management overhead (N deployments to upgrade) aren't justified given the services are stateless or read from shared databases. + +**Veloserver is the exception worth flagging.** Its requirements are under-documented, and no frontend code references it today. So the live question for AWS is narrower than "deploy it or not": *does any production mission config still reference veloserver-backed layers?* If yes, document what the service needs; if no, drop. Tracked as Q-VELO. + +**Per-dashboard database isolation.** Rejected. The operational cost (N instances to patch, monitor, back up) and the security surface (each dashboard now has database credentials) aren't justified for any need we've identified. Tables get a dashboard-scoped slice on the shared instance only when they need persistence beyond a baked file — rare. + +## 7. Data layout + +### 7.1 The local-files heritage + +MMGIS's storage was always split: **raster files on local disk** under the mission directory; **structured data in Postgres** (tabular datasets, PostGIS geodatasets, mission configs, drawings, sessions). The AWS deployed world has no shared local disk, so: + +- **Raster files → S3**, same prefix layout. The relative-path resolver in mission configs points at the S3 prefix instead of the filesystem. 
+- **Structured data → still Postgres**, now on RDS instead of in a container. +- **No "point at a local path" workflow survives.** Mission configs may not reference absolute filesystem paths; relative paths under the mission folder remain supported. + +### 7.2 Where dashboard data comes from + +A dashboard pulls data from one of three places. The choice isn't really about *size* — S3 can hold anything — it's about **access pattern** (static fetch vs. dynamic query) and **which bucket** holds the bytes: + +- **Static fetch from the admin's S3 bucket.** No copy needed; the data already lives there from when admins uploaded it. The baked mission config points at the existing CloudFront-fronted URL. Right for raster tiles, DEMs, basemap imagery — the big files that already live in admin S3 and would only duplicate if copied per dashboard. +- **Static fetch from the dashboard's own S3 bucket.** Baked at publish time. The publish step reads from admin storage (Postgres rows or admin S3 files), serializes to JSON or GeoJSON, and writes a static file into the dashboard's bucket alongside the JS bundle. Right for *mission-specific* small data — the mission config itself, small lookup tables, baked search indices. Clean deletion lifecycle: drop the dashboard's bucket and its data is gone with it. +- **Dynamic query against a shared sidecar.** The dashboard makes HTTP requests to TiTiler (raster mosaics over big COGs), tipg (PostGIS as vector tiles or OGC Features), or a thin custom endpoint for tabular search. Right when the access pattern is "compute this on demand," not "fetch this file." + +The default position is to push as much as possible into the first two categories (static fetches, no service hop) and use sidecars only for data that genuinely needs dynamic querying. 
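
As a rough illustration of the three-way choice above, the per-item decision might be expressed as a small function. This is a sketch under stated assumptions, not a committed design: `choosePlacement`, the item fields, and the placement labels are hypothetical, and the size ceiling is a placeholder because the real value is the open question Q-BAKE-CEILING.

```javascript
// Hypothetical sketch of the per-item placement decision at publish time.
// PLACEHOLDER_CEILING_BYTES is an arbitrary stand-in; the real ceiling is
// the open question Q-BAKE-CEILING and has not been determined.
const PLACEHOLDER_CEILING_BYTES = 5 * 1024 * 1024;

function choosePlacement(item, ceilingBytes = PLACEHOLDER_CEILING_BYTES) {
  // Access pattern first: anything computed on demand goes to a sidecar.
  if (item.needsDynamicQuery) return "sidecar";
  // Files already in admin's S3 are referenced in place, never copied.
  if (item.storedIn === "admin-s3") return "admin-s3";
  // Small mission-specific data is baked into the dashboard's own bucket.
  if (item.sizeBytes <= ceilingBytes) return "dashboard-bucket";
  // Too large to bake: fall back to a dynamic query.
  return "sidecar";
}
```

The ordering encodes the stated default: prefer the two static categories and reach for a sidecar only when the access pattern or size requires it.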
+ +**The publish step is therefore a selective data-copying operation.** For each piece of data the mission references, it decides: leave it where it is (admin's S3 or a sidecar) and write the URL into the baked config; or read from admin storage, serialize, and write into the dashboard's bucket. Most missions end up with a mix of all three. + +**Last resort: a dashboard-scoped table in the shared Postgres** plus a thin query endpoint to read it. Only when the dashboard genuinely needs writeable per-dashboard persistence — rare enough that we don't pre-commit a design. + +### 7.3 Open + +**Open:** Q-BAKE-CEILING — how much data can a dashboard load at boot before it feels slow? This is the UX ceiling that decides which data lands in the first two categories (static fetch) vs. the third (sidecar query). Investigation needed; not an ADR-time decision. + +## 8. Open questions + +- **Q-COMPUTE** — Which of the three coupled bundles (Today's shape ported / Express Mode for admin / Subdomain per service) for compute + sidecar routing + admin write gate? → §2.1. +- **Q-BIG-UPLOAD** — How do tile pyramids (thousands of files, many GB) reach S3 via the admin UI? → §2.4(b). +- **Q-AUTH-1** — Per-dashboard password, or one shared password? → §3.3. +- **Q-URL-DASHBOARD** — Dashboard URL shape: per-dashboard subdomain, shared host with path routing, or paths under admin? → §4.1. +- **Q-URL-SIDECAR** — Sidecar URL shape: per-sidecar subdomain, path-routed on admin host, or path-routed on a dedicated sidecar host? → §4.2. +- **Q-AUTH-2** — Cross-origin sidecar gate: password-only, CORS allow-list, signed requests, or a combination? → §4.3. +- **Q-VELO** — Config audit: does any production mission config wire a layer through `/veloserver`? Infra is fully wired but no frontend code references the service; whether to keep it deployed depends on the audit. → §6. +- **Q-BAKE-CEILING** — How much data can a dashboard load at boot before it feels slow? → §7.3. 
diff --git a/docs/adr/deployment/preserve/adr-b-frontend-refactor.md b/docs/adr/deployment/preserve/adr-b-frontend-refactor.md new file mode 100644 index 000000000..5f9081b18 --- /dev/null +++ b/docs/adr/deployment/preserve/adr-b-frontend-refactor.md @@ -0,0 +1,115 @@ +# ADR-B: Frontend refactor for dashboard mode + +**Status:** Proposed — Under Review +**Date:** 2026-05-19 + +## 1. Scope + +This ADR covers the changes to the MMGIS frontend codebase that make dashboard mode possible: a small set of seams where the runtime branches between "running inside the admin stack" (today's behavior) and "running as a published dashboard" (no backend, no database, no WebSocket). + +AWS infrastructure decisions — admin compute, URL topology, the publish flow, sidecar hosting — are in [ADR-A](./adr-a-aws-deployment.md). Per-feature drop/survive disposition with implementation notes is in [`features.md`](./features.md). The stakeholder-given intent and requirements are in the [overview](./overview-new.md). + +The high-level shape: dashboard mode is selected by a build-time flag. The codebase has one branch; the bundle is built twice (once for admin, once per dashboard) from the same source. Almost the entire frontend is unchanged in dashboard mode — the map engines, the tools, the chrome, and the embed API all run as-is. The work is concentrated at the five seams in §2. + +## 2. The five seams + +### 2.1 Freezing the mission configuration into the bundle + +The frontend currently boots by asking the server for its mission configuration. A dashboard has no server to ask, so the configuration must already be inside the bundle. + +MMGIS already runs a small pre-bundle script (`API/updateTools.js`) that writes out generated JavaScript files — today it lists the installed tools and components. We add one more generated file — the frozen mission configuration — that the frontend imports like any other source. 
The dispatcher (§2.2) returns this baked config from its `bake` branch when the call site asks for it. + +### 2.2 Replacing the frontend's calls to the server + +Every named call the frontend makes to MMGIS's backend flows through one dispatcher function. That dispatcher already has an unused if-branch for "what if there's no server?" — wired but never triggered today because a flag is hard-coded to server-mode. Dashboard mode flips the flag and fills the branch with a per-call lookup table: + +- **Bake.** Answer known at build time, written into the bundle. Just return it. +- **Reroute.** Call one of the shared sidecars directly instead. +- **Compute.** Answer in the browser using baked-in data. +- **Drop.** This call doesn't make sense in a dashboard (e.g. drawing-write, login). Return an error gracefully. + +Because every call goes through one dispatcher, this is one function and one table — not a sweeping edit. See §3.1 for the decision rationale. + +### 2.3 Telling the frontend where the sidecars live + +The frontend currently builds URLs to the sidecars as same-origin paths like `/titiler/...`, relying on MMGIS's server to forward them behind the scenes. A dashboard has no server to forward through; it needs the services' real public addresses. + +The dashboard-mode change is a helper that returns the right URL base for the build mode — same-origin paths in admin mode (no behavior change), absolute URLs in dashboard mode. There are nine sites today across four files (`Map_`, `Layers_`, `LayersTool`, `IdentifierTool`) that build these URLs by inline string interpolation; the work is centralizing them into the helper, then flipping mode by build flag. + +The exact URL shape returned by the helper in dashboard mode depends on the choice in [ADR-A §4.2](./adr-a-aws-deployment.md#42-sidecar-url-shape) — per-service subdomains, or a single CloudFront-fronted host. The helper's interface is the same either way; the format string changes.
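
A minimal sketch of such a helper follows, assuming the mode and a format string are supplied at build time. The name `makeSidecarUrl`, the `"admin"` / `"dashboard"` mode labels, and the `{service}` placeholder are hypothetical, and `ROOT_PATH` handling is omitted for brevity.

```javascript
// Hypothetical helper factory; none of these names exist in MMGIS today.
// `template` is a format string such as "https://{service}.example.com" or
// "https://services.example.com/{service}" (the shape depends on Q-URL-SIDECAR).
function makeSidecarUrl(mode, template) {
  return (service, path) => {
    const suffix = String(path).replace(/^\/+/, ""); // drop leading slashes
    const base =
      mode === "dashboard"
        ? template.replace("{service}", service) // absolute URL in dashboard mode
        : `/${service}`; // same-origin path in admin mode, as today
    return `${base.replace(/\/+$/, "")}/${suffix}`;
  };
}

// Usage: call sites ask for a URL by service name and never build it inline.
const adminUrl = makeSidecarUrl("admin");
const dashUrl = makeSidecarUrl("dashboard", "https://{service}.example.com");
console.log(adminUrl("titiler", "/cog/point/1,2"));
console.log(dashUrl("titiler", "/cog/point/1,2"));
```

Switching between the subdomain and path-routed shapes then means changing only the template string, which matches the intent that the helper's interface stays fixed.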
+ +### 2.4 Handling backend-only computations + +MMGIS's backend has a few small utility endpoints that do work for the frontend (elevation profiles, projection conversions, image-band metadata). A dashboard has no backend, so each is handled individually: drop the feature, redirect to a sidecar, or move the math into the browser. + +Per-feature product decisions, not a mechanical rewrite. Dispositions live in [`features.md`](./features.md). + +### 2.5 Disabling server-dependent features + +Two features have nowhere to go in a dashboard and just turn off: + +- **The login form.** Dashboards have no accounts. The login modal and its associated UI never render. The auth state is implicitly "anonymous, read-only forever." +- **The live-update WebSocket.** Three consumers in admin (real-time Draw collaboration, layer-update notifications from the admin tool to open map sessions, and admin-tool-to-admin-tool multi-admin coordination). All three drop in dashboards; the admin stack keeps all three. No connection attempted; no fallback needed. + +## 3. Open architectural decisions + +The five seams describe the refactor's shape. Three decisions inside the refactor aren't yet settled. + +### 3.1 How the API-call dispatcher branches in dashboard mode + +**Decision:** Replace the dispatcher's no-server early-return with a per-call disposition table (bake / reroute / compute / drop). Every call site keeps calling `api('whatever')` unchanged. + +The dispatcher is the chokepoint by construction — roughly 40 named calls, roughly 30 importing files, all going through one function. The alternative ("branch each call site") would mean editing every importer to wrap calls in `if (dashboardMode)`. Mechanical churn for no architectural benefit; the chokepoint is exactly the right seam. 
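
The disposition table can be pictured with a small sketch. It is a hedged illustration rather than the real `calls.js`: the `makeApi` factory, the table shape, and the `(name, data, success, error)` callback signature are assumptions made here, and treating unlisted calls as dropped is one possible default, not a settled choice.

```javascript
// Hypothetical sketch of a dispatcher with a per-call disposition table.
// `serverCall` stands in for today's server-mode code path.
function makeApi({ dashboardMode, serverCall, dispositions }) {
  return function api(name, data, success, error) {
    // Admin mode: behavior is unchanged, everything goes to the server.
    if (!dashboardMode) return serverCall(name, data, success, error);

    const d = dispositions[name] || { kind: "drop" };
    switch (d.kind) {
      case "bake": // answer was frozen into the bundle at build time
        return success(d.value);
      case "compute": // answer derived in the browser from baked data
        return success(d.fn(data));
      case "reroute": // d.send is assumed to return a Promise from a sidecar call
        return d.send(data).then(success, error);
      default: // drop: fail gracefully instead of attempting a request
        return error({ status: "failure", message: `'${name}' is unavailable in dashboard mode` });
    }
  };
}
```

Call sites keep the same `api('whatever', ...)` shape in both modes; only the table decides what happens.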
+ +### 3.2 Time-composited layers in dashboards + +Some mission configs use a URL convention that triggers server-side compositing of time-windowed map tiles — the server reads several tiles at different timestamps, blends them, and returns one tile. A dashboard has no server to do that compositing, and the compositing step isn't free. + +**Decision:** What happens to time-composited layers in dashboards? + +**Options:** + +- *Pre-bake every time slice at publish time.* The publish step composites every possible time window in advance and stores the results as static tiles in S3. Works, but storage cost scales with how many time windows the layer supports. +- *Hide the layer in the dashboard.* The layer simply doesn't appear in dashboards that don't pre-bake it. Cheapest; loses the feature for that layer. + +**Recommended:** Per-layer decision rather than a global default. + +**Why:** Some layers are critical to the mission and worth the bake cost; others are decorative and can be hidden. Marking the disposition per layer in the mission config is cheaper than picking one global rule. + +**Open:** Q-TIME. + +### 3.3 Cross-origin sidecar auth from dashboards (frontend side) + +The dashboard's frontend makes cross-origin requests to the sidecars. Whether it needs to attach auth credentials depends on the gate ADR-A chooses (see [ADR-A §4.3](./adr-a-aws-deployment.md#43-cross-origin-sidecar-auth-gate)). + +How the frontend-side work breaks down under each ADR-A option: + +- **Password gate alone:** the dashboard's frontend attaches nothing — the edge password gate on the dashboard's CloudFront has already established that the user is authorized to be there. Sidecar requests go out unauthenticated. +- **CORS allow-list:** same as above on the frontend side. The browser enforces the origin check at request time. +- **Signed requests:** the frontend has to attach a signature or token to each sidecar request.
Requires either short-lived credentials baked at publish time, or a way for the dashboard to obtain credentials at boot. + +**Disposition:** Follow ADR-A's decision. The frontend work is minimal under the first two options, real (a few hundred lines, a credentials-fetch flow) under the third. Q-AUTH-2 lives in ADR-A; the implementation here depends on the answer. + +## 4. Per-feature disposition summary + +The per-feature drop/survive matrix lives in [`features.md`](./features.md). The shape: + +- **Most features survive as-is** in dashboards — pure-client tools, map viewports, time control, URL state, DEM-reading tools, the embed API. +- **A defined set drops cleanly** — login, the three WebSocket consumers, the Configure admin tool, accounts / tokens / permissions, webhooks, file uploads, the sidecar proxy, the server-only utility routes, the Jekyll docs site. +- **A smaller set is conditional** on open questions in §5 — drawing, dashboard search, time-composited layers, the Isochrone heavy-compute tool. +- **Mission picker collapses** to "load the baked mission" since dashboards are one-mission-per-deploy. + +Per-row implementation notes — including drop reasons and bake/reroute/compute details — live in `features.md`. + +## 5. Open questions + +Frontend-scope questions tracked in this ADR: + +- **Q-DRAW** — Drawing in dashboards: drop, read-only display of baked features, or local-storage edit mode? +- **Q-SEARCH** — Dashboard search: client-side baked index, routed through tipg, or a shared search endpoint in the admin stack? Per-dashboard scoping (one dashboard can't discover another's data) is part of the answer either way. +- **Q-TIME** — Per-layer disposition for time-composited layers in dashboards. → §3.2. + +Cross-cutting questions affecting this ADR but owned by ADR-A: + +- **Q-AUTH-2** — Cross-origin sidecar auth gate. Owned by [ADR-A §4.3](./adr-a-aws-deployment.md#43-cross-origin-sidecar-auth-gate). 
The frontend implementation in §3.3 depends on the answer. +- **Q-URL-SIDECAR** — Sidecar URL shape (per-sidecar subdomain, path on admin host, or path on a dedicated sidecar host). Owned by [ADR-A §4.2](./adr-a-aws-deployment.md#42-sidecar-url-shape). The URL helper in §2.3 builds whichever shape ADR-A chooses. diff --git a/docs/adr/deployment/preserve/detailed-implementation-plan.md b/docs/adr/deployment/preserve/detailed-implementation-plan.md new file mode 100644 index 000000000..bdc7db374 --- /dev/null +++ b/docs/adr/deployment/preserve/detailed-implementation-plan.md @@ -0,0 +1,1297 @@ +# Detailed implementation plan: AWS deployment & admin/dashboard split + +> Companion to `adr.md`. This document is **not for human reading start-to-finish**. +> It is the dense detail layer the ADR depends on. Its purpose is to (a) ground every +> claim in the ADR in concrete code, (b) give a downstream LLM a high-resolution map +> to either execute or review, and (c) surface contradictions back to the ADR. +> +> If you find this document contradicts the ADR, **the ADR wins** and this plan +> gets the correction. If you find this document contradicts the code, **the code +> wins** and the ADR may need rework. +> +> **Code references use files and function names. No line numbers.** Line numbers +> rot every time someone else lands a change. + +## 0. How to read this document + +The plan is split into phases. Phases are ordered for execution but reviewable +out of order: + +- **Phase A — Code preparation.** No behavioral change. Introduces helpers and + flags that later phases use. +- **Phase B — Adjacent-service URL indirection.** Mechanical call-site rewrite. + Admin behavior unchanged because the helper returns same-origin paths today. +- **Phase C — Boot-time config injection.** Replaces the boot fetch with a + baked import when `STATIC_MODE=true`. +- **Phase D — Static build pipeline.** New script + Webpack branches. 
+- **Phase E — Feature gating in static.** Per-tool drop / degrade behavior. +- **Phase F — Mission asset S3 migration.** Both admin (middleware fetches from + S3) and dashboards (bake step rewrites relative paths to absolute). +- **Phase G — Adjacent services on ECS.** Container images, task defs, ALB target + groups, CORS. +- **Phase H — Provisioning code.** The Publish-button → S3+CloudFront flow. +- **Phase I — Dashboard registry.** New table + endpoints + UI surface. +- **Phase J — Deploy-time gaps.** First-user gap closure, CloudFront Function + password gate. + +Within each phase: **Goal**, **Files touched**, **Specific changes**, **Verification**, +**Rollback**. + +## Source-of-truth code references + +These are the load-bearing files the plan keeps coming back to. Verified during +research; cite the path, not a line range, when reasoning about behavior. + +### Backend + +- `scripts/server.js` — composition root. Express assembly, session config, + helmet, CSP, body parser ordering, `cssoHandler` middleware definition, ALB + health endpoint registration, WebSocket attachment, sidecar proxy mount, ROOT_PATH + prefix handling. +- `scripts/init-db.js` — Postgres bootstrap. Creates `mmgis` and `mmgis-stac` + databases; installs `postgis`, `btree_gist`, `pgstac` extensions; creates the + session table and indexes. +- `scripts/build.js` — production frontend build entrypoint. Imports + `configFactory("production")` from `configuration/webpack.config.js`, runs + `updateTools()` and `updateComponents()` from `API/updateTools.js` before + Webpack, then drives the build. +- `scripts/middleware.js` — `missions()` function. Static-file serving for + `/Missions/...` with path-traversal hardening and `_time_` composite handling + via `sharp`. The S3 migration in Phase F lives here. +- `API/setups.js` — feature-module loader. Iterates `API/Backend//` + directories and any `*Plugin-Backend*` / `*Private-Backend*` siblings, invoking + each module's `setup.js`. 
+- `API/connection.js` — Sequelize connection. Single shared instance. +- `API/database.js` — pg-promise connection. Used only by Draw. +- `API/websocket.js` — WebSocket server. `ws.Server({ noServer: true })` attached + to HTTP upgrade. No rooms; broadcast bus. +- `API/updateTools.js` — `updateTools()` and `updateComponents()` codegen. + Writes `src/pre/tools.js`, `src/pre/components.js`, `configure/public/toolConfigs.json`, + `configure/public/componentConfigs.json`. The Phase C extension hooks here. +- `API/Backend/Users/models/user.js` — user model, bcrypt password hashing, + permission code field, missions_managing array. +- `API/Backend/Users/routes/users.js` — `first_signup`, `login`, `logout`, + `signup` handlers. The first-user-becomes-superadmin logic lives in `first_signup`. +- `API/Backend/Accounts/routes/accounts.js` — `/api/accounts/*`, account CRUD, + permission update. +- `API/Backend/Config/routes/configs.js` — `/api/configure/*`. The + `get_generaloptions`, `missions`, `get` endpoints feed the boot path. + `checkMissionPermission` checks per-user `missions_managing` against the + requested mission. +- `API/Backend/Datasets/routes/datasets.js` — `/api/datasets/upload`. Streams + CSV in 10000-row chunks (`maxRowsAtATime`), disables timeout + (`req.setTimeout(0)`). +- `API/Backend/Geodatasets/routes/geodatasets.js` — `/api/geodatasets/upload`. + Streams GeoJSON to PostGIS dynamic tables. +- `API/Backend/Draw/routes/files.js`, `API/Backend/Draw/routes/filesutils.js` + — `user_features`, `user_files` tables, owner + public='1' visibility logic. +- `adjacent-servers/adjacent-servers-proxy.js` — `http-proxy-middleware` + mounting `/stac`, `/tipg`, `/titiler`, `/titilerpgstac`, `/veloserver`. + Each block wrapped in `ensureAdmin(false, false, true)` — anon GETs pass, + mutations admin-gated. `isDocker` swaps `localhost`/Compose-service-name. + `createSwaggerInterceptor` rewrites upstream OpenAPI docs. 
+- `docker-compose.yml`, `docker-compose.dev.yml`, `docker-compose.db.yml` — + service inventory, profile flags (`--profile stac`, `--profile veloserver`). +- `sample.env` — canonical list of MMGIS env vars including `AUTH`, the + `WITH_*` flags, `ADJACENT_SERVER_CUSTOM_`. + +### Frontend + +- `src/index.js` — React root render. Mounts `App` into DOM. +- `src/App.js` — boot path. Has 4 `calls.api` calls total: `get_generaloptions` and `missions` are the two config-related ones; the other two are `shortener_expand` (only fire on `?s=…` shortened URLs). Note: the mission-config fetch (`calls.api('get', { mission })`) does **not** live in App.js — it lives in `essence.js` and `LandingPage.js` (see below). +- `src/pre/calls.js` — **single chokepoint** for every named API call in the Essence bundle. Holds the `c[]` table mapping ~30 named endpoints to URL paths. The `api()` function already has a dormant `SERVER != 'node'` escape branch (today: warns + calls error). The static-mode refactor hooks here. +- `src/pre/tools.js`, `src/pre/components.js` — codegen output, gitignored, + re-imported by the bundle. +- `src/essence/essence.js` — `essence.init(configData, missionsList)`. Calls `L_.init(configData, ...)`. Has 2 `calls.api('get', { mission })` sites (`makeMission` and `swapMission` paths). Injects `_dbMissionName` into `configData` from API response. +- `src/essence/Basics/Layers_/Layers_.js` — `L_` global singleton. `L_.init` + calls `parseConfig(configData)` to populate `L_.data`, `L_.dataFlat`, + `L_.layer`, `L_.on`, `L_.opacity`, `L_.filters`, `L_.nameToUUID`. Holds + `L_.missionPath`, `L_.missionFolderName`. Defines `L_.onceLoaded(cb)`. **Has hardcoded same-origin sidecar URL construction** (one of the four files for Phase B). Also contains the `getSTACLayers` recursion that the bake step must mirror. +- `src/essence/Basics/Map_/Map_.js` — Leaflet + deck.gl glue. **Has hardcoded same-origin sidecar URL construction** (one of the four files for Phase B). 
+- `src/essence/Basics/Globe_/Globe_.js`, `GlobeRenderer.js` — Cesium glue. **Verified: does NOT construct sidecar URLs directly.** Consumes layer configs from `L_`. Listed here only to note that an earlier draft of this plan wrongly included it in the Phase B inventory. +- `src/essence/Basics/MapEngines/IMapEngine.ts`, `MapEngineRegistry.ts` — + the engine abstraction the dual 2D/3D rendering goes through. +- `src/essence/Tools/Identifier/IdentifierTool.js` — point queries to `/titilerpgstac/collections/…` and `/titiler/cog/point/…`. **Has hardcoded same-origin sidecar URL construction** (one of the four files for Phase B). Also calls `calls.api('getbands', …)` — this hits the **backend** route `/api/utils/getbands`, NOT a sidecar; covered as a backend-route disappearance, not a URL-helper rewrite. +- `src/essence/Tools/Layers/LayersTool.js` — vector tile, STAC, tipg. **Has hardcoded same-origin sidecar URL construction** (one of the four files for Phase B). Also calls `calls.api('proj42wkt', …)` — backend route `/api/utils/proj42wkt`, NOT in the adjacent-servers proxy. +- `src/essence/Tools/Draw/` — drawing tool. WebSocket + REST writes against + `/api/draw/*`. +- `src/essence/Tools/Measure/MeasureTool.js` — elevation profile uses `calls.api('getprofile', …)`. **Verified: not a direct TiTiler URL** — `getprofile` is the backend route `/api/utils/getprofile`, which may internally delegate to TiTiler. In static, the backend route disappears; the feature needs a disposition decision (call TiTiler directly cross-origin, replace with client-side computation over baked DEM tiles, or hide). +- `src/essence/Ancillary/Search.js` — server-side search UI. Calls into `/api/datasets/search` (Express, not a sidecar). **Verified path: `Ancillary/`, not `Tools/`.** +- `src/essence/Basics/TimeControl_/TimeControl.js`, `TimeUI.js` — time-control UI. 
**Verified path: `Basics/TimeControl_/`, not `Tools/TimeControl/`.** Calls `query_tileset_times` server-side; static needs the time list baked into the config. +- `src/essence/Basics/Layers_/LayerCapturer.js` — **Verified path: `Basics/Layers_/`, not `Tools/Identifier/`.** Has un-guarded boot fetches per the v3 plan; check for these during Phase C work. +- `src/essence/LandingPage/LandingPage.js` — mission picker. `.init(generalOptions, missionsList)`. Has 2 `calls.api('get', { mission })` sites that fire after a mission is selected. Also injects `_dbMissionName`. +- `src/essence/Ancillary/Login/Login.js` — login UI. Hidden in static. + +### Static-mode-relevant existing infrastructure (dormant but present) + +- `public/index.html` — sets `mmgisglobal.SERVER = "node"` **unconditionally** (outside the `NODE_ENV` switch). The dual-render (Pug `#{}` for production, InterpolateHtmlPlugin `%%` for default) of the switch is the natural place to add a third static-mode branch that sets `SERVER` differently. +- `src/pre/calls.js` line ~169 — the dormant `SERVER != 'node'` escape branch (currently warns + errors). +- `src/essence/essence.js` (`swapMission`) — non-node branch that does `$.getJSON('Missions//config.json')`. Currently unreachable. +- `src/essence/LandingPage/LandingPage.js` (3 sites) — non-node branches that load config from `Missions//config.json` directly. Currently unreachable. +- `FORCE_CONFIG_PATH` env hook — plumbed through `scripts/server.js` → `public/index.html` → consumed in `src/App.js`. When set, the landing page skips the missions-list API call and loads config from that path. + +### Build/config + +- `configuration/webpack.config.js` — Webpack 5 config. `entry` points at + `src/index.js`. `output.path` is `build/`. `HtmlWebpackPlugin` produces + `build/index.html`. `MiniCssExtractPlugin`, `CopyWebpackPlugin` (Cesium + assets), `DefinePlugin` (env var injection via `getClientEnvironment`). 
+ `ModuleScopePlugin` restricts imports outside `src/` — relevant when + introducing the baked-config alias. +- `configuration/env.js` — `getClientEnvironment()`. The env-var allow-list: + `REACT_APP_*` plus a curated MMGIS list. New env vars for static mode must + be added here to reach the browser. +- `configuration/paths.js` — path constants used by the build (paths.appSrc, + paths.appBuild, paths.appPublic). +- `configuration/modules.js` — module resolution config. +- `configure/package.json` — Configure SPA. React 17 + react-scripts + + MUI 5 + Redux Toolkit. Builds with `react-scripts build`. +- `configure/scripts/make-pug-index.js` — wraps CRA's `index.html` into a + pug template Express can render with injected variables (user, permission, + AUTH mode, etc.). +- `public/index.html` — HTML template processed by HtmlWebpackPlugin. Contains + `%REACT_APP_*%` placeholders. + +--- + +## Phase A — Code preparation + +**Goal:** Lay the foundations the later phases need without changing runtime +behavior. After Phase A, `npm run build` and `npm start` work identically. + +> **Open decision before Phase A starts:** Do we introduce a fresh +> `STATIC_MODE` env var (recommended below for clarity), OR reuse the existing +> `mmgisglobal.SERVER` flag by setting it to `"static"` in the static build's +> `public/index.html` render branch? `STATIC_MODE` is cleaner for the +> build-time DefinePlugin substitutions and Webpack tree-shaking; reusing +> `SERVER` activates the existing dormant non-node code branches in +> `calls.js`, `essence.js`, and `LandingPage.js` for free. Recommended: use +> **both** — `STATIC_MODE` as the build-time flag for Webpack +> DefinePlugin / DCE, and set `mmgisglobal.SERVER = "static"` in the static +> `index.html` so the dormant branches activate at runtime. Pin during +> execution. + +### A.1 Introduce `STATIC_MODE` and `STATIC_*` env vars in the allow-list + +**File:** `configuration/env.js` (`getClientEnvironment`). 
+ +Add to the curated allow-list: + +- `STATIC_MODE` — string, `"true"` or unset. +- `STATIC_CONFIG_PATH` — string, optional; path to the baked config JSON + emitted by the static publish step. Defaulted by `scripts/publish-static.js`. +- `STATIC_TITILER_URL`, `STATIC_STAC_URL`, `STATIC_TIPG_URL`, + `STATIC_TITILER_PGSTAC_URL`, `STATIC_VELOSERVER_URL` — absolute URLs of + shared admin-stack adjacent services in static mode. +- `STATIC_MISSION_NAME` — the mission baked into the dashboard. + +**Verification:** `npm run build` still produces a working admin bundle (the +flag is unset). Inspect `build/static/js/main.*.js` for the new env vars +appearing in the `process.env` shim — they should be `undefined` in the admin +build. + +### A.2 Introduce the service-URL helper + +**New file:** `src/essence/Basics/serviceUrls.js`. + +**Exports:** `getTitilerBaseUrl()`, `getStacBaseUrl()`, `getTipgBaseUrl()`, +`getTitilerPgstacBaseUrl()`, `getVeloserverBaseUrl()`. Each returns a string. + +**Body:** Reads `process.env.STATIC_*_URL` when `process.env.STATIC_MODE === 'true'`, +otherwise returns the same-origin path it returns today (e.g. `/titiler` for +TiTiler). No trailing slash. Result is memoized. + +**Verification:** In the admin build, every call to the helper must return +the same string as the current hardcoded path. Unit test in +`src/essence/Basics/serviceUrls.test.js` covering both branches. + +### A.3 Introduce the baked-config module stub + +**New file:** `src/pre/staticConfig.js`. Gitignored alongside `src/pre/tools.js`. + +**Body in admin (stub) form:** + +```js +export default null; +export const STATIC_MODE = false; +``` + +**Body when emitted by static publish (later, Phase D):** populated with +`{ configData, missionsList, generalOptions, mission }`. + +**`API/updateTools.js`** writes this file at the same time it writes +`src/pre/tools.js`. In the admin case it emits the stub form. The Phase D +work overrides this when `STATIC_MODE=true`. 
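For reference, the A.2 service-URL helper could be sketched roughly as follows. This is a sketch under stated assumptions, not the planned implementation: `createServiceUrls` and its injected `env` argument are hypothetical (the plan's module reads `process.env` directly), and falling back to the same-origin path when a `STATIC_*_URL` is unset in static mode is an assumed behavior that the plan does not specify.

```javascript
// Hypothetical sketch of the Phase A.2 helper. `env` is passed in so the
// logic can be exercised outside Webpack; the planned module would read
// process.env.* instead.
const SAME_ORIGIN_PATHS = {
  TITILER: '/titiler',
  STAC: '/stac',
  TIPG: '/tipg',
  TITILER_PGSTAC: '/titilerpgstac',
  VELOSERVER: '/veloserver',
};

function createServiceUrls(env) {
  const cache = new Map(); // memoize: each base URL is resolved once

  function resolve(key) {
    if (!cache.has(key)) {
      const configured = env[`STATIC_${key}_URL`];
      // Assumption: use the same-origin path when static mode is on but no
      // absolute URL was supplied for this service.
      const base =
        env.STATIC_MODE === 'true' && configured
          ? configured
          : SAME_ORIGIN_PATHS[key];
      cache.set(key, base.replace(/\/+$/, '')); // never a trailing slash
    }
    return cache.get(key);
  }

  return {
    getTitilerBaseUrl: () => resolve('TITILER'),
    getStacBaseUrl: () => resolve('STAC'),
    getTipgBaseUrl: () => resolve('TIPG'),
    getTitilerPgstacBaseUrl: () => resolve('TITILER_PGSTAC'),
    getVeloserverBaseUrl: () => resolve('VELOSERVER'),
  };
}

// Admin build: STATIC_MODE unset, helpers return today's same-origin paths.
const admin = createServiceUrls({});
// Static build: an absolute URL is configured for TiTiler only.
const dashboard = createServiceUrls({
  STATIC_MODE: 'true',
  STATIC_TITILER_URL: 'https://example.invalid/',
});
console.log(admin.getTitilerBaseUrl());
console.log(`${dashboard.getTitilerBaseUrl()}/cog/info`);
```

Stripping the trailing slash inside the helper keeps the "no trailing slash" contract even if an operator supplies a URL that ends in `/`.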
+ +**Webpack alias:** `STATIC_MISSION_CONFIG -> src/pre/staticConfig.js`. Must +live under `src/` because of `ModuleScopePlugin` (`configuration/webpack.config.js`). + +**Verification:** `npm run build` produces a build whose `staticConfig.js` +import resolves to the stub. Bundle behavior unchanged. + +### A.4 Introduce the `MODE` constant for runtime branching + +**New file:** `src/essence/Basics/mode.js`. + +**Exports:** `MODE` — string, `'admin'` or `'static'`. + +**Body:** `export const MODE = process.env.STATIC_MODE === 'true' ? 'static' : 'admin';`. + +Anywhere that needs to branch on mode imports `MODE` and compares. Avoids +re-reading `process.env` at every call site. + +--- + +## Phase B — Adjacent-service URL indirection + +**Goal:** Replace every hardcoded `'/titiler'` / `'/stac'` / `'/tipg'` / +`'/titilerpgstac'` / `'/veloserver'` in the frontend with a call to the +Phase A.2 helper. After Phase B, the admin build behaves identically (the +helper returns same-origin paths in admin mode); dashboards become wireable +to absolute URLs by changing env vars. + +### B.1 Inventory call sites + +Verified by `grep '/titiler\|/stac\|/tipg\|/titilerpgstac\|/veloserver'` — **exactly four files** have direct same-origin sidecar URL construction: + +- `src/essence/Basics/Map_/Map_.js` — TiTiler raster layer URLs. +- `src/essence/Basics/Layers_/Layers_.js` — `parseConfig` and the STAC fetch branch (the v3 plan flagged this STAC boot fetch as the most consequential edge case). +- `src/essence/Tools/Identifier/IdentifierTool.js` — `/titiler/cog/point/…` and `/titilerpgstac/collections/…`. +- `src/essence/Tools/Layers/LayersTool.js` — `/titiler/cog/info`, `/titiler/cog/bounds`, vector-tile URLs. + +**Files that earlier drafts incorrectly listed in this inventory** (verified to have NO direct sidecar URL construction): + +- `Globe_.js`, `GlobeRenderer.js` — zero hits; consume layer configs from `L_`. 
+- `MeasureTool.js` — uses `calls.api('getprofile')` (backend route, not a TiTiler URL). +- `Tools/Identifier/LayerCapturer.js` — doesn't exist at that path; real path is `Basics/Layers_/LayerCapturer.js`. Its un-guarded boot fetches are not sidecar URLs. + +**Validation step (executing agent):** re-run `grep '/titiler\|/stac\|/tipg\|/titilerpgstac\|/veloserver' -r src/essence/` before editing to confirm the four-file list still holds. + +### B.2 Rewrite each call site + +Pattern: where a constructed URL today is `` `/titiler/cog/info?url=${u}` ``, +the new form is `` `${getTitilerBaseUrl()}/cog/info?url=${u}` ``. + +Notes: + +- **Helpers return no trailing slash.** Call sites must add one. +- **Phase B does not cover `getbands`, `getprofile`, or `proj42wkt`** — these are `calls.api(...)` to **backend Express routes** (`/api/utils/getbands`, `/api/utils/getprofile`, `/api/utils/proj42wkt`), not sidecar URLs. They cannot be rewritten through `getTitilerBaseUrl()` because the frontend never constructs a TiTiler URL for them. In static, the backend routes are gone — see Phase E per-feature disposition decisions. + +### B.3 The `adjacent-servers-proxy.js` change for direct-target mode + +If §4.1 of the ADR's "preserve the Express proxy" default holds, no proxy +change is needed in Phase B. The proxy continues to mount `/titiler`, `/stac`, +etc., and the admin frontend keeps hitting same-origin paths. + +If the alternative ("ALB direct routing per service") is adopted, the changes +are in `adjacent-servers/adjacent-servers-proxy.js`: + +- Each `app.use('/titiler', ...)` block becomes optional, gated on a `PROXY_ENABLED` + env var (default true). +- The `ensureAdmin(false, false, true)` wrapping moves to a different mechanism + (Lambda authorizer on the ALB, or service-side basic auth). + +### B.4 Verification + +- Admin `npm start` — Identifier, Measure, Layers, and base map layers all + work. The helper returns same-origin paths and the proxy serves them as + today. 
+- Adversarial unit spec (Playwright TS unit format — see cross-cutting Tests): temporarily set `STATIC_MODE=true` and `STATIC_TITILER_URL=https://example.invalid`. Confirm that `getTitilerBaseUrl()` returns `https://example.invalid` and that one Identifier code path constructs the correct absolute URL. Reset afterwards. + +### B.5 Rollback + +Phase B is a single mechanical refactor. To roll back, revert the helper +file and the call-site rewrites. No data migration or config impact. + +--- + +## Phase C — Boot-time config injection + +**Goal:** In static mode, fulfill the config-related `calls.api` invocations from a baked source instead of hitting Express. Admin mode unchanged. + +> **Architectural choice for Phase C: stub `calls.api` at the chokepoint vs. branch each call site.** Recommended approach is to stub `calls.api`'s existing non-node branch with a baked-response-map-plus-dispatch (see C.4). The alternative — branching each individual call site on `STATIC_MODE` — would require touching **six sites across three files** (`App.js` x2, `essence.js` x2, `LandingPage.js` x2) for the config path alone, plus the call-site branching for everything else. The chokepoint approach changes one function. *Open question Q-CALLS-API.* + +### C.1 Codegen function (sibling to `updateTools`) + +**File:** `API/updateTools.js`. + +Add an exported function `bakeStaticConfig({ configData, missionsList, generalOptions, mission })` (sibling to the existing `updateTools()` / `updateComponents()` codegens, not an extension — the existing ones are disk-scan with no inputs; this one takes inputs). 
+ +The function writes `src/pre/staticConfig.js` with the form: + +```js +export const STATIC_MODE = true; +export const CONFIG_DATA = /* JSON-serialized config */; +export const MISSIONS_LIST = /* JSON-serialized list */; +export const GENERAL_OPTIONS = /* JSON-serialized options */; +export const MISSION_NAME = /* string */; +// Per-call static-response handlers used by calls.api stub (Phase C.4). +export const STATIC_HANDLERS = { + get_generaloptions: (data, success) => success({ options: GENERAL_OPTIONS }), + missions: (data, success) => success({ missions: MISSIONS_LIST }), + get: (data, success) => success({ mission: MISSION_NAME, config: CONFIG_DATA }), + // shortener_expand, login, etc. — drop with graceful error or omit entirely +}; +export default { configData: CONFIG_DATA, missionsList: MISSIONS_LIST, generalOptions: GENERAL_OPTIONS, mission: MISSION_NAME }; +``` + +In admin builds, `updateTools` writes the stub form (no change to admin). + +`scripts/publish-static.js` (Phase D.1) invokes `bakeStaticConfig` after fetching the live mission config from the admin. + +The bake step must also handle **server-injected fields**: `_dbMissionName` is set into `configData` at runtime by `essence.js` and `LandingPage.js` from `response.mission`. The baked `STATIC_HANDLERS.get` should embed `_dbMissionName` in its response either explicitly (preferred — set it to the source mission name) or accept the `msv.mission` fallback path at `Layers_.js:3916`. Verify the fallback produces the same `L_.mission` value per mission before committing to that path. 
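The serialization half of `bakeStaticConfig` could look something like the sketch below. `renderStaticConfigModule` is a hypothetical name, and the sketch assumes the module source is produced by emitting `JSON.stringify` output as JavaScript literals. It leaves out `STATIC_HANDLERS` and the `_dbMissionName` handling, and the actual file write would be a separate step.

```javascript
// Hypothetical helper: renders the source text for src/pre/staticConfig.js.
// With no argument it renders the admin stub from Phase A.3.
function renderStaticConfigModule(baked) {
  if (!baked) {
    return 'export default null;\nexport const STATIC_MODE = false;\n';
  }
  const { configData, missionsList, generalOptions, mission } = baked;
  if (typeof mission !== 'string' || mission.length === 0) {
    throw new Error('bakeStaticConfig: a mission name is required');
  }
  // Emit each input as a pretty-printed JSON literal.
  const lit = (value) => JSON.stringify(value, null, 2);
  return [
    'export const STATIC_MODE = true;',
    `export const CONFIG_DATA = ${lit(configData)};`,
    `export const MISSIONS_LIST = ${lit(missionsList)};`,
    `export const GENERAL_OPTIONS = ${lit(generalOptions)};`,
    `export const MISSION_NAME = ${lit(mission)};`,
    'export default { configData: CONFIG_DATA, missionsList: MISSIONS_LIST, generalOptions: GENERAL_OPTIONS, mission: MISSION_NAME };',
    '',
  ].join('\n');
}

// Example inputs are made up for illustration.
const source = renderStaticConfigModule({
  configData: { msv: { mission: 'Test' }, layers: [] },
  missionsList: ['Test'],
  generalOptions: {},
  mission: 'Test',
});
console.log(source);
```

Keeping the renderer pure (string in, string out) would let the stub and baked forms be unit-tested without touching the working tree.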
+ +### C.2 Boot-path coverage + +Verified config-related `calls.api` call sites (6 sites across 3 files): + +| File | Calls | +|---|---| +| `src/App.js` | `get_generaloptions`, `missions` | +| `src/essence/essence.js` | 2 × `calls.api('get', { mission })` (makeMission and swapMission paths) | +| `src/essence/LandingPage/LandingPage.js` | 2 × `calls.api('get', { mission })` (init and post-pick fetch) | + +Plus `shortener_expand` (App.js x2, only on `?s=` URLs) and any tool-level `calls.api(...)` per-feature. + +**With the chokepoint approach (C.4 below), none of these six sites get touched.** The stub in `calls.api` handles them all. The alternative per-site branching would require touching each. + +### C.3 LandingPage behavior + +Per the ADR's §5 resolution of Q-LANDING, dashboards are strictly one-mission-per-deploy. When `STATIC_MODE === true`, `LandingPage.init` should short-circuit and call `essence.init(configData, missionsList)` directly without rendering the picker. The picker UI and `?mission=` URL inspection paths are dead code in dashboard mode — leaving them in the bundle is harmless (tree-shaking will likely drop them) but no code path in `STATIC_MODE` should reach them. + +This is a minimal client-side change (one branch in `LandingPage.init`) and is necessary even with the C.4 chokepoint approach, since the *control flow* of "show picker → wait for user click → fetch config" needs to be short-circuited. + +### C.4 The `calls.api` chokepoint stub + +**File:** `src/pre/calls.js`. + +The dormant `SERVER != 'node'` branch is the natural insertion point. Replace the current warn-and-error with a dispatch into `STATIC_HANDLERS` from the baked config module: + +```js +if (window.mmgisglobal.SERVER != 'node') { + const handler = staticHandlers[call]; + if (handler) return handler(data, success, error); + // Calls with no static disposition: drop gracefully. 
+ console.warn('calls.api("' + call + '") not available in static mode'); + if (typeof error === 'function') error(); + return; +} +``` + +Each named call in the `c[]` table gets a per-call decision at bake time: + +- **Bake a static response** — `get_generaloptions`, `missions` (single-element list — the baked mission), `get` (single mission, single response). +- **Reroute to a shared admin-stack service over CORS** — any call that maps to a sidecar (uncommon at this layer since the URL-helper already handles sidecar URL construction; mostly applies to legacy `calls.api` entries that conflate backend and sidecar URLs). +- **Replace with a parameter-aware client-side computation** — small dataset queries that bake to client-side indices. +- **Drop gracefully** — `shortener_expand` (arbitrary tokens, can't bake), `login`, draw writes, etc. + +Per the ADR's resolution of Q-LANDING, `STATIC_HANDLERS.get` is a single baked response, not a map. The bake step asserts that exactly one mission is configured per dashboard build. + +### C.5 The Layers_.js boot-time STAC fetch (v3-flagged edge case) + +**File:** `src/essence/Basics/Layers_/Layers_.js`. + +The v3 plan flagged a synchronous-at-boot STAC fetch in `Layers_.js` (the +`getSTACLayers`-style recursion). This must be guarded by `STATIC_MODE`: + +- In admin mode: unchanged. +- In static mode: the STAC results are either (a) pre-recursed and baked into + the config by `scripts/publish-static.js`, mirroring `getSTACLayers` + semantics at bake time; or (b) fetched at runtime from the shared STAC + service over CORS, requiring the helper from Phase B already to be in + place. + +Default: pre-recurse at bake time. Reasoning: avoids a runtime dependency on +the STAC service for first paint of the dashboard. The bake step must mirror +`getSTACLayers`-style recursion exactly — name this as a load-bearing detail +of `scripts/publish-static.js`. + +### C.6 Verification + +- Admin: every boot path takes the existing fetch route. 
No console errors + mentioning static mode. +- Static (after Phase D): bundle loads, mission picker skipped (single mission, baked), `L_.init` runs with `staticConfig.configData`. + +### C.7 Rollback + +Revert Phase C and Phase A.3 / A.4 together. Phase B and Phases A.1 / A.2 can stay. + +--- + +## Phase D — Static build pipeline + +**Goal:** Produce a dashboard bundle on demand. Triggered by Phase H's +provisioning code; usable standalone via CLI for testing. + +### D.1 `scripts/publish-static.js` + +**Inputs (CLI flags or env):** + +- `--mission` (string, required). +- `--config` (path to a JSON file, optional; if omitted, the script queries + the admin's Postgres via Sequelize). +- `--output` (path, default `build-static/`). +- `--titiler-url`, `--stac-url`, `--tipg-url`, `--titiler-pgstac-url`, + `--veloserver-url` (strings; baked into `STATIC_*_URL` env vars during the + Webpack invocation). + +**Sequence:** + +1. Resolve mission config (`configData`), missions list, general options. + When called from the admin Publish handler (Phase H), these come from the + handler's RDS query. When called standalone, the script makes an authenticated + admin API call. +2. Recurse any STAC references in `configData.layers` to materialize the layer + tree (Phase C.5 baking). +3. For each layer in `configData`, decide where its data will live: (a) leave + it in the admin's shared S3 bucket (most raster tiles, DEMs); (b) copy it + to the dashboard's own bucket (small per-mission data); (c) point it at + a shared sidecar (large queryable data, COG mosaics, PostGIS-backed + layers). Rewrite each layer's URL in `configData` accordingly (Phase F.2). +4. Call `bakeStaticConfig({ configData, missionsList, generalOptions, mission })` + from `API/updateTools.js` (introduced in Phase C.1). +5. Spawn Webpack with `STATIC_MODE=true`, `STATIC_MISSION_NAME=`, + `STATIC_*_URL=`. Use `configuration/webpack.config.js` unchanged + (env vars flow via `configuration/env.js`). +6. 
After Webpack succeeds, copy the output directory to `--output`. +7. Restore `src/pre/staticConfig.js` to its stub form so the next admin build + does not accidentally ship a baked config. + +**Concurrency:** The script must hold a build lock (e.g. file lock under +`/tmp/mmgis-static-build.lock`) because it mutates `src/pre/staticConfig.js` +in the working tree. Two concurrent publishes corrupt each other. In ECS +deployment (one task per publish), tasks run in their own filesystem — the +lock is intra-task only. + +### D.2 Webpack changes + +**File:** `configuration/webpack.config.js`. + +- Add the `STATIC_MISSION_CONFIG` alias pointing at `src/pre/staticConfig.js`. +- Confirm `ModuleScopePlugin` does not block (the file lives in `src/`). +- Confirm `DefinePlugin` receives `STATIC_MODE`, `STATIC_*_URL`, `STATIC_MISSION_NAME` + via `configuration/env.js`. +- The `HtmlWebpackPlugin` template `public/index.html` may contain `%REACT_APP_*%` + placeholders that need a static-mode branch. v3 noted an unquoted `%HOSTS%` + substitution gotcha — hardcode `HOSTS = {}` for static builds, or set + `process.env.HOSTS = '{}'` at script invocation. +- Configure SPA (`configure/...`) is **not** built by the static pipeline. It + is admin-only. The static pipeline only runs the Essence webpack. + +### D.3 `package.json` scripts + +Add: + +```json +{ + "scripts": { + "build:static": "node scripts/publish-static.js", + "publish:static": "node scripts/publish-static.js --upload" + } +} +``` + +`--upload` mode adds the post-build S3 sync step. Phase H.4 handles the +production publish; `npm run build:static` is the dev/manual path. 
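As an illustration of how the D.1 flags passed through `npm run build:static -- ...` might become the env vars handed to Webpack, here is a minimal sketch. `buildStaticEnv` and `FLAG_TO_ENV` are hypothetical names, and the `--flag value` argument style is an assumption; flags the sketch does not know about (such as `--output` or `--upload`) are simply skipped.

```javascript
// Hypothetical mapping from publish-static.js CLI flags to the STATIC_*
// env vars that configuration/env.js would allow-list (Phase A.1).
const FLAG_TO_ENV = {
  '--mission': 'STATIC_MISSION_NAME',
  '--titiler-url': 'STATIC_TITILER_URL',
  '--stac-url': 'STATIC_STAC_URL',
  '--tipg-url': 'STATIC_TIPG_URL',
  '--titiler-pgstac-url': 'STATIC_TITILER_PGSTAC_URL',
  '--veloserver-url': 'STATIC_VELOSERVER_URL',
};

function buildStaticEnv(argv) {
  const env = { STATIC_MODE: 'true' };
  for (let i = 0; i < argv.length; i++) {
    const envName = FLAG_TO_ENV[argv[i]];
    if (!envName) continue; // not a flag this sketch handles
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`Missing value for ${argv[i]}`);
    }
    env[envName] = value;
    i++; // skip the consumed value
  }
  if (!env.STATIC_MISSION_NAME) {
    throw new Error('--mission is required');
  }
  return env;
}

// Example: the arguments that follow `npm run build:static --`.
const staticEnv = buildStaticEnv([
  '--mission', 'Test',
  '--titiler-url', 'https://example.invalid',
  '--upload',
]);
console.log(staticEnv);
```

The resulting object could be merged over `process.env` when spawning the Webpack build, so the admin build's environment is never mutated in place.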
+ +### D.4 Output layout + +The output of `npm run build:static` is a directory containing exactly what +goes into the dashboard's S3 bucket: + +``` +build-static/ + index.html + asset-manifest.json + static/ + js/ + css/ + media/ + cesium/ + staticConfig.json (also baked into the bundle; emitted separately for inspection) +``` + +S3 sync uploads the whole directory. CloudFront invalidates `/index.html` +and `/asset-manifest.json`; everything else is fingerprinted. + +### D.5 Verification + +- `STATIC_MODE=true npm run build:static -- --mission Test ...` produces a + `build-static/` directory. +- Serve `build-static/` with a static file server (e.g. `npx serve`) and + verify the bundle boots, loads the baked mission, and renders the map. +- Confirm no `/api/configure` network calls in the browser DevTools network + tab during boot. +- Confirm the adjacent-service calls (Identifier, Layers) go to the + configured `STATIC_*_URL` absolute URLs, not same-origin paths. + +**Mission-config source for testing:** There is no example mission config checked into the repo today (verified: `Missions/Demo/` has `Data/` and `Layers/` subdirectories but no `config.json` — real configs live only in Postgres). Three options for getting a test mission config: + +- Export from a running admin's Postgres (`GET /api/configure/get?mission=…` against a dev admin). +- Hand-craft a minimal config that exercises the bundle (also serves as schema documentation). +- Check in `Missions/Demo/config.json` as a permanent test fixture — recommend doing this regardless of the static refactor; useful for smoke tests, useful as schema documentation. + +See Q-MISSION-FIXTURE in the ADR. + +### D.6 Rollback + +Delete `scripts/publish-static.js` and the package.json script entries. +Phases A–C are unaffected. + +--- + +## Phase E — Feature gating in static + +**Goal:** Where features cannot work in a dashboard, either gracefully degrade +or disable. Per `decisions.md` and the ADR §5.3 drop list. 
+ +For each feature: file, branching point, and behavior in each mode. + +### E.1 Draw tool + +- **Files:** `src/essence/Tools/Draw/DrawTool.js` (and submodules). +- **In admin:** unchanged. Reads/writes against `/api/draw/*`; WebSocket + collaboration via `mmgisAPI` event bus → server WebSocket. +- **In static:** **disabled** by default. Implementation: + - In the tool's `make()` (or equivalent registration), short-circuit when + `MODE === 'static'` and the dashboard's baked config does not include a + Draw feature flag. + - Open question Q-DRAW: read-only display of baked features is a future + enhancement; not in scope for the first static build. +- The tool's entry in the codegen output (`src/pre/tools.js`) can be filtered + out at bake time by the publish script if Q-DRAW resolves to "drop entirely." + +### E.2 Real-time collaboration + +- **Files:** `mmgisAPI` event bus subscribers across multiple tools that + subscribe to draw/sync/presence events. +- **In static:** the WebSocket connect call (in essence boot or `mmgisAPI` + bootstrap) must short-circuit on `MODE === 'static'`. Subscribers see no + events, which is acceptable. + +### E.3 Measure tool — elevation profile + +- **File:** `src/essence/Tools/Measure/MeasureTool.js`. +- **Correction:** Measure does **not** construct a TiTiler URL directly. It calls `calls.api('getprofile', …)` which routes to the **backend** route `/api/utils/getprofile`. The backend route in turn may delegate to TiTiler (verify), but the frontend never builds `/titiler/…` URLs for this feature. +- **In static:** the backend route disappears. Three dispositions: + - **Hide the elevation profile affordance.** Default if the cost of the alternatives is too high. + - **Call TiTiler directly cross-origin** from the static frontend. Requires writing new client code that does what the backend's `getprofile` does (sample TiTiler line-string endpoint along the user's drawn line). 
+ - **Replace with client-side computation over baked DEM tiles.** The Measure tool already does DEM-tile reads for its other features; in static the elevation profile could sample the same baked DEM tiles directly. +- Recommended default: **hide the elevation profile in static**, revisit if stakeholders ask. The client-side-from-DEM-tiles path is a real future option. +- This is **not** covered by the Phase B URL-helper rewrite — it's a backend-route-disappearance pattern. + +### E.4 Identifier tool + +- **File:** `src/essence/Tools/Identifier/IdentifierTool.js`. +- **Two distinct call shapes here:** + - **Direct sidecar URL construction** (`/titiler/cog/point/…`, `/titilerpgstac/collections/…`) — Phase B URL-helper rewrite handles these. Hit the shared admin-stack TiTiler / TiTiler-pgSTAC over CORS in static. + - **`calls.api('getbands', …)`** — backend route `/api/utils/getbands`, NOT a TiTiler URL. In static the backend route disappears. Dispositions: + - Hide the band-list affordance in static (default). + - Write new client code that calls TiTiler's `/cog/info` (or similar) directly to get band metadata. + - Recommended: hide for the first cut. + +### E.5 Shade tool + +- **File:** `src/essence/Tools/Shade/ShadeTool.js`. +- **In static:** if Shade depends on a server-rendered shadow texture, the + feature is hidden. If it is pure-client over DEM tiles, it survives via + the baked DEM tiles. **Verify against code** — this is open. Mark + `Q-SHADE` and resolve during implementation. + +### E.6 TimeControl + +- **Files:** `src/essence/Basics/TimeControl_/TimeControl.js` and `TimeUI.js`. (Verified path: TimeControl lives in `Basics/`, not `Tools/` — corrected from an earlier draft.) +- **In static:** v3 flagged `query_tileset_times` as a server call. Two + paths: + - **Bake the times into the config.** `scripts/publish-static.js` queries + the admin for the time list and inlines it into a `times` field of the + layer config. 
Frontend reads from config; no runtime fetch. + - **Hit the admin's endpoint at runtime.** Adds a runtime dependency on + the admin; rejected by default. + Default: bake. + +### E.7 Layers tool + +- **File:** `src/essence/Tools/Layers/LayersTool.js`. +- **In static:** + - Direct sidecar URL constructions (TiTiler info/bounds, vector tiles) — Phase B URL-helper rewrite. + - `calls.api('proj42wkt', …)` — backend route `/api/utils/proj42wkt`, disappears in static. Disposition: hide the affordance, or port Proj4js to do projection conversion in the browser (the v3 plan flagged this option). + - "Fetch layer info from server" affordances (if any) hide in static. + - Basic layer toggles and filters work off `L_` and continue. + +### E.8 Search + +- **Files:** `src/essence/Ancillary/Search.js` (the UI — verified path: `Ancillary/`, not `Tools/Search/`); server side `API/Backend/Datasets/routes/datasets.js` `search` handler. +- **In static:** server-side search is not available. Three sub-options + (Q-SEARCH): + - Hide the tool entirely. **Default.** + - Build a client-side index over baked data at bake time and ship it. + Adds work; revisit if customers demand it. + - Point at a shared search service. No such service exists today. + +### E.9 Login UI + +- **File:** `src/essence/Ancillary/Login/Login.js`. +- **In static:** hidden. The CloudFront Function password gate is the only + auth mechanism. + +### E.10 Configure entry point + +- **File:** `src/essence/LandingPage/LandingPage.js`. +- **In static:** the "Configure" button (admin-only affordance) hides. + +### E.11 Verification + +- Manual: load the dashboard. Confirm each feature in §E either works + (Identifier, Measure, time control) or is absent (Draw, Search, Login, + Configure entry). +- Automated: a new Playwright spec under `tests/e2e/static-mode/` that boots + the static bundle and walks the feature list. + +### E.12 Rollback + +Each E sub-step is an isolated branch on `MODE`. 
Reverting any one does not +break the others. + +--- + +## Phase F — Mission asset S3 migration + +**Goal:** Move `Missions/` from local disk to S3. Admin reads/writes via +middleware that fetches from S3; dashboards have relative paths rewritten +to absolute S3+CloudFront URLs at bake time. + +### F.1 Admin-side middleware change + +**Files:** `scripts/middleware.js` (the `missions()` function) and `scripts/server.js` (the 3-middleware stack that mounts `/Missions/...`). + +Today's mount is a **stack of three middlewares**, not a single `express.static`: + +```js +app.use( + `${ROOT_PATH}/Missions`, + ensureUser(), // 1. auth + middleware.missions(ROOT_PATH), // 2. _time_ compositing + express.static(path.join(rootDir, "/Missions")) // 3. static fallback +); +``` + +The S3 refactor addresses each layer separately: + +**Layer 1 — `ensureUser()`.** Keeps gating admin-side access to mission assets. No change. + +**Layer 2 — `middleware.missions(ROOT_PATH)` (the `_time_` compositing path).** Server-side `sharp` compositing of time-windowed tiles. No cheap static equivalent. Three options for the admin: +- Continue server-side compositing: admin streams constituent tiles from S3, composites with `sharp`, returns. Works but expensive (each request fetches N tiles). +- Bake all time slices at publish time and serve statically. Admin-side this would mean precomputing on every config save — heavy. +- Hide the feature in the admin. Loses functionality. + +Recommended default for the admin: **continue server-side compositing.** Dashboards pre-bake or hide per layer (Q-TIME). + +**Layer 3 — `express.static('./Missions')`.** This is the layer that moves to S3: +- **CloudFront-fronted S3, with the admin redirecting `/Missions/...` to the CloudFront URL.** Simplest. The browser fetches direct from CloudFront. Drawback: admin auth no longer gates mission assets. Acceptable because assets are already semi-public (they go into dashboards). 
If true privacy is needed, use signed URLs. +- **Admin proxies through to S3 (Express → S3 GetObject → stream back).** Preserves auth gating; pins bandwidth to the admin task; loses CloudFront caching for admin users. + +Recommended default for the static fallback: **CloudFront-fronted S3 with redirect.** + +**Refactor structure:** Worth splitting `middleware.missions(ROOT_PATH)` itself in the refactor — pull the `_time_` compositing into a separate `middleware.timeComposite(ROOT_PATH)` and leave path-translation as a thin layer the S3 backend can replace independently. + +### F.2 Dashboard layer URL rewriting + +**File:** `scripts/publish-static.js` (introduced in Phase D). + +The publish step makes a per-layer decision about where each layer's data +will live in the dashboard, then rewrites the layer's URL in `configData` +accordingly. Three destinations (per ADR §9.2): + +- **Leave in admin's S3 bucket** — for raster tiles, DEMs, and basemap + imagery already uploaded by admins. The URL is rewritten to the admin's + CloudFront-fronted S3 URL (e.g. + `https://mission-assets./Missions//Layers/...`). + No data copy needed. +- **Copy to the dashboard's own bucket** — for small per-mission data the + publish step decides to bake (small GeoJSON, lookup tables, baked search + indices). The script reads the source data from Postgres or admin S3, + serializes if needed, writes the static file into the dashboard's S3 + bucket alongside the JS bundle, and rewrites the URL to a relative path + (e.g. `/data/sites.geojson`) that resolves against the dashboard's own + CloudFront origin. +- **Point at a shared sidecar** — for data that needs dynamic querying + (TiTiler-served COG mosaics, tipg-served PostGIS layers, a custom search + endpoint). The URL is rewritten to an absolute sidecar URL (e.g. + `https://titiler./cog/tiles/{z}/{x}/{y}?url=s3://...`). + +The heuristic for which destination a layer ends up at lives in the publish +script. 
First-pass defaults: raster-tile and DEM layers default to "leave +in admin S3"; small vector/tabular layers default to "copy to dashboard +bucket"; layers backed by COG mosaics or PostGIS tables default to "point +at sidecar." The mission config may eventually grow a per-layer override. + +**Sub-decision: shared mission-asset bucket vs per-dashboard copy** (for +the "leave in admin's S3" case). Shared is cheaper and immediate, but the +dashboard is "live" against admin's S3 — if admins re-upload, the dashboard +sees the change. Per-dashboard copy preserves immutability at the cost of +duplicating large rasters per dashboard. Default: **shared bucket**, with +per-dashboard copy as a future option for missions that need +frozen-at-publish-time guarantees. + +### F.3 Single-file upload path migration + +**File:** Upload handlers in `API/Backend/...` that today write to disk under +`Missions//`. + +For **single-file uploads** (sample media, individual rasters, individual +files an admin pushes through the UI), the path is straightforward: + +- Browser → Express → disk: replaced by +- Browser → presigned S3 POST → S3, with Express only handing back the + presigned URL and form fields. + +Presigned POST generation: AWS SDK v3 (`@aws-sdk/client-s3` + +`@aws-sdk/s3-presigned-post`; `@aws-sdk/s3-request-presigner` only signs plain GET/PUT URLs and cannot carry a POST policy). Express handler signs a POST policy with +size and prefix constraints (e.g. only `Missions//Data/` prefix, +max size from a config var). + +For **tile-pyramid uploads** — the canonical big-file workflow — the +mechanism is open (ADR Q-BIG-UPLOAD). See F.4 for the implementation +sketches; whichever option the ADR resolves to becomes the execution plan. + +**Affected upload endpoints:** + +- Single-file mission-asset uploads (media, individual rasters): switch to + presigned. +- Tile-pyramid uploads: see F.4 (Q-BIG-UPLOAD). +- Dataset (CSV) uploads — these go to Postgres, not disk; **no change**. +- Geodataset uploads — same; **no change**.
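The size and prefix constraints on the single-file presigned POST could be assembled roughly like this. It is a minimal sketch, assuming the handler passes the result to `createPresignedPost` from `@aws-sdk/s3-presigned-post`; the helper name `buildUploadPolicy`, its arguments, the 100 MiB default, and the 5-minute expiry are all hypothetical. Only the `starts-with` and `content-length-range` condition shapes come from S3's POST policy format.

```js
// Hypothetical helper: builds the POST-policy inputs for one mission upload.
// The returned object uses the option names createPresignedPost accepts
// (Bucket, Key, Conditions, Expires); it does not call AWS itself.
function buildUploadPolicy(bucket, missionName, fileName, maxBytes = 100 * 1024 * 1024) {
  // Reject names that could escape the mission's Data/ prefix.
  if (!/^[A-Za-z0-9_-]+$/.test(missionName)) throw new Error("invalid mission name");
  if (fileName.includes("/") || fileName.includes("..")) throw new Error("invalid file name");

  const prefix = `Missions/${missionName}/Data/`;
  return {
    Bucket: bucket,
    Key: prefix + fileName,
    Conditions: [
      ["starts-with", "$key", prefix],        // prefix constraint
      ["content-length-range", 1, maxBytes],  // size constraint, in bytes
    ],
    Expires: 300, // seconds the signed policy stays valid
  };
}
```

Keeping the policy construction separate from the signing call means the constraints can be unit-tested without AWS credentials.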
+ +### F.4 Tile-pyramid upload workflow (Q-BIG-UPLOAD) + +**Status:** open — ADR §4.5 has not picked a workflow. This subsection +sketches the implementation path for each of the three options so +execution can start once the choice is made. Once stakeholders pick, the +corresponding subsection below becomes the execution plan and the others +can be deleted. + +The problem: today's tile-pyramid workflow (`gdal2customtiles.py` produces +a folder of thousands of tiles; operator `scp`s the folder into `Missions/`) +doesn't survive AWS, and admin users don't have direct AWS credentials, so +the upload has to go through the admin UI. + +**Option A — Upload as a single archive, extract server-side.** + +Operator zips or tars the pyramid on their workstation before upload. The +browser uploads one archive file via a presigned upload to a *staging prefix* in S3 +(e.g. `s3://mmgis-staging/.zip`). A spawned ECS task then: + +- Downloads the archive from staging. +- Extracts it. +- Writes the individual tile files into the canonical mission prefix in + admin's S3 (`s3://mmgis-missions//Layers////.png`). +- Deletes the staging archive. +- Notifies the admin UI when complete. + +**Files:** new `API/Backend/Uploads/routes/uploads.js` for the +orchestration endpoint; new spawned task definition (similar to the publish +task in Phase H) for the extract job. The existing admin upload UI gets a +"zip your pyramid first" instruction and the new endpoint flow. + +**Trade:** operator UX is one upload action; reintroduces a backend step +in the upload path; the extract job needs its own memory/disk allocation +for big archives. + +**Option B — Bulk multi-file presigned upload.** + +The browser fires off many parallel presigned uploads — one per tile. +Workflow: + +- Operator selects the pyramid folder in the file picker (the + `webkitdirectory` attribute on `<input type="file">`). +- Browser enumerates files, requests a presigned URL per file from a new + batch endpoint (`POST /api/uploads/presign-batch`).
+- Browser PUTs each file to its presigned URL with bounded parallelism + (e.g. 8 concurrent). +- Browser tracks progress, retries individual failures, reports completion + to the admin server. + +**Files:** new `API/Backend/Uploads/routes/uploads.js` for batch presign +generation; substantial new frontend logic in the admin UI for upload +orchestration, progress, retry, and recovery from page reloads. + +**Trade:** no new backend processing; brittle at scale (browser memory +holds the file list, dropped connections lose individual uploads, no +cross-file resumability across page reloads); presign generation is one +admin round-trip per file. + +**Option C — Shift production format to COGs.** + +The operator workflow changes: `tifs2cogs` (already in `auxiliary/stac/`) +instead of `gdal2customtiles`. The output is a single COG file. The browser +uploads one file via presigned multipart (5GB single-PUT, 5TB multipart +ceiling). The TiTiler sidecar (already in our adjacent-services set) +serves tiles from the COG on demand over HTTP. + +**Files:** + +- `scripts/publish-static.js` — when rewriting URLs (F.2), recognize + COG-backed layers and emit TiTiler URLs + (`https://titiler./cog/tiles/{z}/{x}/{y}?url=s3://...`). +- Mission config schema — add a layer-type or field marking the layer as + COG-backed. +- Configure UI — surface the new layer type for admins setting up a COG + layer. +- Documentation — operator runbook updates for the `tifs2cogs` workflow. +- **Migration:** existing tile-pyramid layers in production mission + configs need per-layer re-baking. The publish script could fail-loud on + layers pointing at legacy tile-pyramid URLs to force the migration + rather than silently shipping broken dashboards. + +**Trade:** clean single-file upload aligned with AWS object storage; +requires updating both production data and operator workflow per-layer; +existing layers need migration scoping against the mission backlog. 
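For Option B, the bounded-parallelism step (the "8 concurrent" PUT loop) might look like the sketch below. It assumes nothing about the final design: `uploadAll` and `putOne` are hypothetical names, and `putOne` stands in for whatever function PUTs one file to its presigned URL.

```js
// Runs putOne(item) over all items with at most `limit` requests in flight.
// Failures are recorded per item instead of aborting the whole batch,
// so the caller can retry only the uploads that failed.
async function uploadAll(items, limit, putOne) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane() {
    // Each lane claims the next unclaimed index until none remain.
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await putOne(items[i]) };
      } catch (err) {
        results[i] = { ok: false, error: err };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, lane));
  return results;
}
```

The result array keeps the input order, so a retry pass can filter on `ok === false` and resubmit just those files. This does not address the page-reload resumability problem called out in the Option B trade.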
+ +### F.5 Verification + +- Admin: create a new mission, upload a sample tile pyramid via the new + presigned flow, render the layer in the admin map. Assets should be + served by CloudFront URLs (verifiable in DevTools network tab). +- Dashboard: publish a dashboard. Open the dashboard URL. The map renders + layers from the rewritten absolute URLs. +- The `_time_` admin behavior (if a mission uses it) still composites + correctly. + +### F.6 Rollback + +Disk-backed storage and admin proxying mode can be re-enabled with an env +flag (`MISSIONS_STORAGE=disk` vs `MISSIONS_STORAGE=s3`). Keep both code +paths during transition for at least one production cycle. + +--- + +## Phase G — Adjacent services on ECS + +**Goal:** Each Python sidecar runs as its own ECS Fargate service, behind +the admin ALB. The Express proxy continues to forward (per ADR §4.1 default). + +### G.1 ECS task definitions + +For each of TiTiler, TiTiler-pgSTAC, STAC API, tipg, veloserver: + +- One Fargate service per image. Same image tags as today + (`ghcr.io/developmentseed/titiler:0.22.2`, `ghcr.io/stac-utils/stac-fastapi-pgstac:5.0.2`, + etc. — verified by the adjacent-services research). +- CPU/memory sized to current docker-compose hints; refine based on load + testing. +- Environment variables: as per the docker-compose entries — DB credentials, + GDAL config (`CPL_TMPDIR`, `GDAL_CACHEMAX`, `GDAL_DISABLE_READDIR_ON_OPEN`, + `VSI_CACHE`), `TILEMATRIXSET_DIRECTORY`, AWS credentials (optional for S3 + COG fetching). +- The `./adjacent-servers/resources/tilematrixsets/planetcantile_v4` + directory is **a host-filesystem dependency** in the docker-compose setup; + for ECS, either bake into a custom image or ship it via a mounted EFS. + Default: bake. Custom Dockerfile that COPYs the directory into the image. +- The Missions/ path is also mounted today; in AWS this becomes an S3 read + (the Python services pass `s3://...` URLs down to GDAL, which reads them through its `/vsis3/` virtual filesystem given + proper credentials and config).
+ +### G.2 ALB target groups + +One target group per sidecar service. The admin's Express task continues to +have its own target group. ALB listener rules route by path: + +- `/api/*`, `/configure*`, `/`, `/docs*` → admin Express target. +- `/stac*`, `/titiler*`, `/titilerpgstac*`, `/tipg*`, `/veloserver*` → admin + Express target (which proxies internally — Phase G keeps the Express proxy + per ADR default). + +The optional "ALB direct routing" alternative would route those sidecar +paths to their own target groups; defer per ADR §4.1. + +### G.3 Service discovery + +The admin Express task needs to resolve sidecar service names. Today +`isDocker` in `adjacent-servers-proxy.js` swaps `localhost` for the +docker-compose service name. In ECS: + +- Use **ECS Service Discovery** (Cloud Map). Each sidecar service registers + a private DNS name like `titiler.mmgis.internal`. +- The Express proxy resolves sidecar URLs by DNS, not by `isDocker` env. +- Update `adjacent-servers-proxy.js` to read sidecar hostnames from env + vars (`TITILER_TARGET_URL`, `STAC_TARGET_URL`, etc.) — already a sensible + parameterization regardless of deployment. + +### G.4 CORS for cross-origin dashboard access + +For each sidecar, configure CORS to allow: + +- The admin's CloudFront origin (`https://admin.mmgis.example`). +- Every published dashboard's CloudFront origin (or a wildcard like + `*.dashboards.mmgis.example` if a subdomain scheme is used; note that Starlette's CORS middleware matches subdomain patterns through `allow_origin_regex`, not through entries in the origin list). + +Implementation per service: + +- **TiTiler / TiTiler-pgSTAC:** CORS via the FastAPI/Starlette middleware + built into the image. For TiTiler the setting is `cors_origins` under the `TITILER_API_` env prefix, i.e. `TITILER_API_CORS_ORIGINS`; TiTiler-pgSTAC may use a different prefix + (verify exact var names when building the images). +- **STAC API (`stac-fastapi-pgstac`):** similarly Starlette-based; CORS env + vars exist. +- **tipg:** has its own CORS config. +- **veloserver:** unknown until Q-VELO resolves. + +### G.5 Database wiring + +- TiTiler-pgSTAC, STAC API, tipg all need a connection to the `mmgis-stac` + database.
RDS endpoint, credentials via Secrets Manager. Same connection + string structure as today. +- TiTiler is filesystem-only (no DB) — same as today. +- Veloserver — unknown. + +### G.6 Veloserver — verify before provisioning + +Verified: **zero frontend code paths in `src/essence/` construct `/veloserver` URLs today** (grep returned nothing). The backend proxies it (`adjacent-servers-proxy.js`), but no current Essence code reaches for it. Before allocating ECS capacity for veloserver: + +1. Check whether any production mission config references a veloserver-backed layer (Q-VELO from the ADR). +2. If no, drop the service from the AWS deployment entirely. +3. If yes, document its DB / env / mount requirements (the docker-compose entry has no DB, env vars, or init config, so this is a real research task). + +This is cheap to defer until Phase G execution time. + +### G.7 Verification + +- Each sidecar reachable from the admin task via its Service Discovery name. +- Each sidecar reachable from a dashboard's CloudFront origin via its public + ALB path. CORS-allow-listed. +- Express proxy continues to gate admin-origin requests through `ensureAdmin`. +- Open question Q-AUTH-2: cross-origin dashboard requests bypass `ensureAdmin` + by design (they hit the sidecar via ALB path, not via the proxy). This is + the trade-off the ADR called out; revisit if security review requires + signed requests. + +### G.8 Rollback + +Each sidecar's ECS service can be scaled to zero and the corresponding +docker-compose entry re-enabled for local development. The ALB listener +rules can be removed. + +--- + +## Phase H — Provisioning code + +**Goal:** Implement the "Publish" button path: admin Express receives the +request, kicks off a build + provision + upload sequence, returns the URL. + +### H.1 New Express endpoint + +**File:** `API/Backend/Publish/routes/publish.js` (new module under +`API/Backend/Publish/`). Loaded by `API/setups.js` automatically. 
+ +**Endpoints:** + +- `POST /api/publish` — body `{ mission, dashboardName, settings }`. + Authenticated, admin-only (`ensureAdmin(true, false, false)`). +- `DELETE /api/publish/:id` — tears down a dashboard. +- `GET /api/publish` — lists dashboards. +- `GET /api/publish/:id` — returns one dashboard's metadata. + +### H.2 The publish handler + +**Sequence (Phase H.2):** + +1. Validate request. Confirm `req.user` has permission for `mission` + (`checkMissionPermission` from `configs.js`). +2. Create a `dashboards` row (Phase I.1) with status `provisioning`. +3. Trigger the build + provision job: + - **Option A (in-process):** synchronous; admin task ties up CPU. Bad. + - **Option B (spawned ECS task — RECOMMENDED):** call the ECS RunTask API + with a task definition that runs `scripts/publish-static.js` plus the + provisioning steps below. + - **Option C (CodeBuild):** trigger a CodeBuild project. Adds CodeBuild + as a managed surface. +4. The spawned task does: + - Read the mission config from RDS (via the same Sequelize models the + admin uses). + - Run `scripts/publish-static.js` (Phase D). + - **Create S3 bucket** (`s3:CreateBucket`, `s3:PutBucketEncryption`, + `s3:PutBucketPublicAccessBlock`). + - **Upload the build artifacts** (`s3:PutObject`). + - **Create CloudFront distribution** (`cloudfront:CreateDistribution`). + Origin = the S3 bucket. Behaviors = SPA fallback to `/index.html` for + 404s, aggressive caching for `/static/*` (fingerprinted), no cache for + `/index.html`. + - **Create CloudFront Function** (`cloudfront:CreateFunction`, + `cloudfront:PublishFunction`). The Function checks an Authorization + header against an embedded password. The password value is generated + per dashboard (Q-AUTH-1). + - **Attach the Function** to the distribution's viewer-request event. + - **Create the DNS record** (`route53:ChangeResourceRecordSets`) under + the configured hosted zone. 
+ - **Update the `dashboards` row** with the resulting URL and status + `published`. +5. The admin endpoint returns immediately with `{ dashboard_id, status: "provisioning" }`. +6. The Configure UI polls `GET /api/publish/:id` until `status === "published"` (or `failed`), + then shows the URL or the failure. + +### H.3 IAM policy + +The spawned task's IAM role needs exactly: + +- `s3:CreateBucket`, `s3:PutBucketEncryption`, `s3:PutBucketPolicy`, + `s3:PutBucketPublicAccessBlock`, `s3:DeleteBucket`, `s3:PutObject`, + `s3:DeleteObject`, `s3:ListBucket` — scoped to a bucket-name prefix + (`mmgis-dashboard-*`). +- `cloudfront:CreateDistribution`, `cloudfront:UpdateDistribution`, + `cloudfront:DeleteDistribution`, `cloudfront:GetDistribution`, + `cloudfront:CreateInvalidation`, `cloudfront:CreateFunction`, + `cloudfront:UpdateFunction`, `cloudfront:DeleteFunction`, + `cloudfront:PublishFunction`, `cloudfront:GetFunction`. +- `route53:ChangeResourceRecordSets`, `route53:GetHostedZone` — scoped to + the configured hosted zone. +- `rds-db:connect` — scoped to the dashboards database user, for reading + the mission config. +- `secretsmanager:GetSecretValue` — for any per-dashboard secrets. + +### H.4 Teardown + +`DELETE /api/publish/:id` runs the reverse: + +1. Mark `dashboards` row `status: deleting`. +2. Spawned task: + - Invalidate CloudFront (optional; deletion implies it). + - **Disable** the distribution (a delete only works after disable). Wait + for the disable to propagate. + - Delete the distribution. + - Delete the Function. + - Empty and delete the bucket. + - Delete the DNS record. + - Delete the `dashboards` row. + +The distribution-disable wait is a real wrinkle — disabling a distribution +takes 15–30 minutes. The teardown task must handle the asynchronous +completion: `Enabled` reads `false` as soon as the update is accepted, so poll until the distribution's `Status` returns to `Deployed` with `Enabled` false, then +delete. + +### H.5 Verification + +- Functional: publish a dashboard. Wait. URL appears. Visit URL. Map loads. +- Functional: delete the dashboard.
After ~30 min, the bucket and + distribution are gone. +- IAM least-privilege: confirm the spawned task cannot create resources + outside the documented scope. + +### H.6 Rollback + +The provisioning code can be removed or its endpoint disabled with a config +flag. Existing dashboards are not affected. + +--- + +## Phase I — Dashboard registry + +**Goal:** Persist the set of published dashboards. + +### I.1 The `dashboards` table + +**Migration:** MMGIS doesn't have a separate `API/Backend/Databases/` migrations directory — feature directories under `API/Backend/` each own their own models alongside their routes, and Sequelize `.sync()` (run during the main server's boot via `setups.synced(s)`) creates tables from those models. So this is a new feature module following the existing pattern (Accounts, Datasets, Draw, etc.). + +**Schema (Sequelize model):** + +``` +id INTEGER PRIMARY KEY AUTOINCREMENT +name STRING NOT NULL UNIQUE -- subdomain-safe +mission STRING NOT NULL -- the source mission +created_by INTEGER REFERENCES users(id) +status STRING -- provisioning|published|deleting|failed|deleted +url STRING -- final dashboard URL once published +cloudfront_id STRING -- for invalidate/delete +bucket_name STRING -- for delete +function_arn STRING -- for delete +password_hash STRING -- bcrypt of the gate password +settings JSONB -- arbitrary publish-time settings +created_at TIMESTAMP +updated_at TIMESTAMP +deleted_at TIMESTAMP -- soft delete +``` + +**File:** `API/Backend/Dashboards/models/dashboard.js`. + +### I.2 Configure UI surface + +In `configure/src/` add a new "Dashboards" section showing: + +- The list (from `GET /api/publish`). +- A "Publish" button that opens a dialog and POSTs. +- A "Delete" button per row. +- A "Copy URL" affordance. +- Real-time status (polling) during provisioning. 
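The polling item above could be sketched as a small loop over the status values listed in the schema. This is an illustration under assumptions: `pollDashboardStatus` and `fetchStatus` are hypothetical names, and the interval and attempt cap are placeholders rather than chosen values.

```js
// Polls until the dashboard leaves its transitional state.
// `fetchStatus` is a hypothetical async function that would wrap
// GET /api/publish/:id and resolve to the row's `status` string.
async function pollDashboardStatus(fetchStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  // Statuses after which no further change is expected.
  const terminal = new Set(["published", "failed", "deleted"]);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (terminal.has(status)) return status;
    // Sleep between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return "timeout"; // caller decides how to surface a stuck job
}
```

Treating `failed` as terminal keeps the UI from polling forever when provisioning breaks partway through.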
+ +**Effort note:** Verified that the existing Configure pages (`configure/src/pages/`: APIs, APITokens, Datasets, GeneralOptions, GeoDatasets, STAC, Users, WebHooks) are all CRUD-over-forms. **There is no existing "async backend job, poll for status, surface result URL" pattern in Configure.** The Redux Toolkit and `core/calls.js` plumbing carries over for state and API calls; the async-job state machine is net-new UX. Plan accordingly. + +### I.3 Verification + +- The registry stays consistent across publish/delete cycles. +- A published dashboard's URL works when its row status is `published` and + fails closed when status is `deleting` (404 from CloudFront after distribution + delete). + +--- + +## Phase J — Deploy-time gaps + +### J.1 First-user-becomes-superadmin gap + +**File:** `API/Backend/Users/routes/users.js`, `first_signup` handler. + +The handler creates a permission-`111` user when `User.count() === 0`. In an +AWS deployment, the admin is exposed publicly during the gap between deploy +and first login, so whoever reaches the signup endpoint first becomes superadmin. Options: + +- **Operational runbook:** restrict ALB ingress to a known IP during initial + provisioning. The admin operator logs in, then ingress opens up. Cheapest; + human discipline required. +- **Seed a superadmin during init-db:** the ECS init-db one-shot task creates + a superadmin from credentials in Secrets Manager. The `first_signup` + handler no longer fires for the first request. Removes the race entirely; + adds the operational task of putting a credential in Secrets Manager. +- **Disable `first_signup` behind a config flag** in AWS deployments. + Combined with the Secrets Manager seed, this is the safe form. + +Recommended default: **seed via init-db + disable `first_signup` in AWS**.
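The recommended default could be expressed as a small guard in front of the `first_signup` handler. This is a sketch under assumptions: the flag name `DISABLE_FIRST_SIGNUP` and the function `firstSignupAllowed` are hypothetical, not existing MMGIS settings, and the real handler would obtain the user count from its own model query.

```js
// Decides whether the first_signup path may create the initial superadmin.
// `env` is passed in (rather than reading process.env directly) so the
// decision is easy to unit-test.
function firstSignupAllowed(userCount, env) {
  // AWS deployments seed the superadmin via init-db and set the flag.
  if (String(env.DISABLE_FIRST_SIGNUP).toLowerCase() === "true") return false;
  // Otherwise keep today's behavior: only the very first user qualifies.
  return userCount === 0;
}
```

With the flag set, an empty users table no longer opens the signup path, which closes the window between deploy and first login even if the seed step has not run yet.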
+ +### J.2 CloudFront Function password gate + +The Function source (JavaScript, runs at viewer-request): + +```js +function handler(event) { + var req = event.request; + var auth = req.headers.authorization; + var expected = "Basic " + EXPECTED_BASE64; // baked at function publish + if (!auth || auth.value !== expected) { + return { + statusCode: 401, + statusDescription: "Unauthorized", + headers: { + "www-authenticate": { value: 'Basic realm="dashboard"' } + } + }; + } + return req; +} +``` + +The Function is generated at publish time by Phase H.2, with `EXPECTED_BASE64` +substituted from the per-dashboard password. + +**Limitations:** + +- Basic auth re-presents on every browser session. Acceptable for the use + case. +- The password is visible in the Function's published source to anyone with + IAM access to CloudFront — not a leak path of concern. +- Per-dashboard password rotation requires republishing the Function. Not a + hot path; acceptable. + +### J.3 Rollback + +J.1 changes are reversible by re-enabling `first_signup`. J.2 changes only +affect dashboards; admin is unaffected. + +--- + +## Cross-cutting implementation notes + +### CSP and helmet config + +**File:** `scripts/server.js` (helmet configuration). + +**Correction:** today's CSP is **already permissive**. Verified in `scripts/server.js`: `connectSrc: ["*"]`, `imgSrc: ["*", ...]`, `styleSrc: ["*", ...]`, `fontSrc: ["*", ...]`, `mediaSrc: ["*", ...]`. The browser is already permitted to fetch from any origin. Earlier draft framing of "today's CSP assumes single-origin" was wrong. + +What's actually env-controlled is `frame-ancestors` (`FRAME_ANCESTORS`) and `frame-src` (`FRAME_SRC`) — both for iframe embedding, not for cross-origin fetches. + +So the frontend CSP needs no changes for cross-origin dashboard → sidecar fetches. The cross-origin concern is **CORS configuration on each sidecar** (Q-AUTH-2), not the frontend CSP. 
+ +For dashboards: the static bundle's `index.html` (or CloudFront response headers) should set: + +- `connectSrc` permissive enough to reach the shared admin-stack service URLs and the mission-asset CloudFront origin. The admin's value of `["*"]` is one option; tightening to specific origins is the more secure default. +- `frame-ancestors` matching the embedder allow-list expected for the dashboard. + +### Logging + +- Admin: CloudWatch Logs (default for ECS). Winston JSON output flows directly. +- Sidecars: CloudWatch Logs. +- Dashboards: CloudFront standard logs to a logs S3 bucket. No application + logs — dashboards do not have a backend. + +### Secrets + +- RDS password, session secret, sidecar tokens, dashboard gate passwords, + any AWS API keys — Secrets Manager. Rotation can be enabled per-secret. +- Sample env vars like `DB_PASS` continue to be read at runtime by the admin + task — but their values come from Secrets Manager bindings on the task + definition, not from a checked-in `.env`. + +### Local development + +- Docker-compose remains the local-dev environment, unchanged. Phase G's + ECS task definitions are not used locally. +- `npm run build:static` works locally for testing the static pipeline. + +### Tests + +- **Playwright covers both unit and e2e** — verified `package.json` has no `jest` configuration; the Playwright suite is the single test runner. The recent commit `chore: extend Playwright to parse TS unit test files` (in branch history) made this explicit. Earlier drafts referencing Jest were stale. +- Existing Playwright suite continues to run against the admin. +- New static-mode Playwright spec: `tests/e2e/static-mode/` boots the + static bundle and exercises the surviving features per §E. +- Add unit specs (Playwright TS unit format) for `serviceUrls.js` (Phase A.2), the `calls.api` stub branch (Phase C.4), and the publish handler (Phase H). 
+ +--- + +## Open implementation questions (deferred from the ADR) + +These are too detailed for the ADR but block execution. Resolve before +starting the corresponding phase. + +- **Q-IMPL-1 (Phase A):** Are there existing build-time path constants + besides those listed in `configuration/paths.js`? Need to verify before + introducing `STATIC_MISSION_CONFIG` to avoid name collision. +- **Q-IMPL-2 (Phase B):** Exact call-site count for each adjacent service. + Run a grep before estimating refactor effort. +- **Q-IMPL-3 (Phase C):** Does `LandingPage.js` perform any synchronous computation in its pre-picker init that needs to happen in the short-circuit path? Audit `LandingPage.init` for side effects that fire before the picker renders, since the short-circuit must preserve them. (Scope narrowed by Q-LANDING resolution: no longer need to worry about missions-list computation — there's exactly one mission.) +- **Q-IMPL-4 (Phase C):** The Layers_.js boot-time STAC fetch — does the + `getSTACLayers` recursion logic need to match exactly in + `scripts/publish-static.js`, or can it diverge? Audit the function in + `Layers_.js` for behavior the frontend depends on. +- **Q-IMPL-5 (Phase D):** Webpack's `HtmlWebpackPlugin` interaction with + `%REACT_APP_*%` placeholders and the v3-flagged unquoted `%HOSTS%` — + verify behavior under STATIC_MODE before assuming the hardcode workaround + is sufficient. +- **Q-IMPL-6 (Phase E.5):** Shade tool's data dependencies — pure-client + over DEM tiles, or server-rendered shadows? Audit `ShadeTool.js`. +- **Q-IMPL-7 (Phase F.1):** Whether to keep `_time_` server-side compositing + in the admin. Performance / cost evaluation. +- **Q-IMPL-8 (Phase F.2):** Shared mission-asset bucket vs per-dashboard + copy — pin a default. ADR says shared; verify acceptable for stakeholders. +- **Q-IMPL-9 (Phase G.1):** Custom Dockerfiles for TiTiler / TiTiler-pgSTAC + to bake in `tilematrixsets/planetcantile_v4` vs. mounted EFS. Default: + bake. 
+- **Q-IMPL-10 (Phase G.4):** Exact CORS env-var name for each sidecar image + version. Verify by inspecting the upstream images. +- **Q-IMPL-11 (Phase H.2):** ECS-spawned-task vs CodeBuild for the publish + job. Default: ECS RunTask; revisit if CodeBuild ergonomics win out. +- **Q-IMPL-12 (Phase J.1):** Seed-superadmin mechanism — Secrets Manager + binding, hardcoded credential in IaC, or interactive setup step. Default: + Secrets Manager. +- **Q-IMPL-13 (Phase A / C):** `STATIC_MODE` env var (clean build-time flag) vs. reusing `mmgisglobal.SERVER = "static"` (activates existing dormant non-node branches in `calls.js` / `essence.js` / `LandingPage.js`). Recommendation: use both — `STATIC_MODE` for build-time DefinePlugin / DCE, `SERVER = "static"` for runtime activation. +- **Q-IMPL-14 (Phase C.4):** Each named call in `src/pre/calls.js`'s `c[]` table needs a per-call static disposition (bake / reroute / drop). The exhaustive disposition table is part of Phase C execution and should be documented in `STATIC_HANDLERS` itself. +- **Q-IMPL-15 (Phase E.3 / E.4):** Confirm whether the backend `/api/utils/getprofile` and `/api/utils/getbands` routes internally delegate to TiTiler (and therefore a direct-to-TiTiler client port is feasible) or do something more complex. +- **Q-IMPL-16 (Phase F.1):** Splitting `middleware.missions(ROOT_PATH)` into a path-translation middleware and a separate `_time_` compositing middleware before the S3 refactor — refactor-style decision, doesn't affect functionality. + +--- + +## Cross-reference + +- `adr.md` — the ADR this plan implements. Authority on every "what" and "why." +- `working-plan.md` — workflow doc for this branch. +- `features.md` — per-feature inventory and open questions. +- `decisions.md`, `aws-mapping.md`, `overview.md` — prior decision artifacts; + this plan absorbs their content into Phase F and Phase G mostly. +- `z-do-not-read/static-mode-plan-v3.md` — prior detailed plan. Not updated on + this branch. 
Where it disagrees with this plan, this plan is newer. diff --git a/docs/adr/deployment/preserve/overview.md b/docs/adr/deployment/preserve/overview.md new file mode 100644 index 000000000..5fb782d7e --- /dev/null +++ b/docs/adr/deployment/preserve/overview.md @@ -0,0 +1,88 @@ +# Overview: AWS deployment with admin/dashboard split + +**Status:** Draft +**Last updated:** 2026-05-20 + +## What we're doing + +Today MMGIS runs as one Docker-compose stack: a single Node process serves the admin tool, the main map app, and proxies the optional Python sidecars; one Postgres holds users, sessions, mission configs, datasets, geodatasets, and drawings. + +We're splitting that into two deployables on AWS: + +- An **admin stack** — close to today's app. Multi-user, authenticated, full-feature. +- Many **dashboards** — frozen, read-only frontend builds that an admin publishes for individual audiences. Dashboards share access to the admin stack's sidecars and asset storage; they have no backend of their own. + +## Guidelines + +This list is best-effort pending stakeholder feedback. If any item is challenged, the downstream ADRs need re-discussion. + +1. **One admin instance, many dashboard deployments.** +2. **Dashboards are S3 + CloudFront.** No per-dashboard compute. +3. **Preserve MMGIS features by default.** A feature drops only when it genuinely cannot work in its target deployable, and the drop is called out with a reason. +4. **Shared infrastructure beats per-dashboard infrastructure** unless isolation is a hard requirement. +5. **Sidecars deploy as part of the admin stack** and are reachable by dashboards over the network. +6. **Admin auth mirrors today's MMGIS** — multi-user accounts, Postgres-backed sessions, existing permission codes, optional CSSO. **Dashboard auth is one shared password** at the edge, with per-dashboard passwords as a nice-to-have. +7. **Deploys into an existing VPC** in the AWS account. No net-new VPC. +8. 
**CI/CD uses GitHub Actions.** + + +## General plan + +The shape of the solution at the service-category altitude. Specific configurations — which container service, which database engine, per-dashboard vs shared distribution — are decided in ADR-A. + +**Admin stack.** Today's MMGIS app deployed to AWS managed compute. One managed Postgres (with PostGIS) holds the data it holds today: accounts, sessions, mission configs, datasets, geodatasets, drawings, STAC catalog. S3 replaces the local `Missions/` folder for raster mission assets. A load balancer terminates TLS and routes today's same-origin paths (`/api`, `/configure`, `/stac`, `/titiler`, etc.). The four Python sidecars run as sibling services in the same cluster. + +**Dashboards.** Each published dashboard is a static frontend bundle in S3, fronted by CloudFront, with edge-evaluated password auth. One mission per dashboard, frozen at publish time. No mission picker, no `?mission=` switching, no backend, no database, no per-dashboard sidecar. + +**Shared sidecars.** TiTiler, STAC, tipg, and (conditionally) veloserver live in the admin stack and are reached by both admin and dashboards. The admin reaches them through today's same-origin proxy. Dashboards reach them by absolute URL, cross-origin. + +**Publish flow.** The admin owns a Publish action. It reads the mission's config from Postgres, builds a frontend bundle with the config frozen in, provisions the dashboard's AWS resources (bucket, distribution, password gate, DNS record), uploads, and returns the dashboard's URL. A matching Delete reverses each step. 
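The publish flow above pairs every provisioning step with a reversal. A minimal sketch of that pairing follows, under loud assumptions: the step names, `makeSteps`, `publish`, `unpublish`, and the in-memory `cloud` set are all hypothetical stand-ins, not MMGIS code and not real AWS calls.

```javascript
// Hypothetical sketch: each publish step carries its own undo, so Delete
// (or a failed Publish) can reverse the steps in the opposite order.
// `cloud` is an in-memory Set standing in for provisioned AWS resources.
function makeSteps(cloud, dashboardId) {
  const step = (name) => ({
    name,
    apply: () => cloud.add(`${name}:${dashboardId}`),
    undo: () => cloud.delete(`${name}:${dashboardId}`),
  });
  // Order follows the prose: bucket, distribution, password gate, DNS record, upload.
  return ["bucket", "distribution", "passwordGate", "dnsRecord", "upload"].map(step);
}

function unpublish(steps) {
  // Reverse each step, newest first.
  for (const s of [...steps].reverse()) s.undo();
}

function publish(steps) {
  const done = [];
  try {
    for (const s of steps) {
      s.apply();
      done.push(s);
    }
  } catch (err) {
    // Roll back whatever was already provisioned before re-throwing.
    unpublish(done);
    throw err;
  }
  return done;
}
```

Keeping the undo next to the apply means a partially failed Publish and an explicit Delete share one code path, which is one way to honor "a matching Delete reverses each step."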
+ +**Frontend refactor.** A small set of seams in the frontend codebase makes dashboard mode possible: a build-time config bake (the mission config is generated as a JavaScript module instead of fetched at boot), an API-call dispatcher with a no-server branch (bake / reroute / compute / drop, looked up per call), a URL helper for the sidecars (same-origin paths in admin mode, absolute URLs in dashboard mode), per-feature decisions for backend-only computations (drop, redirect, or move to the browser), and disabling features that have nowhere to go (login form, WebSocket consumers). + +## The ADRs + +This work is split across two ADRs: + +| ADR | Scope | Status | +|---|---|---| +| **ADR-A: AWS deployment** | Admin stack, dashboard infrastructure, URL topology, publish flow, shared-services posture, data layout. | Under Review | +| **ADR-B: Frontend refactor for dashboard mode** | The seams in the frontend, the dispatcher table, per-feature disposition, open decisions inside the refactor. | Under Review | + +Supporting documents: + +- **`features.md`** — per-feature disposition matrix (admin vs dashboard, with AWS implementation notes). +- **`detailed-implementation-plan.md`** — phase-by-phase implementation breakdown. + +## How the ADRs interact + +The two ADRs are coupled at specific points. The load-bearing dependencies: + +- ADR-A's URL topology choice (per-service subdomain vs single fronted CloudFront) determines whether ADR-B's URL helper builds subdomain-shaped URLs or path-shaped ones. +- ADR-A's cross-origin sidecar auth gate (CORS only, signed requests, etc.) determines whether ADR-B's dashboard frontend needs to attach auth credentials. +- ADR-A's per-dashboard-vs-shared CloudFront choice determines whether per-dashboard isolation is something ADR-B can rely on. +- ADR-B's dispatcher behavior in dashboard mode (bake / reroute / compute / drop) is what ADR-A's publish flow has to populate at build time. 
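The first dependency in the list above can be made concrete with a sketch of the URL helper. Everything here is an assumption for illustration: the function name `sidecarUrl`, the `mode`, `topology`, and `baseDomain` options, and the example hostnames are hypothetical and do not come from the MMGIS codebase.

```javascript
// Hypothetical URL helper: same-origin paths in admin mode, absolute URLs in
// dashboard mode, shaped by whichever topology ADR-A ends up choosing.
function sidecarUrl(service, path, opts) {
  const rest = path.replace(/^\/+/, ""); // normalize any leading slashes
  if (opts.mode === "admin") {
    // Admin reaches sidecars through the same-origin proxy, e.g. /titiler/...
    return `/${service}/${rest}`;
  }
  if (opts.topology === "subdomain") {
    // Subdomain-shaped: one host per service.
    return `https://${service}.${opts.baseDomain}/${rest}`;
  }
  // Path-shaped: one shared host with a per-service path prefix.
  return `https://${opts.baseDomain}/${service}/${rest}`;
}

// Example usage with a placeholder domain:
console.log(sidecarUrl("titiler", "/cog/info", { mode: "admin" }));
console.log(sidecarUrl("titiler", "/cog/info", {
  mode: "dashboard", topology: "subdomain", baseDomain: "maps.example.org",
}));
```

If callers only ever go through a helper like this, the topology decision in ADR-A changes one function rather than every call site.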
+ +Decisions inside one ADR should not contradict the other. + +## Cross-cutting open questions + +These don't yet have an ADR home. When decided, each lands in an existing ADR or earns its own. + +- **Shared managed Postgres vs separate instances** for the main MMGIS DB and the STAC DB. Engine choice is a follow-on decision once sharing is settled. +- **Secrets storage** — Secrets Manager vs SSM Parameter Store. +- **Observability** — CloudWatch for admin; what for dashboards (CloudFront standard logs to S3, or something richer)? + +ADR-internal questions live in each ADR's open-questions section. Per-feature questions live in `features.md`. + +## Vocabulary + +Terms used across these documents. Wherever you see one of these, it means the same thing. + +- **Admin stack** — today's MMGIS app, deployed to AWS. +- **Dashboard** — a static, read-only frontend bundle with one mission baked in, served from S3 + CloudFront. +- **Sidecar** — one of the four Python services: TiTiler, STAC, tipg, veloserver. The codebase folder is named `adjacent-servers/` and proxy code uses "adjacent" naming; in prose, we say "sidecar." +- **Bake** — freeze data into a static file at publish time so the dashboard can fetch it without a backend. +- **Dispatcher** — the frontend pattern that picks an API call's destination (bake / reroute / compute / drop) based on build mode. +- **Reroute** — a dispatcher disposition: instead of calling MMGIS's backend, call a sidecar directly. +- **Publish flow** — the admin-side pipeline that turns a mission config into a deployed dashboard.
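To illustrate the Dispatcher and its four dispositions, here is a minimal sketch. The call names, the `DISPOSITIONS` table, the `/api/<name>` path, the placeholder sidecar URL, and the handler signatures are all hypothetical; the real per-call table is left to ADR-B.

```javascript
// Hypothetical dispatcher: in dashboard mode each named API call is looked up
// in a disposition table instead of being sent to the MMGIS backend.
const DISPOSITIONS = {
  // bake: answer from data frozen into the bundle at publish time
  get_config: { kind: "bake", data: { mission: "demo" } },
  // reroute: call a sidecar directly instead of the backend (placeholder URL)
  get_bands: { kind: "reroute", url: "https://titiler.example.org/placeholder" },
  // compute: do the work in the browser
  get_distance: { kind: "compute", fn: ({ a, b }) => Math.abs(a - b) },
  // drop: the feature has nowhere to go in a dashboard
  login: { kind: "drop" },
};

// Returns the handler's result directly; with a real fetch implementation the
// admin and reroute branches would return a Promise.
function dispatch(name, params, { mode, fetchImpl }) {
  if (mode === "admin") {
    // Admin mode keeps the backend call (the path shape is an assumption).
    return fetchImpl(`/api/${name}`, params);
  }
  const d = DISPOSITIONS[name] ?? { kind: "drop" }; // unknown calls are dropped
  switch (d.kind) {
    case "bake":
      return d.data;
    case "reroute":
      return fetchImpl(d.url, params);
    case "compute":
      return d.fn(params);
    default:
      return null; // dropped calls resolve to nothing
  }
}
```

Passing `fetchImpl` in as a parameter keeps the sketch testable without a network; in a browser it would simply be a wrapper around `fetch`.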