OpenSearch provider hygiene: parser guard, forward-attach sample, split-index ADR#118

Merged
bfarmer67 merged 3 commits into main from devs/bfarmer/opensearch-hygiene
May 4, 2026

@bfarmer67
Contributor

Summary

Three small, defensive improvements to the OpenSearch provider, informed by a comparative analysis against a sibling external implementation. None changes existing API or behavior for valid migrations.

  • Parser drive-letter guard. Body-path validation already rejected a leading / or \ and any .. segment per ADR-0017, but because the lexer excluded the : character, @C:/foo produced a confusing parse error rather than the intended "absolute path" message. Now : is in the lexer accept set, and both drive-letter prefixes (C:, c:, ...) and any other : in the path are rejected with a clear remediation message. Closes a Windows-vs-Linux authoring asymmetry.
  • Theory-style rejection-sweep tests covering C:/foo, C:\foo, c:/foo, Z:\foo, a:/x.json, plus a test for stray : in a path.
  • New sample 9000 — ForwardAttachmentLifecycle demonstrating the declarative-attachment pattern for greenfield pipelines: index template with template.aliases block + ISM policy with ism_template.index_patterns block. No runtime APPLY POLICY, no runtime ALIAS ADD — the cluster wires both lazily as new indices roll over. Sample 4000 stays as the runtime-apply backfill demonstration; samples README pairs them.
  • Provider README grows a "Forward attachment vs runtime apply" subsection in the ISM section, making the choice explicit for adopters.
  • ADR-0018 captures why the OpenSearch provider ships two indices (.migrations ledger + .migrations-lock lock) while Aerospike/Couchbase/MongoDB/Postgres co-locate — PA-2 lock replicas:0 invariant requires distinct durability profiles.
  • ADR-0017 amended to describe the drive-letter check in the path-validation surface.
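
The path rules described in the first bullet can be sketched roughly as follows. This is an illustrative Python sketch, not the provider's actual .NET validator: the function name `validate_body_path`, the error wording, and the order of the checks are all assumptions made for the example.

```python
import re

# Hypothetical sketch; names and messages are assumptions, not the provider's code.
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def validate_body_path(path: str) -> None:
    """Raise ValueError unless `path` is a plain relative path."""
    # Drive-letter prefixes (C:, c:, Z:, ...) are absolute on Windows.
    if _DRIVE_PREFIX.match(path):
        raise ValueError(
            f"absolute path not allowed (drive letter): {path!r}; "
            "use a path relative to the migration"
        )
    # Any other ':' is rejected as well.
    if ":" in path:
        raise ValueError(f"':' is not allowed in a body path: {path!r}")
    # A leading '/' or '\' is rooted on at least one platform.
    if path.startswith(("/", "\\")):
        raise ValueError(f"absolute path not allowed: {path!r}")
    # Split on both separators so the '..' check behaves the same everywhere.
    if ".." in re.split(r"[/\\]", path):
        raise ValueError(f"'..' segments are not allowed: {path!r}")
```

Checking the drive-letter shape before the generic `:` check is what lets the author see the "absolute path" message rather than a generic character error.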

What this PR does NOT do

Two ideas from the source analysis were considered and deliberately excluded:

  • Strict APPLY POLICY ON INDICES (a, b, c) grammar variant. The justification (re-run determinism) doesn't apply because hyperbee's runner skips already-journaled migrations; recovery paths (Journal=false, Down+Up, operator deleting the journal entry) deliberately want current-state wildcard matching. Adding a literal-list form would solve a problem we don't have and create a real one (authors must enumerate auto-discoverable indices). If review-time clarity ever matters, the right shape is to capture the matched indices in dispatch logs / the journal record.
  • ISM CREATE POLICY 409 → CAS retry. The doc's main motivation (re-run after partial failure) doesn't apply for the same reason. The narrow remaining case (a [Migration(N, Journal=false)] migration that creates a policy) is reasonable to address as an authoring constraint rather than a verb-level idempotency hack.

Test plan

  • dotnet build — provider, samples, runner all clean.
  • dotnet test unit suite — 356/356 pass on net10.0; new drive-letter rejection rows pass.
  • dotnet format --verify-no-changes — clean.
  • CI green on this PR.

bfarmer67 added 2 commits May 4, 2026 11:33
…it-index ADR

Three small improvements informed by a comparative analysis against an
external OpenSearch provider implementation:

#1 Parser drive-letter guard. The body-path validator already rejected
leading `/` and `\` and `..` segments per ADR-0017, but the lexer
excluded `:` so `@C:/foo` produced a confusing parse error rather than
the intended "absolute path" message. Allow `:` in the lexer accept
set, then explicitly reject drive-letter prefixes (`C:`, `c:`, ...) and
any other `:` in the path. Closes a cross-platform asymmetry where an
author on Windows could write a path that's silently rooted on Linux.

#5 Rejection-sweep tests. Theory-style coverage for the drive-letter
shape (`C:/foo`, `C:\foo`, `c:/foo`, `Z:\foo`, `a:/x.json`) plus a
test for stray `:` in a path. Existing absolute-path and `..` tests
unchanged.

#4 Forward-attachment sample (9000-ForwardAttachmentLifecycle). New
sample demonstrating the declarative attachment pattern for greenfield
pipelines: index template with `template.aliases` block + ISM policy
with `ism_template.index_patterns` block. No runtime APPLY POLICY,
no runtime ALIAS ADD; the cluster handles attachment lazily as new
indices roll over. Provider README's ISM section grows a "Forward
attachment vs runtime apply" subsection making the choice explicit.
Sample 4000 stays as the runtime-apply backfill demonstration; samples
README pairs them with a one-paragraph explanation of when to use which.
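
As a rough illustration of the forward-attachment pattern described above, the two request bodies might look like the following, written here as Python dicts. The index pattern, alias, policy states, and the helper name `forward_attachment_bodies` are hypothetical and are not taken from sample 9000; only the `template.aliases` and `ism_template.index_patterns` blocks come from the description.

```python
import json


def forward_attachment_bodies(pattern: str, alias: str) -> dict:
    """Build illustrative bodies for an index template and an ISM policy."""
    # Index template: every new index matching `pattern` gets the alias.
    index_template = {
        "index_patterns": [pattern],
        "template": {"aliases": {alias: {}}},
    }
    # ISM policy: ism_template attaches the policy to new matching indices,
    # so no runtime APPLY POLICY is needed for them.
    ism_policy = {
        "policy": {
            "description": "Hypothetical lifecycle: delete after 30 days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": "30d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            "ism_template": {"index_patterns": [pattern], "priority": 100},
        }
    }
    return {"index_template": index_template, "ism_policy": ism_policy}


bodies = forward_attachment_bodies("logs-*", "logs-read")
print(json.dumps(bodies["index_template"], indent=2))
```

Both bodies name the same index pattern, which is what lets the cluster wire the alias and the policy onto each new index without any runtime step.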

ADR-0018 split-index trade-off. Captures why the OpenSearch provider
ships two indices (.migrations ledger + .migrations-lock lock) while
Aerospike/Couchbase/MongoDB/Postgres co-locate. Reason: PA-2 lock
`replicas:0` mitigation against replica-write coupling under N-runner
contention; the ledger keeps cluster-default durability. ADR-0017
also updated to mention the drive-letter check in the parse-time
validation surface.

ISM attachment to an index series is three different problems, not one:

  - Greenfield (future indices auto-attach via ism_template) -> sample 9000
  - One-time backfill (existing indices need a policy)       -> sample 4000
  - Ongoing reconciliation (policy evolves over time)        -> sample 9001 (NEW)

Sample 9001 demonstrates the reconciliation pattern: a
[Migration(N, journal: false)] that re-runs APPLY POLICY against the
wildcard pattern on every startup. ISM's change_policy is idempotent
for already-on-policy indices, so re-running is cheap and convergent.
The wildcard form is correct because the set of indices to reconcile
changes as new ones roll over and old ones are deleted.
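
The convergence argument above can be illustrated with a small in-memory simulation. This is a toy model under stated assumptions, not ISM or provider code: the cluster is a plain dict from index name to attached policy id, and `apply_policy` is a hypothetical stand-in for the wildcard APPLY POLICY verb.

```python
from fnmatch import fnmatchcase


def apply_policy(cluster: dict, pattern: str, policy_id: str) -> list:
    """Attach policy_id to every index matching pattern; return changed names."""
    changed = []
    for name in sorted(cluster):
        # Indices already on the policy are left alone, so re-runs converge.
        if fnmatchcase(name, pattern) and cluster[name] != policy_id:
            cluster[name] = policy_id
            changed.append(name)
    return changed


# Toy cluster state: index name -> attached policy id (None = unmanaged).
cluster = {"logs-000001": None, "logs-000002": "logs-policy", "metrics-000001": None}

first = apply_policy(cluster, "logs-*", "logs-policy")   # ["logs-000001"]
second = apply_policy(cluster, "logs-*", "logs-policy")  # [] (nothing left to do)

cluster["logs-000003"] = None                            # a new index rolls over
third = apply_policy(cluster, "logs-*", "logs-policy")   # ["logs-000003"]
```

Because the pattern is evaluated against the current set of indices on each run, the third call picks up the new index without anyone having to enumerate it.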

Provider README's 'Forward attachment vs runtime apply' subsection
expanded into a 'Three temporal scopes for ISM attachment' table so
the choice between the three patterns is explicit, not implicit.
Samples README adds the same matrix and points at the provider README
as the canonical explainer.

The three are stackable in a mature pipeline (greenfield at install,
backfill when an existing series first adopts the policy, reconciliation
as the policy evolves). Many pipelines never need more than one --
but the idea is to choose deliberately rather than reach for runtime
APPLY POLICY by default.
…ee-scope ISM framing

Two doc gaps surfaced during a documentation review:

1. Path-validation list (provider README + docs/site/opensearch.md) didn't
   mention the drive-letter rejection added in this PR's parser change.
   A Windows author writing @C:/foo would get a clear runtime error but
   no docs telling them what shapes the validator catches.

2. docs/site/opensearch.md ISM section described only runtime APPLY POLICY.
   The provider README and samples README in this PR introduce the
   'Three temporal scopes for ISM attachment' framing (greenfield via
   ism_template, one-time backfill, ongoing reconciliation) and reference
   samples 4000/9000/9001 -- the site doc was the last surface still
   describing only the backfill case.

Site doc updates kept ASCII-only per site-build constraint.
@bfarmer67 bfarmer67 merged commit 4504c7c into main May 4, 2026
9 checks passed