diff --git a/docs/decisions/0017-body-source-grammar.md b/docs/decisions/0017-body-source-grammar.md index 542180c..1dca60d 100644 --- a/docs/decisions/0017-body-source-grammar.md +++ b/docs/decisions/0017-body-source-grammar.md @@ -139,11 +139,19 @@ contract; migrating existing resources is optional. ### Path validation (parse-time) -The grammar accepts characters `[a-zA-Z0-9_\-./\\]` in `@path`. +The grammar accepts characters `[a-zA-Z0-9_\-./\\:]` in `@path`. The +`:` is in the lexer's accept set only so a drive-letter prefix surfaces +as a clean "absolute path" error rather than a generic parse failure. Validation rejects at parse time: - Absolute paths (leading `/` or `\`) — body files must be inside the migration's resource folder. +- Drive-letter prefix (`C:`, `c:`, ...) — same reason. `Path.IsPathRooted` + is platform-dependent so an author editing on one host could otherwise + produce a manifest that's silently rooted on another. +- Any other `:` in the path — embedded resource names don't use it; the + reject is mechanical because a `:` that isn't a drive-letter prefix + is almost certainly an authoring mistake. - `..` segments — no parent-directory traversal; each migration's body files stay self-contained. diff --git a/docs/decisions/0018-split-ledger-and-lock-indices.md b/docs/decisions/0018-split-ledger-and-lock-indices.md new file mode 100644 index 0000000..2040696 --- /dev/null +++ b/docs/decisions/0018-split-ledger-and-lock-indices.md @@ -0,0 +1,74 @@ +# ADR-0018: OpenSearch Provider Splits Ledger and Lock into Two Indices + +**Status:** Accepted +**Date:** 2026-05-04 + +## Context + +Hyperbee's NoSQL provider family — Aerospike, Couchbase, MongoDB, Postgres — co-locates the migration ledger and the run-lock in a single namespace, distinguished by document id. 
A reviewer comparing the OpenSearch provider against a sibling internal implementation flagged that the OpenSearch provider deviates from this convention: it ships with two indices, `.migrations` (ledger) and `.migrations-lock` (lock), defaulted by `OpenSearchMigrationOptions.LedgerIndex` and `LockIndex` respectively. + +The deviation is intentional but, until now, undocumented as an ADR. The reviewer's observation is correct in two ways: + +1. **The convention exists.** The other four providers co-locate; the OpenSearch provider is the outlier. +2. **The deviation is load-bearing.** It exists to serve a specific concurrency invariant that the other providers don't face in the same shape. + +The risk of leaving this implicit is that a future provider author copies the wrong convention — either co-locating in OpenSearch when they shouldn't, or splitting in a future provider when they shouldn't. This ADR captures the reasoning so the next implementer makes the choice deliberately. + +## The deviation + +OpenSearch's primary-shard write contract is shard-replica coupling: a primary write blocks until each in-sync replica acknowledges the write. Under N-runner concurrent lock acquire (R-24b), the lock primary shard is contended; replica-write coupling adds a second source of tail latency on top of the contention itself. + +The mitigation (PA-2 from assessment 0002, encoded in `LockIndexInitStep`) is to create the lock index with `number_of_replicas: 0`. The lock document then writes to a single primary with no replica fan-out — eliminating replica-write coupling as a tail-latency contributor under contention. + +The ledger index has the opposite needs: it's a forensic record (R-06) used after the fact to answer "what migrations ran, when, in what direction, against what state." Durability matters; tail latency under concurrent writes does not (the lock serializes writes to the ledger). The ledger gets the cluster's normal replica configuration. 
+ +Two distinct durability/latency profiles, two indices. The other providers don't face this trade-off because: + +- **Aerospike** uses native CAS on a record key in a configured namespace; durability is a namespace-level setting and is not coupled to replica-write semantics on a per-record basis. +- **Couchbase** uses bucket-level durability; the lock is a single document with provider-level coordination. +- **MongoDB** uses a collection-level write concern; the lock is a single document with `findOneAndUpdate` semantics. +- **Postgres** uses `pg_advisory_lock`; the lock is not a row in the ledger table at all. + +In each case, the lock's durability story is decoupled from the ledger's, either by language (advisory lock vs row) or by configuration knob (namespace, bucket, collection write concern). OpenSearch couples them through index settings — which means decoupling the two requires two indices. + +## Decision + +The OpenSearch provider will continue to ship two indices: + +- `LedgerIndex` (default `.migrations`) — strict-mapped ledger per R-06, with the cluster's normal replica configuration. +- `LockIndex` (default `.migrations-lock`) — `number_of_replicas: 0` per PA-2 mitigation, asserted by `LockIndexInitStep`. + +We will not introduce an option to combine them into a single index. The combined-index shape would either lose the PA-2 mitigation (if the index were configured for ledger-grade durability) or compromise the ledger's durability (if the index were configured for `replicas: 0`). Neither trade-off is worth the cross-provider symmetry. + +If a future operator deployment is so IAM-restricted that index creation is gated to a single index, we will reconsider — but only as a documented constrained-mode opt-in, never as a default. ADR-0013's `AssumeIndicesExist` already covers the IAM-restricted case for both indices; no additional surface is needed today. 
+ +## Consequences + +**Easier:** + +- The lock's tail-latency story is clean: under R-24b N-runner contention, the lock primary shard's write path has no replica-coupling component. +- The ledger's durability story is clean: it inherits the cluster's normal replica configuration without per-index special-casing. +- Operators in non-AWS environments who configure cluster-wide replica counts get exactly what they expect for both indices. + +**Harder:** + +- Operators must monitor / back up two indices. In practice this is one extra entry in any backup or alerting tool; the entries are co-located by name (`.migrations*` glob covers both). +- Cross-provider documentation has to surface the asymmetry. This ADR is the canonical reference; the provider README's "Quick start" continues to default both indices for the common case. +- The next provider author asking "should I co-locate or split?" must read this ADR. The default answer is co-locate (the house style); split only when the lock and ledger have distinct durability or latency requirements that the underlying engine couples through shared configuration. + +**Constrains:** + +- The lock and ledger indices are part of the public contract of `OpenSearchMigrationOptions`. Removing either as a top-level index requires a superseding ADR. +- The PA-2 invariant (`number_of_replicas: 0` on the lock index) is asserted at startup by `LockIndexInitStep`; weakening this assertion requires a superseding ADR. + +## Relation to other ADRs + +- **ADR-0013 (Always-Create Lock and Ledger Indices in InitializeAsync with Explicit Override)** — this ADR refines the model that one introduced. ADR-0013 names the two indices and the always-create behavior; this ADR captures *why there are two*. +- **ADR-0005 (Provider-Native Distributed Locking)** — preserved. The split is an OpenSearch-specific implementation choice for native locking; the cross-provider lock contract is unchanged. 
+ +## Implementation + +- `OpenSearchMigrationOptions.LedgerIndex` (default `.migrations`) and `OpenSearchMigrationOptions.LockIndex` (default `.migrations-lock`). +- `LedgerIndexInitStep` creates the ledger with strict R-06 mapping; replica configuration follows cluster default. +- `LockIndexInitStep` creates the lock with `number_of_replicas: 0` and asserts the value when the index already exists. Mismatch fails at startup with a remediation message. +- `OpenSearchRecordStore` reads the ledger and the lock through these options. There is no path that writes lock state to the ledger index (or vice versa). diff --git a/docs/decisions/INDEX.md b/docs/decisions/INDEX.md index 8aa475c..298788f 100644 --- a/docs/decisions/INDEX.md +++ b/docs/decisions/INDEX.md @@ -19,3 +19,4 @@ | 0015 | [Parser is Offline-Pure; All I/O is Runtime Middleware](0015-parser-offline-pure-all-io-runtime.md) | Accepted | 2026-05-02 | Clarifying corollary of ADR-0011; resolves R-30 template lookup ambiguity by deferring all I/O (including template body resolution) to runtime middleware | | 0016 | [OpenSearch Provider Does Not Use File-Level Templating](0016-no-file-level-templating.md) | Accepted | 2026-05-02 | Strikes R-10; matches Aerospike/Couchbase/MongoDB/Postgres house style (typed options + runtime substitution); deletes Phase 0 Task 0.4 work; removes Hyperbee.Templating dependency | | 0017 | [Body-Source Grammar — Three Resolution Forms](0017-body-source-grammar.md) | Accepted | 2026-05-02 | `WITH BODY @path` direct file reference + `bodies.` structured section + ADR-0009 sibling-property fallback for back-compat; parse-time path validation rejects absolute paths and `..` traversal | +| 0018 | [OpenSearch Provider Splits Ledger and Lock into Two Indices](0018-split-ledger-and-lock-indices.md) | Accepted | 2026-05-04 | Captures why the OpenSearch provider deviates from the Aerospike/Couchbase/MongoDB/Postgres single-namespace convention: distinct durability/latency profiles (PA-2 lock 
`replicas:0`, normal ledger durability) require two indices | diff --git a/docs/site/opensearch.md b/docs/site/opensearch.md index 4bc2a8c..235ce04 100644 --- a/docs/site/opensearch.md +++ b/docs/site/opensearch.md @@ -175,6 +175,8 @@ JSON bodies attach to a statement via `WITH BODY `. The provider supports t The `@`-prefixed path loads an embedded resource relative to the migration's own resource folder. Use this for any body that would otherwise dominate the `statements.json` file -- large mappings, ISM policies, reusable templates. Subfolders are optional. Path validation is parse-time: - Absolute paths (leading `/` or `\`) are rejected -- body files must stay inside the migration's resource folder. +- Drive-letter prefixes (`C:`, `c:`, ...) are rejected -- same reason. `Path.IsPathRooted` is platform-dependent (`C:/foo` reads as rooted on Windows but not on Linux); the validator checks the rooted shape explicitly so an author editing on one host can't produce a path that's silently rooted on another. +- Any other `:` in the path is rejected -- embedded resource names don't use it. - `..` segments are rejected -- no parent-directory traversal. - Allowed characters: letters, digits, `_`, `-`, `.`, `/`, `\`. @@ -509,12 +511,26 @@ Uploads the policy to `_plugins/_ism/policies` (or `_opendistro/_ism/policies` o APPLY POLICY TO [NO WAIT("")] ``` -Attaches the policy to existing indices matching the pattern via `_plugins/_ism/add`. The dispatcher inspects the response body and surfaces logical failures explicitly: HTTP 200 with `updated_indices: 0` is mapped to `Failed`, not silent OK. For future-only attachment, declare `ism_template.index_patterns` in the policy body (handled at index-creation time by the cluster). +Attaches the policy to existing indices matching the pattern via `_plugins/_ism/add`. The dispatcher inspects the response body and surfaces logical failures explicitly: HTTP 200 with `updated_indices: 0` is mapped to `Failed`, not silent OK. 
```json { "statement": "APPLY POLICY hot-warm-cold TO logs-*" } ``` +#### Three temporal scopes for ISM attachment + +ISM attachment to an index series isn't one problem with three solutions -- it's three different problems, each with its own right tool. Pick by *when* the indices that need the policy come into existence relative to the migration that owns the policy. + +| Scope | Right tool | Sample | Notes | +|---|---|---|---| +| **Greenfield** -- attach to indices that will be created in the future | `ism_template.index_patterns` in the policy body, `template.aliases` in the index template | 9000 -- `ForwardAttachmentLifecycle` | Cluster handles it lazily at index-creation time. No migration runtime cost. Won't help with indices that already exist when the migration runs. | +| **One-time backfill** -- attach a policy to a set of indices that already exist at migration run time | Runtime `APPLY POLICY TO ` in a normal `[Migration(N)]` | 4000 -- `IsmPolicyAndApply` | Single-shot, journaled. Wildcards adapt to current cluster state at run time. Zero-updated -> `Failed` escalation makes it loud when the pattern matches nothing. | +| **Ongoing reconciliation** -- keep all matching existing indices on the current policy as the policy evolves | Runtime `APPLY POLICY TO ` in a `[Migration(N, journal: false)]` | 9001 -- `OngoingPolicyReconciliation` | Re-runs on every startup. Idempotent on the wire (ISM's `change_policy` is a no-op for already-on-policy indices). The wildcard form is correct because the set of indices to reconcile changes as new ones roll over and old ones are deleted. | + +The three are stackable. A typical mature pipeline uses **greenfield** at install time, **one-time backfill** when an existing series first adopts the policy, and **ongoing reconciliation** as the policy definition evolves over the project's lifetime. Many pipelines never need more than one -- but you should choose deliberately rather than reach for runtime `APPLY POLICY` by default. 
+ +Caveat: `ism_template` inside a policy body is the modern endpoint shape. Older AWS-managed clusters served by the legacy `_opendistro/_ism` endpoint may not honor it; if `IsmEndpointDetectStep` resolves to the legacy endpoint, the greenfield row falls back to runtime `APPLY POLICY` (sample 4000's pattern, run once at install time, plus sample 9001's reconciliation pattern for ongoing changes). Modern OpenSearch (2.x and the modern AWS endpoint) supports `ism_template` natively. + ### WAIT FOR (cluster health) ``` diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Hyperbee.Migrations.OpenSearch.Samples.csproj b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Hyperbee.Migrations.OpenSearch.Samples.csproj index f80bf4d..6dc0bc6 100644 --- a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Hyperbee.Migrations.OpenSearch.Samples.csproj +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Hyperbee.Migrations.OpenSearch.Samples.csproj @@ -15,6 +15,11 @@ + + + + + @@ -28,6 +33,11 @@ + + + + + diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9000-ForwardAttachmentLifecycle.cs b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9000-ForwardAttachmentLifecycle.cs new file mode 100644 index 0000000..fb8b0f2 --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9000-ForwardAttachmentLifecycle.cs @@ -0,0 +1,52 @@ +using Hyperbee.Migrations.Providers.OpenSearch.Resources; + +namespace Hyperbee.Migrations.OpenSearch.Samples.Migrations; + +// Sample 9: forward-attachment lifecycle for greenfield pipelines. +// +// Contrast with sample 4 (IsmPolicyAndApply), which demonstrates the +// runtime APPLY POLICY path — necessary when you need to attach a policy +// to indices that ALREADY exist (backfill). 
+// +// For pipelines starting clean — daily rollover indices for a new +// application, fresh log streams, anything where the migration runs +// before the indices exist — declarative attachment is preferable. The +// migration installs only the cluster-level scaffolding: +// +// CREATE COMPONENT — shared settings/mappings, declared once. +// CREATE TEMPLATE — `index_patterns` matches the rollover series; the +// template's `template.aliases` block wires the +// alias automatically when a matching index is +// created. +// CREATE POLICY — the policy body's `ism_template.index_patterns` +// block attaches the policy to any matching index +// at creation time. +// +// Note: there is NO runtime APPLY POLICY and NO runtime ALIAS ADD. The +// first index in the series — created later by the application, by daily +// rollover, or by a successor migration — picks up everything: settings, +// mappings, alias, lifecycle policy. +// +// When to use this pattern vs. sample 4: +// +// - greenfield series (no existing indices) -> sample 9 pattern +// - existing indices that need a new policy -> sample 4 pattern +// - new policy applies to BOTH existing and future -> both: sample 4 +// pattern PLUS an +// `ism_template` +// block in the policy +// +// Caveat: `ism_template` inside an ISM policy body is the modern endpoint +// (`_plugins/_ism/policies`). Older AWS-managed clusters served by the +// legacy `_opendistro/_ism` endpoint may not recognize it; the bootstrap +// `IsmEndpointDetectStep` resolves which endpoint is active, but the +// declarative `ism_template` shape itself is a property of the modern +// schema. If you target a legacy endpoint, fall back to sample 4's +// runtime APPLY for forward attachment. 
+ +[Migration( 9000 )] +public class ForwardAttachmentLifecycle( OpenSearchResourceRunner runner ) : Migration +{ + public override Task UpAsync( CancellationToken cancellationToken = default ) + => runner.StatementsFromAsync( "statements.json", cancellationToken ); +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9001-OngoingPolicyReconciliation.cs b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9001-OngoingPolicyReconciliation.cs new file mode 100644 index 0000000..f10e6f8 --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Migrations/9001-OngoingPolicyReconciliation.cs @@ -0,0 +1,45 @@ +using Hyperbee.Migrations.Providers.OpenSearch.Resources; + +namespace Hyperbee.Migrations.OpenSearch.Samples.Migrations; + +// Sample 9.1: ongoing policy reconciliation. +// +// The third of three temporal scopes for ISM attachment. Pair it with +// sample 4 (one-time backfill via runtime APPLY) and sample 9 +// (greenfield via `ism_template`). +// +// Why this exists. Sample 9 installs a policy whose body has an +// `ism_template.index_patterns` block — new indices in the +// `sample_app_events-*` series auto-attach at creation. But when the +// policy DEFINITION later evolves (a new state added, transition +// criteria adjusted, retention reduced from 90d to 30d), existing +// indices that are already attached keep running on their cached copy +// of the policy until something explicitly re-attaches them. +// +// This migration runs `APPLY POLICY` against the same wildcard pattern +// the policy's `ism_template` covers — and it is journaled = false so +// it re-runs on every startup. The ISM `change_policy` API is +// idempotent: indices already on the current policy are a no-op, so +// re-running is cheap. The wildcard form is correct because the set of +// indices to reconcile changes as new ones roll over and old ones are +// deleted by the policy's own delete state. 
+// +// When NOT to use this pattern. +// +// - Greenfield-only series with policies that never change: sample 9 +// alone is enough. Don't add reconciliation noise on every startup +// for a thing that's already convergent. +// - One-time backfill of indices that exist before the policy: +// sample 4 (a normal `[Migration(N)]`) is the right tool. Don't +// reach for journaled = false unless the migration genuinely needs +// to run more than once. +// - Authoring-time-only enumeration of "these specific indices get +// this policy": just put the literal set in a normal migration; the +// wildcard story is for cluster-state-driven sets. + +[Migration( 9001, journal: false )] +public class OngoingPolicyReconciliation( OpenSearchResourceRunner runner ) : Migration +{ + public override Task UpAsync( CancellationToken cancellationToken = default ) + => runner.StatementsFromAsync( "statements.json", cancellationToken ); +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/README.md b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/README.md index 291c888..a677545 100644 --- a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/README.md +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/README.md @@ -15,11 +15,30 @@ this assembly via `Migrations:FromPaths` and runs them in version order. 
| 6000 | **`MigrateIndexComposite`** | **Featured: `MIGRATE INDEX` composite — the canonical template-propagation pattern (R-30)** | Form 2 | | 7000 | `ReversibleAlias` | Opt-in `rollback` per statement; partial-rollback ledger semantics (R-19) | (no bodies — DDL-only rollback) | | 8000 | `UnsafeReindex` | `REINDEX UNSAFE("")` — opt-out of `op_type:create` | Form 2 | +| 9000 | `ForwardAttachmentLifecycle` | Greenfield: declarative attachment via `template.aliases` + `ism_template` — **no runtime `APPLY POLICY` or `ALIAS ADD`** | Form 1 — direct `WITH BODY @path` for each body | +| 9001 | `OngoingPolicyReconciliation` | `[Migration(N, journal: false)]` + `APPLY POLICY TO ` — re-runs every startup; keeps matching indices on the current policy as it evolves | (no bodies — APPLY-only) | **Sample 6 is the headline.** Adopters asking "how do I apply a template/mapping change to existing data?" should be pointed at `MigrateIndexComposite` first; the long-form sample 2 exists to show what the composite expands to. +**Samples 4, 9, and 9.1 are the three temporal scopes for ISM attachment.** +Pick the one that matches *when* the indices that need the policy come into +existence relative to the migration that owns the policy: + +| Scope | Sample | When | +|---|---|---| +| Greenfield (future indices auto-attach) | 9000 | Index series doesn't exist yet — daily rollover for a new pipeline, fresh log streams | +| One-time backfill (existing indices) | 4000 | Indices already exist and need the policy attached once | +| Ongoing reconciliation (future + existing, policy evolves) | 9001 | Policy definition evolves over time; re-attach every startup so already-attached indices pick up the new version | + +The three are stackable in a mature pipeline (greenfield at install, +backfill when an existing series first adopts a policy, reconciliation as +the policy evolves). 
Many pipelines never need more than one — but choose +deliberately rather than reach for runtime `APPLY POLICY` by default. +The provider README's "Three temporal scopes for ISM attachment" section +is the canonical explainer. + **Body-source forms.** ADR-0017 defines three resolution forms for `WITH BODY` references. The samples deliberately demonstrate all of them so authors can compare the trade-offs side by side: diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/component.json b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/component.json new file mode 100644 index 0000000..1433c24 --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/component.json @@ -0,0 +1,18 @@ +{ + "template": { + "settings": { + "number_of_shards": 1, + "number_of_replicas": 1, + "refresh_interval": "30s" + }, + "mappings": { + "dynamic": "strict", + "properties": { + "@timestamp": { "type": "date" }, + "level": { "type": "keyword" }, + "msg": { "type": "text" }, + "service": { "type": "keyword" } + } + } + } +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/policy.json b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/policy.json new file mode 100644 index 0000000..d2233ad --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/policy.json @@ -0,0 +1,26 @@ +{ + "policy": { + "description": "Forward-attaching lifecycle policy. The `ism_template.index_patterns` block tells the cluster to attach this policy to any new index whose name matches `sample_app_events-*` at creation time. 
No runtime APPLY POLICY is required for indices that don't exist yet.", + "default_state": "hot", + "states": [ + { + "name": "hot", + "actions": [], + "transitions": [ + { "state_name": "delete", "conditions": { "min_index_age": "30d" } } + ] + }, + { + "name": "delete", + "actions": [{ "delete": {} }], + "transitions": [] + } + ], + "ism_template": [ + { + "index_patterns": ["sample_app_events-*"], + "priority": 100 + } + ] + } +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/template.json b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/template.json new file mode 100644 index 0000000..09294ee --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/bodies/template.json @@ -0,0 +1,10 @@ +{ + "index_patterns": ["sample_app_events-*"], + "priority": 100, + "composed_of": ["sample_app_events_mappings"], + "template": { + "aliases": { + "sample_app_events": {} + } + } +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/statements.json b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/statements.json new file mode 100644 index 0000000..5bf6681 --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9000-ForwardAttachmentLifecycle/statements.json @@ -0,0 +1,16 @@ +{ + "statements": [ + { + "//": "Shared mappings + settings extracted as a component template. Indices that match the next statement's index_patterns will compose this in.", + "statement": "CREATE COMPONENT sample_app_events_mappings WITH BODY @bodies/component.json" + }, + { + "//": "Index template with `aliases:` block — the cluster wires the alias automatically when a matching index is created. 
No runtime ALIAS ADD needed.", + "statement": "CREATE TEMPLATE sample_app_events WITH BODY @bodies/template.json" + }, + { + "//": "ISM policy with `ism_template.index_patterns` — the cluster attaches this policy to any matching index at creation time. No runtime APPLY POLICY needed.", + "statement": "CREATE POLICY sample_app_events_lifecycle WITH BODY @bodies/policy.json" + } + ] +} diff --git a/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9001-OngoingPolicyReconciliation/statements.json b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9001-OngoingPolicyReconciliation/statements.json new file mode 100644 index 0000000..dc555ba --- /dev/null +++ b/runners/samples/Hyperbee.Migrations.OpenSearch.Samples/Resources/9001-OngoingPolicyReconciliation/statements.json @@ -0,0 +1,8 @@ +{ + "statements": [ + { + "//": "Re-apply the lifecycle policy to every matching index. The wildcard adapts to current cluster state — newer indices that the policy already auto-attached are a no-op via ISM's idempotent change_policy semantics. Cheap to re-run on every startup; this migration is `journal: false` so it's expected to re-run.", + "statement": "APPLY POLICY sample_app_events_lifecycle TO sample_app_events-*" + } + ] +} diff --git a/src/Hyperbee.Migrations.Providers.OpenSearch/Internal/Grammar/OpenSearchStatementParser.cs b/src/Hyperbee.Migrations.Providers.OpenSearch/Internal/Grammar/OpenSearchStatementParser.cs index 83408a8..69d7570 100644 --- a/src/Hyperbee.Migrations.Providers.OpenSearch/Internal/Grammar/OpenSearchStatementParser.cs +++ b/src/Hyperbee.Migrations.Providers.OpenSearch/Internal/Grammar/OpenSearchStatementParser.cs @@ -119,9 +119,14 @@ private static Parser BuildParser() // policies, reusable templates). 
// // Path validation is parse-time only: we reject leading `/` or `\` - // (absolute paths) and any `..` segment (parent-directory traversal) - // so each migration's body files stay self-contained — keeps repeatable - // dotnet publish boundaries honest. + // (Unix-rooted), drive-letter prefixes like `C:` or `c:` (Windows- + // rooted), and any `..` segment (parent-directory traversal) so each + // migration's body files stay self-contained — keeps repeatable + // dotnet publish boundaries honest. Path.IsPathRooted is platform- + // dependent ("C:/foo" reads as rooted on Windows but not on Linux), + // so the validator checks the rooted shapes explicitly: an author + // editing on one host can't produce a path that's silently rooted + // on another. var dollar = Terms.Char( '$' ); var at = Terms.Char( '@' ); @@ -129,15 +134,28 @@ private static Parser BuildParser() var siblingBodyRef = with.SkipAnd( body ).SkipAnd( dollar ).SkipAnd( identifier ) .Then( static name => (BodySource) new BodyRef( name ) ); - // path: letters/digits/_/-/./forward+back-slash. Terminates at whitespace. + // path: letters/digits/_/-/./forward+back-slash, plus `:` so a + // drive-letter prefix surfaces a clear "absolute path" error + // instead of a generic parse failure when an author writes + // `@C:/foo`. Terminates at whitespace. var bodyPath = Terms.Pattern( - static c => char.IsLetterOrDigit( c ) || c is '_' or '-' or '.' or '/' or '\\' + static c => char.IsLetterOrDigit( c ) || c is '_' or '-' or '.' or '/' or '\\' or ':' ).Then( static buf => { var path = buf.ToString()!; if ( path.StartsWith( '/' ) || path.StartsWith( '\\' ) ) throw new InvalidOperationException( $"WITH BODY `@{path}` is absolute. Body files must live inside the migration's resource folder; use a path relative to it." ); + // Drive-letter prefix (`C:`, `c:`, `Z:`...). Reject before the + // segment scan so the message names the actual shape that + // tripped validation. 
We don't need to allow `:` anywhere + // else in body paths — embedded resource names don't use it. + if ( path.Length >= 2 && path[1] == ':' && IsDriveLetter( path[0] ) ) + throw new InvalidOperationException( + $"WITH BODY `@{path}` is absolute (drive-letter prefix). Body files must live inside the migration's resource folder; use a path relative to it." ); + if ( path.Contains( ':' ) ) + throw new InvalidOperationException( + $"WITH BODY `@{path}` contains `:`. Body file paths must not contain `:` — embedded resource names don't use it." ); // `..` segment = parent traversal. Allow `.` (current dir) but not // `..` anywhere — split-and-check rather than substring so file // names that legitimately contain dots (`.json`) aren't false- @@ -692,6 +710,8 @@ private static Version ParseVersionLiteral( string literal ) return version; } + + private static bool IsDriveLetter( char c ) => c is >= 'A' and <= 'Z' or >= 'a' and <= 'z'; } public sealed class OpenSearchParseException : Exception diff --git a/src/Hyperbee.Migrations.Providers.OpenSearch/README.md b/src/Hyperbee.Migrations.Providers.OpenSearch/README.md index 930e88a..0bc3e92 100644 --- a/src/Hyperbee.Migrations.Providers.OpenSearch/README.md +++ b/src/Hyperbee.Migrations.Providers.OpenSearch/README.md @@ -119,6 +119,8 @@ Subfolders are optional. The path is just a relative file reference — `@foo.js Path validation is parse-time: - Absolute paths (leading `/` or `\`) are rejected — body files must stay inside the migration's resource folder. +- Drive-letter prefixes (`C:`, `c:`, ...) are rejected — same reason. `Path.IsPathRooted` is platform-dependent (`C:/foo` reads as rooted on Windows but not on Linux); the validator checks the rooted shape explicitly so an author editing on one host can't produce a path that's silently rooted on another. +- Any other `:` in the path is rejected — embedded resource names don't use it. - `..` segments are rejected — no parent-directory traversal. 
 - Allowed characters: letters, digits, `_`, `-`, `.`, `/`, `\`.
@@ -288,7 +290,23 @@ CREATE POLICY [WITH BODY $body]
 APPLY POLICY TO
 ```
-`CREATE POLICY` uploads the policy to `_plugins/_ism/policies`. `APPLY POLICY` attaches it to existing indices matching the pattern via `_plugins/_ism/add` — the dispatcher inspects the response body and surfaces logical failures explicitly: HTTP 200 with `updated_indices: 0` is mapped to `Failed`, not silent OK. For future-only attachment, declare `ism_template.index_patterns` in the policy body (handled at index-creation time by the cluster).
+`CREATE POLICY` uploads the policy to `_plugins/_ism/policies`. `APPLY POLICY` attaches it to existing indices matching the pattern via `_plugins/_ism/add` — the dispatcher inspects the response body and surfaces logical failures explicitly: HTTP 200 with `updated_indices: 0` is mapped to `Failed`, not silent OK.
+
+#### Three temporal scopes for ISM attachment
+
+ISM attachment to an index series isn't one problem with three solutions — it's three different problems, each with its own right tool. Pick by *when* the indices that need the policy come into existence relative to the migration that owns the policy.
+
+| Scope | Right tool | Sample | Notes |
+|---|---|---|---|
+| **Greenfield** — attach to indices that will be created in the future | `ism_template.index_patterns` in the policy body, `template.aliases` in the index template | 9000 — `ForwardAttachmentLifecycle` | Cluster handles it lazily at index-creation time. No migration runtime cost. Won't help with indices that already exist when the migration runs. |
+| **One-time backfill** — attach a policy to a set of indices that already exist at migration run time | Runtime `APPLY POLICY TO ` in a normal `[Migration(N)]` | 4000 — `IsmPolicyAndApply` | Single-shot, journaled. Wildcards adapt to current cluster state at run time. Zero-updated → `Failed` escalation makes it loud when the pattern matches nothing. |
+| **Ongoing reconciliation** — keep all matching existing indices on the current policy as the policy evolves | Runtime `APPLY POLICY TO ` in a `[Migration(N, journal: false)]` | 9001 — `OngoingPolicyReconciliation` | Re-runs on every startup. Idempotent on the wire (ISM's `change_policy` is a no-op for already-on-policy indices). The wildcard form is correct because the set of indices to reconcile changes as new ones roll over and old ones are deleted. |
+
+The three are stackable. A typical mature pipeline uses **greenfield** at install time, **one-time backfill** when an existing series first adopts the policy, and **ongoing reconciliation** as the policy definition evolves over the project's lifetime. Many pipelines never need more than one — but you should choose deliberately rather than reach for runtime `APPLY POLICY` by default.
+
+The wildcard form of `APPLY POLICY` is the correct expression of "apply to whatever matches now" — that's exactly what backfill and reconciliation want. Don't try to pin to a literal index list as a substitute for forward-attachment; if the goal is "future indices auto-attach," `ism_template` is the right answer.
+
+Caveat: `ism_template` inside a policy body is the modern endpoint shape. Older AWS-managed clusters served by the legacy `_opendistro/_ism` endpoint may not honor it; if `IsmEndpointDetectStep` resolves to the legacy endpoint, the greenfield row falls back to runtime `APPLY POLICY` (sample 4000's pattern, run once at install time, plus sample 9001's reconciliation pattern for ongoing changes). Modern OpenSearch (2.x and the modern AWS endpoint) supports `ism_template` natively.
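
As an illustration of the greenfield scope, a policy body that carries its own `ism_template` might look roughly like the sketch below. The `logs-*` pattern, the `hot` state name, the 30-day rollover age, and the priority value are hypothetical placeholders chosen for this example; they are not taken from sample 9000 or from any file in this change.

```json
{
  "policy": {
    "description": "Hypothetical rollover policy; every name and threshold here is illustrative.",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_index_age": "30d" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logs-*"],
      "priority": 100
    }
  }
}
```

With a body of this shape, the cluster attaches the policy when a new index matching `logs-*` is created, so indices that already exist would still need one of the runtime `APPLY POLICY` scopes.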
 ### Cluster waits

diff --git a/tests/Hyperbee.Migrations.Tests/Providers/OpenSearch/Internal/BodySourceParserTests.cs b/tests/Hyperbee.Migrations.Tests/Providers/OpenSearch/Internal/BodySourceParserTests.cs
index 4281f06..bea3e60 100644
--- a/tests/Hyperbee.Migrations.Tests/Providers/OpenSearch/Internal/BodySourceParserTests.cs
+++ b/tests/Hyperbee.Migrations.Tests/Providers/OpenSearch/Internal/BodySourceParserTests.cs
@@ -97,6 +97,34 @@ public void AtPath_AbsoluteWindows_RejectedAtParseTime()
             .Where( e => e.Message.Contains( "absolute" ) || e.Message.Contains( "relative" ) );
     }

+    // Drive-letter prefix rejection (cross-platform asymmetry guard).
+    // `Path.IsPathRooted` is platform-dependent — `C:/foo` reads as rooted on
+    // Windows but not on Linux — so an author editing on one host could
+    // produce a manifest that's silently rooted on another. The validator
+    // checks the rooted shape explicitly.
+    [TestMethod]
+    [DataRow( "C:/foo/bar.json" )]
+    [DataRow( @"C:\foo\bar.json" )]
+    [DataRow( "c:/foo/bar.json" )]
+    [DataRow( @"Z:\bar.json" )]
+    [DataRow( "a:/x.json" )]
+    public void AtPath_DriveLetterPrefix_RejectedAtParseTime( string path )
+    {
+        var act = () => _parser.Parse( $"CREATE INDEX users WITH BODY @{path}" );
+        act.Should().Throw()
+            .Where( e => e.Message.Contains( "absolute" ) || e.Message.Contains( "drive-letter" ) );
+    }
+
+    // Colon anywhere in a body path is rejected — embedded resource names
+    // never use it, and accepting it would muddy the drive-letter check.
+    [TestMethod]
+    public void AtPath_ColonInPath_RejectedAtParseTime()
+    {
+        var act = () => _parser.Parse( "CREATE INDEX users WITH BODY @bodies/foo:bar.json" );
+        act.Should().Throw()
+            .Where( e => e.Message.Contains( ":" ) );
+    }
+
     [TestMethod]
     public void AtPath_ParentTraversal_RejectedAtParseTime()
     {