docs/decisions/0017-body-source-grammar.md (10 changes: 9 additions, 1 deletion)
@@ -139,11 +139,19 @@ contract; migrating existing resources is optional.

### Path validation (parse-time)

The grammar accepts characters `[a-zA-Z0-9_\-./\\:]` in `@path`. The
`:` is in the lexer's accept set only so that a drive-letter prefix
surfaces as a clean "absolute path" error rather than a generic parse
failure. Validation rejects the following at parse time:

- Absolute paths (leading `/` or `\`) — body files must be inside the
migration's resource folder.
- Drive-letter prefix (`C:`, `c:`, ...), for the same reason.
  `Path.IsPathRooted` is platform-dependent, so an author editing on one
  host could otherwise produce a manifest that is silently rooted on
  another.
- Any other `:` in the path. Embedded resource names don't use it, so
  the rejection is purely mechanical: a `:` that isn't a drive-letter
  prefix is almost certainly an authoring mistake.
- `..` segments — no parent-directory traversal; each migration's
body files stay self-contained.
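The rules above are order-sensitive: the drive-letter check must run before the generic `:` check so that `C:` is reported as an absolute path rather than as a stray colon. A minimal C# sketch of that ordering follows. `BodyPathValidator` and its messages are hypothetical, not the provider's parser API; the accept-set regex is transcribed from the grammar line above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of the parse-time @path checks; names and messages are
// illustrative, not the provider's actual parser API.
public static class BodyPathValidator
{
    // Lexer accept set: letters, digits, _ - . / \ and ':'.
    // \z (not $) so a trailing newline cannot slip past the match.
    private static readonly Regex AcceptSet = new(@"^[a-zA-Z0-9_\-./\\:]+\z");

    // Drive-letter prefix ("C:", "c:", ...), matched explicitly instead of
    // asking Path.IsPathRooted, whose answer depends on the host OS.
    private static readonly Regex DriveLetter = new(@"^[a-zA-Z]:");

    // Returns null when the path is accepted, otherwise a rejection reason.
    public static string? Validate(string path)
    {
        if (string.IsNullOrEmpty(path) || !AcceptSet.IsMatch(path))
            return "parse error: unexpected character in @path";

        // Leading '/' or '\', or a drive letter: all reported as absolute paths.
        if (path[0] == '/' || path[0] == '\\' || DriveLetter.IsMatch(path))
            return "absolute path: body files must be inside the migration's resource folder";

        // Any ':' still present is not a drive-letter prefix, so reject it.
        if (path.Contains(':'))
            return "':' is not allowed in @path";

        // Split on both separators so "a/../b" and "a\..\b" are caught alike.
        if (path.Split('/', '\\').Any(segment => segment == ".."))
            return "'..' traversal is not allowed in @path";

        return null;
    }
}
```

Note that a name such as `a..b/c.json` passes: only a segment that is exactly `..` counts as traversal.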

docs/decisions/0018-split-ledger-and-lock-indices.md (74 changes: 74 additions, 0 deletions)
@@ -0,0 +1,74 @@
# ADR-0018: OpenSearch Provider Splits Ledger and Lock into Two Indices

**Status:** Accepted
**Date:** 2026-05-04

## Context

Hyperbee's other migration providers (Aerospike, Couchbase, MongoDB, Postgres) co-locate the migration ledger and the run-lock in a single namespace, distinguished by document id. A reviewer comparing the OpenSearch provider against a sibling internal implementation flagged that the OpenSearch provider deviates from this convention: it ships with two indices, `.migrations` (ledger) and `.migrations-lock` (lock), configured by `OpenSearchMigrationOptions.LedgerIndex` and `OpenSearchMigrationOptions.LockIndex`, which default to those names.

The deviation is intentional but, until now, undocumented as an ADR. The reviewer's observation is correct in two ways:

1. **The convention exists.** The other four providers co-locate; the OpenSearch provider is the outlier.
2. **The deviation is load-bearing.** It exists to serve a specific concurrency invariant that the other providers don't face in the same shape.

The risk of leaving this implicit is that a future provider author copies the wrong convention — either co-locating in OpenSearch when they shouldn't, or splitting in a future provider when they shouldn't. This ADR captures the reasoning so the next implementer makes the choice deliberately.

## The deviation

OpenSearch's primary-shard write contract couples the primary to its replicas: a write to the primary is not acknowledged until every in-sync replica has acknowledged it. Under concurrent lock acquisition by N runners (R-24b), the lock index's primary shard is contended, and replica-write coupling adds a second source of tail latency on top of the contention itself.

The mitigation (PA-2 from assessment 0002, encoded in `LockIndexInitStep`) is to create the lock index with `number_of_replicas: 0`. Lock writes then land on a single primary with no replica fan-out, which removes replica-write coupling as a tail-latency contributor under contention.

The ledger index has the opposite needs: it's a forensic record (R-06) used after the fact to answer "what migrations ran, when, in what direction, against what state." Durability matters; tail latency under concurrent writes does not (the lock serializes writes to the ledger). The ledger gets the cluster's normal replica configuration.
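To make the two profiles concrete, here is a sketch of the settings portion of each create-index body, built with `System.Text.Json.Nodes`. The class and method names are hypothetical; `index.number_of_replicas` is the real OpenSearch setting. The ledger's strict R-06 mapping is left out because this ADR doesn't list its fields.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical create-index bodies for the two indices; names are illustrative.
public static class MigrationIndexBodies
{
    // Lock index: replicas pinned to 0, so a lock write is acknowledged by the
    // primary alone, with no replica fan-out (the PA-2 mitigation).
    public static JsonObject LockIndex() => new()
    {
        ["settings"] = new JsonObject
        {
            ["index"] = new JsonObject { ["number_of_replicas"] = 0 }
        }
    };

    // Ledger index: number_of_replicas is deliberately absent, so the cluster's
    // normal replica configuration applies. The strict R-06 mapping is omitted.
    public static JsonObject LedgerIndex() => new()
    {
        ["settings"] = new JsonObject()
    };
}
```

The asymmetry lives entirely in one key: the lock body sets it, and the ledger body leaves it out so that cluster defaults win.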

Two distinct durability/latency profiles, two indices. The other providers don't face this trade-off because:

- **Aerospike** uses native CAS on a record key in a configured namespace; durability is a namespace-level setting and is not coupled to replica-write semantics on a per-record basis.
- **Couchbase** uses bucket-level durability; the lock is a single document with provider-level coordination.
- **MongoDB** uses a collection-level write concern; the lock is a single document with `findOneAndUpdate` semantics.
- **Postgres** uses `pg_advisory_lock`; the lock is not a row in the ledger table at all.

In each case, the lock's durability story is decoupled from the ledger's, either by mechanism (advisory lock vs. row) or by configuration knob (namespace, bucket, collection write concern). OpenSearch sets replica count per index, so the only way to give the lock and the ledger different durability profiles is to put them in different indices.

## Decision

The OpenSearch provider will continue to ship two indices:

- `LedgerIndex` (default `.migrations`) — strict-mapped ledger per R-06, with the cluster's normal replica configuration.
- `LockIndex` (default `.migrations-lock`) — `number_of_replicas: 0` per PA-2 mitigation, asserted by `LockIndexInitStep`.

We will not introduce an option to combine them into a single index. The combined-index shape would either lose the PA-2 mitigation (if the index were configured for ledger-grade durability) or compromise the ledger's durability (if the index were configured for `replicas: 0`). Neither trade-off is worth the cross-provider symmetry.

If a future operator deployment is so IAM-restricted that index creation is gated to a single index, we will reconsider — but only as a documented constrained-mode opt-in, never as a default. ADR-0013's `AssumeIndicesExist` already covers the IAM-restricted case for both indices; no additional surface is needed today.

## Consequences

**Easier:**

- The lock's tail-latency story is clean: under R-24b N-runner contention, the lock primary shard's write path has no replica-coupling component.
- The ledger's durability story is clean: it inherits the cluster's normal replica configuration without per-index special-casing.
- Operators in non-AWS environments who configure replica counts cluster-wide get that configuration on the ledger, while the lock's explicit `number_of_replicas: 0` is unaffected by it.

**Harder:**

- Operators must monitor and back up two indices. In practice this is one extra entry in any backup or alerting tool, and the two names share a prefix (a `.migrations*` glob covers both).
- Cross-provider documentation has to surface the asymmetry. This ADR is the canonical reference; the provider README's "Quick start" continues to default both indices for the common case.
- The next provider author asking "should I co-locate or split?" must read this ADR. The default answer is co-locate (the house style); split only when the lock and ledger have distinct durability or latency requirements that the underlying engine couples through shared configuration.

**Constrains:**

- The lock and ledger indices are part of the public contract of `OpenSearchMigrationOptions`. Removing either as a top-level index requires a superseding ADR.
- The PA-2 invariant (`number_of_replicas: 0` on the lock index) is asserted at startup by `LockIndexInitStep`; weakening this assertion requires a superseding ADR.

## Relation to other ADRs

- **ADR-0013 (Always-Create Lock and Ledger Indices in InitializeAsync with Explicit Override)**: this ADR refines the model ADR-0013 introduced. ADR-0013 names the two indices and the always-create behavior; this ADR captures *why there are two*.
- **ADR-0005 (Provider-Native Distributed Locking)**: preserved. The split is an OpenSearch-specific implementation choice for native locking; the cross-provider lock contract is unchanged.

## Implementation

- `OpenSearchMigrationOptions.LedgerIndex` (default `.migrations`) and `OpenSearchMigrationOptions.LockIndex` (default `.migrations-lock`).
- `LedgerIndexInitStep` creates the ledger with the strict R-06 mapping; its replica configuration follows the cluster default.
- `LockIndexInitStep` creates the lock with `number_of_replicas: 0` and asserts the value when the index already exists. Mismatch fails at startup with a remediation message.
- `OpenSearchRecordStore` reads the ledger and the lock through these options. There is no path that writes lock state to the ledger index (or vice versa).
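The `LockIndexInitStep` assertion can be pictured as a check against the response of `GET /<lock-index>/_settings`. OpenSearch nests the value under `<index>.settings.index` and encodes setting values as strings. The sketch below assumes that response shape; `LockIndexAssertion` and its remediation text are hypothetical, not the step's actual code or message.

```csharp
using System;
using System.Text.Json;

// Hypothetical sketch of the startup assertion attributed to LockIndexInitStep.
// Input: body of `GET /<lock-index>/_settings`, shaped like
// { "<index>": { "settings": { "index": { "number_of_replicas": "0", ... } } } }.
public static class LockIndexAssertion
{
    public static void AssertNoReplicas(string lockIndex, string settingsResponseJson)
    {
        using var doc = JsonDocument.Parse(settingsResponseJson);

        // Setting values come back as JSON strings, hence GetString().
        string? replicas = doc.RootElement
            .GetProperty(lockIndex)
            .GetProperty("settings")
            .GetProperty("index")
            .GetProperty("number_of_replicas")
            .GetString();

        if (replicas != "0")
        {
            // Fail at startup with a remediation hint instead of running with replicas.
            throw new InvalidOperationException(
                $"Lock index '{lockIndex}' has number_of_replicas={replicas}; expected 0. " +
                $"Set it with PUT /{lockIndex}/_settings " +
                "{\"index\":{\"number_of_replicas\":0}} and restart.");
        }
    }
}
```

`index.number_of_replicas` is a dynamic setting, so the mismatch can be fixed in place without recreating the index.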
docs/decisions/INDEX.md (1 change: 1 addition, 0 deletions)
@@ -19,3 +19,4 @@
| 0015 | [Parser is Offline-Pure; All I/O is Runtime Middleware](0015-parser-offline-pure-all-io-runtime.md) | Accepted | 2026-05-02 | Clarifying corollary of ADR-0011; resolves R-30 template lookup ambiguity by deferring all I/O (including template body resolution) to runtime middleware |
| 0016 | [OpenSearch Provider Does Not Use File-Level Templating](0016-no-file-level-templating.md) | Accepted | 2026-05-02 | Strikes R-10; matches Aerospike/Couchbase/MongoDB/Postgres house style (typed options + runtime substitution); deletes Phase 0 Task 0.4 work; removes Hyperbee.Templating dependency |
| 0017 | [Body-Source Grammar — Three Resolution Forms](0017-body-source-grammar.md) | Accepted | 2026-05-02 | `WITH BODY @path` direct file reference + `bodies.<name>` structured section + ADR-0009 sibling-property fallback for back-compat; parse-time path validation rejects absolute paths and `..` traversal |
| 0018 | [OpenSearch Provider Splits Ledger and Lock into Two Indices](0018-split-ledger-and-lock-indices.md) | Accepted | 2026-05-04 | Captures why the OpenSearch provider deviates from the Aerospike/Couchbase/MongoDB/Postgres single-namespace convention: distinct durability/latency profiles (PA-2 lock `replicas:0`, normal ledger durability) require two indices |
docs/site/opensearch.md (18 changes: 17 additions, 1 deletion)
@@ -175,6 +175,8 @@ JSON bodies attach to a statement via `WITH BODY <ref>`. The provider supports t
The `@`-prefixed path loads an embedded resource relative to the migration's own resource folder. Use this for any body that would otherwise dominate the `statements.json` file -- large mappings, ISM policies, reusable templates. Subfolders are optional. Path validation is parse-time:

- Absolute paths (leading `/` or `\`) are rejected -- body files must stay inside the migration's resource folder.
- Drive-letter prefixes (`C:`, `c:`, ...) are rejected -- same reason. `Path.IsPathRooted` is platform-dependent (`C:/foo` reads as rooted on Windows but not on Linux); the validator checks the rooted shape explicitly so an author editing on one host can't produce a path that's silently rooted on another.
- Any other `:` in the path is rejected -- embedded resource names don't use it.
- `..` segments are rejected -- no parent-directory traversal.
- Allowed characters: letters, digits, `_`, `-`, `.`, `/`, `\`.

Expand Down Expand Up @@ -509,12 +511,26 @@ Uploads the policy to `_plugins/_ism/policies` (or `_opendistro/_ism/policies` o
APPLY POLICY <id> TO <pattern> [NO WAIT("<reason>")]
```

Attaches the policy to existing indices matching the pattern via `_plugins/_ism/add`. The dispatcher inspects the response body and surfaces logical failures explicitly: an HTTP 200 with `updated_indices: 0` is mapped to `Failed`, not treated as a silent OK.

```json
{ "statement": "APPLY POLICY hot-warm-cold TO logs-*" }
```
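The zero-updated check reads fields from the `_plugins/_ism/add` response body, which carries `updated_indices`, `failures`, and `failed_indices`. Below is a hedged C# sketch. `ApplyOutcome` and `IsmAddResponse` are hypothetical names. Treating `failures: true` as `Failed` is an assumption; this page only states the zero-updated rule.

```csharp
using System;
using System.Text.Json;

// Hypothetical outcome type; the provider's dispatcher types are not shown here.
public enum ApplyOutcome { Ok, Failed }

public static class IsmAddResponse
{
    // Classifies the body of an HTTP 200 from `POST _plugins/_ism/add/<pattern>`,
    // e.g. {"updated_indices":0,"failures":false,"failed_indices":[]}.
    public static ApplyOutcome Classify(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        JsonElement root = doc.RootElement;

        int updated = root.TryGetProperty("updated_indices", out var u) ? u.GetInt32() : 0;

        // Assumption: per-index failures also count as a logical failure.
        bool failures = root.TryGetProperty("failures", out var f) && f.GetBoolean();

        // A 200 that attached the policy to nothing is not a silent OK.
        return updated > 0 && !failures ? ApplyOutcome.Ok : ApplyOutcome.Failed;
    }
}
```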

#### Three temporal scopes for ISM attachment

ISM attachment to an index series isn't one problem with three solutions -- it's three different problems, each with its own right tool. Pick by *when* the indices that need the policy come into existence relative to the migration that owns the policy.

| Scope | Right tool | Sample | Notes |
|---|---|---|---|
| **Greenfield** -- attach to indices that will be created in the future | `ism_template.index_patterns` in the policy body, `template.aliases` in the index template | 9000 -- `ForwardAttachmentLifecycle` | Cluster handles it lazily at index-creation time. No migration runtime cost. Won't help with indices that already exist when the migration runs. |
| **One-time backfill** -- attach a policy to a set of indices that already exist at migration run time | Runtime `APPLY POLICY <id> TO <pattern>` in a normal `[Migration(N)]` | 4000 -- `IsmPolicyAndApply` | Single-shot, journaled. Wildcards adapt to current cluster state at run time. Zero-updated -> `Failed` escalation makes it loud when the pattern matches nothing. |
| **Ongoing reconciliation** -- keep all matching existing indices on the current policy as the policy evolves | Runtime `APPLY POLICY <id> TO <pattern>` in a `[Migration(N, journal: false)]` | 9001 -- `OngoingPolicyReconciliation` | Re-runs on every startup. Idempotent on the wire (ISM's `change_policy` is a no-op for already-on-policy indices). The wildcard form is correct because the set of indices to reconcile changes as new ones roll over and old ones are deleted. |

The three are stackable. A typical mature pipeline uses **greenfield** at install time, **one-time backfill** when an existing series first adopts the policy, and **ongoing reconciliation** as the policy definition evolves over the project's lifetime. Many pipelines never need more than one -- but you should choose deliberately rather than reach for runtime `APPLY POLICY` by default.

Caveat: `ism_template` inside a policy body is the modern endpoint shape. Older AWS-managed clusters served by the legacy `_opendistro/_ism` endpoint may not honor it; if `IsmEndpointDetectStep` resolves to the legacy endpoint, the greenfield row falls back to runtime `APPLY POLICY` (sample 4000's pattern, run once at install time, plus sample 9001's reconciliation pattern for ongoing changes). Modern OpenSearch (2.x and the modern AWS endpoint) supports `ism_template` natively.
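For the greenfield row, the `ism_template` array inside the policy body does the work. Below is a minimal sketch of such a body built with `System.Text.Json.Nodes`, assuming the modern `_plugins/_ism` schema. The single `hot` state, the description, and priority `100` are placeholders. `ForwardAttachedPolicy` is a hypothetical helper, not something the provider ships.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical sketch of a greenfield policy body. The ism_template entry asks
// the cluster to attach this policy to any new index matching the pattern.
// The single "hot" state and priority 100 are placeholders.
public static class ForwardAttachedPolicy
{
    public static JsonObject Body(string indexPattern) => new()
    {
        ["policy"] = new JsonObject
        {
            ["description"] = "forward-attached lifecycle (illustrative)",
            ["default_state"] = "hot",
            ["states"] = new JsonArray(
                new JsonObject
                {
                    ["name"] = "hot",
                    ["actions"] = new JsonArray(),
                    ["transitions"] = new JsonArray()
                }),
            ["ism_template"] = new JsonArray(
                new JsonObject
                {
                    ["index_patterns"] = new JsonArray(JsonValue.Create(indexPattern)),
                    ["priority"] = 100
                })
        }
    };
}
```

`priority` only matters when several policies' `ism_template` patterns match the same new index. In that case, the higher value wins.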

### WAIT FOR (cluster health)

```
@@ -15,6 +15,11 @@
<None Remove="Resources\6000-MigrateIndexComposite\statements.json" />
<None Remove="Resources\7000-ReversibleAlias\statements.json" />
<None Remove="Resources\8000-UnsafeReindex\statements.json" />
<None Remove="Resources\9000-ForwardAttachmentLifecycle\statements.json" />
<None Remove="Resources\9000-ForwardAttachmentLifecycle\bodies\component.json" />
<None Remove="Resources\9000-ForwardAttachmentLifecycle\bodies\template.json" />
<None Remove="Resources\9000-ForwardAttachmentLifecycle\bodies\policy.json" />
<None Remove="Resources\9001-OngoingPolicyReconciliation\statements.json" />
</ItemGroup>

<ItemGroup>
@@ -28,6 +33,11 @@
<EmbeddedResource Include="Resources\6000-MigrateIndexComposite\statements.json" />
<EmbeddedResource Include="Resources\7000-ReversibleAlias\statements.json" />
<EmbeddedResource Include="Resources\8000-UnsafeReindex\statements.json" />
<EmbeddedResource Include="Resources\9000-ForwardAttachmentLifecycle\statements.json" />
<EmbeddedResource Include="Resources\9000-ForwardAttachmentLifecycle\bodies\component.json" />
<EmbeddedResource Include="Resources\9000-ForwardAttachmentLifecycle\bodies\template.json" />
<EmbeddedResource Include="Resources\9000-ForwardAttachmentLifecycle\bodies\policy.json" />
<EmbeddedResource Include="Resources\9001-OngoingPolicyReconciliation\statements.json" />
</ItemGroup>

<ItemGroup>
@@ -0,0 +1,52 @@
using Hyperbee.Migrations.Providers.OpenSearch.Resources;

namespace Hyperbee.Migrations.OpenSearch.Samples.Migrations;

// Sample 9: forward-attachment lifecycle for greenfield pipelines.
//
// Contrast with sample 4 (IsmPolicyAndApply), which demonstrates the
// runtime APPLY POLICY path — necessary when you need to attach a policy
// to indices that ALREADY exist (backfill).
//
// For pipelines starting clean — daily rollover indices for a new
// application, fresh log streams, anything where the migration runs
// before the indices exist — declarative attachment is preferable. The
// migration installs only the cluster-level scaffolding:
//
// CREATE COMPONENT — shared settings/mappings, declared once.
// CREATE TEMPLATE — `index_patterns` matches the rollover series; the
// template's `template.aliases` block wires the
// alias automatically when a matching index is
// created.
// CREATE POLICY — the policy body's `ism_template.index_patterns`
// block attaches the policy to any matching index
// at creation time.
//
// Note: there is NO runtime APPLY POLICY and NO runtime ALIAS ADD. The
// first index in the series — created later by the application, by daily
// rollover, or by a successor migration — picks up everything: settings,
// mappings, alias, lifecycle policy.
//
// When to use this pattern vs. sample 4:
//
// - greenfield series (no existing indices) -> sample 9 pattern
// - existing indices that need a new policy -> sample 4 pattern
// - new policy applies to BOTH existing and future -> both: sample 4
// pattern PLUS an
// `ism_template`
// block in the policy
//
// Caveat: `ism_template` inside an ISM policy body is the modern endpoint
// (`_plugins/_ism/policies`). Older AWS-managed clusters served by the
// legacy `_opendistro/_ism` endpoint may not recognize it; the bootstrap
// `IsmEndpointDetectStep` resolves which endpoint is active, but the
// declarative `ism_template` shape itself is a property of the modern
// schema. If you target a legacy endpoint, fall back to sample 4's
// runtime APPLY for forward attachment.

[Migration( 9000 )]
public class ForwardAttachmentLifecycle( OpenSearchResourceRunner<ForwardAttachmentLifecycle> runner ) : Migration
{
public override Task UpAsync( CancellationToken cancellationToken = default )
=> runner.StatementsFromAsync( "statements.json", cancellationToken );
}
@@ -0,0 +1,45 @@
using Hyperbee.Migrations.Providers.OpenSearch.Resources;

namespace Hyperbee.Migrations.OpenSearch.Samples.Migrations;

// Sample 9.1: ongoing policy reconciliation.
//
// The third of three temporal scopes for ISM attachment. Pair it with
// sample 4 (one-time backfill via runtime APPLY) and sample 9
// (greenfield via `ism_template`).
//
// Why this exists. Sample 9 installs a policy whose body has an
// `ism_template.index_patterns` block — new indices in the
// `sample_app_events-*` series auto-attach at creation. But when the
// policy DEFINITION later evolves (a new state added, transition
// criteria adjusted, retention reduced from 90d to 30d), existing
// indices that are already attached keep running on their cached copy
// of the policy until something explicitly re-attaches them.
//
// This migration runs `APPLY POLICY` against the same wildcard pattern
// the policy's `ism_template` covers, and it is declared with
// `journal: false` so it re-runs on every startup. The ISM
// `change_policy` API is idempotent: re-applying to indices already on
// the current policy is a no-op, so re-running is cheap. The wildcard
// form is correct because the set of
// indices to reconcile changes as new ones roll over and old ones are
// deleted by the policy's own delete state.
//
// When NOT to use this pattern.
//
// - Greenfield-only series with policies that never change: sample 9
// alone is enough. Don't add reconciliation noise on every startup
// for a thing that's already convergent.
// - One-time backfill of indices that exist before the policy:
// sample 4 (a normal `[Migration(N)]`) is the right tool. Don't
//     reach for `journal: false` unless the migration genuinely needs
// to run more than once.
// - Authoring-time-only enumeration of "these specific indices get
// this policy": just put the literal set in a normal migration; the
// wildcard story is for cluster-state-driven sets.

[Migration( 9001, journal: false )]
public class OngoingPolicyReconciliation( OpenSearchResourceRunner<OngoingPolicyReconciliation> runner ) : Migration
{
public override Task UpAsync( CancellationToken cancellationToken = default )
=> runner.StatementsFromAsync( "statements.json", cancellationToken );
}