Merged
Adds research, requirements, design, plan, and ADRs 0011-0015 for the OpenSearch provider implementation. Plan calibrated to maintainer velocity (3-7 days focused work) across 4 phases.
ADRs:
- 0011 Hybrid parser+runtime injection
- 0012 WithProductionDefaults() extension method
- 0013 Always-create indices with explicit override
- 0014 State-machine facade over IBootstrapStep pipeline
- 0015 Parser is offline-pure; all I/O is runtime middleware
Adds src/Hyperbee.Migrations.Providers.OpenSearch with minimal Phase 0 surface area:
- OpenSearchMigrationOptions (WaitMode/ClusterHealthThreshold/ContextResolutionPolicy enums; lock parameters per ADR-0011/0014)
- AddOpenSearchMigrations + WithProductionDefaults extensions (full impl deferred to Phase 6 per plan)
- README.md, csproj mirroring Aerospike layout
Adds OpenSearch.Client/OpenSearch.Net 1.8.0 + AwsSigV4 1.8.0 to Directory.Packages.props. Registers the project in the slnx solution.
Build clean: 0 warnings, 0 errors across net8/9/10. Existing CS0618 warnings in integration tests are unrelated (Testcontainers parameterless ctor obsolescence).
Mirrors the Aerospike harness shape per Style Reference Pattern 1. Single-node OpenSearch 2.18.0 with security plugin disabled for tests; captures IOpenSearchClient (high-level) and OpenSearchLowLevelClient (low-level for raw HTTP, used by spike tests for wire-level assertions). Hello-world test gated by #if INTEGRATIONS per ADR-0010. Enable by uncommenting the //#define INTEGRATIONS at file top. Image is pinned by tag now; per plan amendment A11/NF-6, CI should pin by sha256 digest. Version-support contract documented in container header (tested 2.18.0, min 2.0.0, AWS Managed ISM endpoint caveat).
Wires the four-scope template renderer (env, config, runtime, secrets)
per R-10 and ADR-0015. Renderer runs BEFORE the parser; offline-pure;
no I/O.
- OpenSearchResourceTemplateRenderer wraps Hyperbee.Templating.Text.
Template.Render with scope-prefixed identifiers (e.g.
{{config.indexPrefix}})
- SecretMarker + SecretValue types as Phase 6 scaffolding for the
log-scrubber pipeline (per R-10, value-coupled redaction by content
hash, not name-coupled)
- Custom Validator on TemplateOptions admits dotted scope keys plus
bracket-suffix indexing (runtime.nodes[0])
- 3 smoke tests: simple substitution, {{#if}} inside JSON, {{each}}
inside JSON — all passing on net8/9/10
First-contact note (PM-5 mitigation): the templating engine's default
identifier validator forbids '.' in member names; we override it.
This is documented inline in the renderer for future reference.
Adds Hyperbee.Templating 3.4.1 to Directory.Packages.props.
Phase 0 architectural-core spike validating ADR-0011 (hybrid
parser+runtime injection) and ADR-0015 (parser is offline-pure).
Provider library:
- Internal/Ast: StatementAst (abstract record), BodyRef (sibling JSON
property reference), CreateIndexAst (with InjectDynamicStrict flag),
ReindexAst (with InjectOpTypeCreate + UnsafeJustification flags)
- Internal/Grammar: OpenSearchStatementParser using Parlot
combinators per ADR-0001 / Style Reference Pattern 3 (static parser
cache, case-insensitive keywords, backtick-or-plain identifiers,
ordered OneOf disambiguation). Supports CREATE INDEX [IF NOT EXISTS]
[WITH BODY $body] and REINDEX [UNSAFE("<reason>")] FROM <src> TO
<dst> [WITH BODY $body]. Bare-UNSAFE rejected at parse per R-18.
- Internal/Middleware: SafeDefaultMergeMiddleware merges AST flags
into JsonNode trees at request-build time. Component-template-aware
dynamic:strict injection (skips on composed_of per R-17 / PM-4).
op_type:create injection on REINDEX with idempotent + conflict
detection (PM-3); SafeDefaultConflictException on conflict points
authors to REINDEX UNSAFE.
Unit tests (36 tests, all passing on net8/9/10):
- AstTests: 6 tests covering record equality + verb names
- OpenSearchStatementParserTests: 18 tests (positive + negative
including bare-UNSAFE rejection, missing-name rejection,
case-insensitive keywords)
- SafeDefaultMergeMiddlewareTests: 12 tests covering all 5 documented
CREATE INDEX edge cases + REINDEX edge cases + tree-immutability
invariant
Phase 0 kill criterion (per assessment 0003 / A8) NOT FIRED at unit
level. Live-cluster validation (Task 0.6) requires Docker; deferred
to user environment for the 10 wire-level integration tests.
Total OpenSearch unit tests across project: 39 (incl. 3 from Task 0.4
Templating spike). 117 test executions across 3 TFMs, 0 failures.
10 integration tests against real OpenSearch (Testcontainers, gated by #if INTEGRATIONS per ADR-0010) that fire the Phase 0 kill criterion: "Merge logic cannot deterministically produce expected JSON without ambiguity for any of the 5 documented edge cases."
Tests use OpenSearchLowLevelClient (DisableDirectStreaming on) to capture actual HTTP request bodies via ApiCall.RequestBodyInBytes.
CREATE INDEX edge cases (5):
- Flat body without mappings -> dynamic:strict injected on the wire
- Body with explicit mappings.dynamic:true -> preserved
- Body with composed_of -> injection skipped (R-17 / PM-4)
- Body with mappings.properties only -> dynamic:strict added alongside
- Body with settings only -> mappings block created with dynamic:strict
REINDEX edge cases (5):
- No body -> full payload built with op_type:create (PM-3 fix)
- Body with dest object -> op_type:create added; user fields preserved
- Body with op_type:index -> SafeDefaultConflictException points to UNSAFE remediation per R-18
- Body with explicit op_type:create -> exactly one op_type:create on the wire (idempotent inject)
- KEYSTONE round-trip test: seeds src with 3 docs, pre-seeds dst with one doc using the same _id (simulating partial prior run), runs reindex, asserts version_conflicts:1, dst has exactly 3 docs (no double-write), and the pre-seeded doc was NOT overwritten by op_type:create
Build verified clean with AND without INTEGRATIONS defined. To run: uncomment //#define INTEGRATIONS at file top, then dotnet test with --filter "TestCategory=Spike".
Phase 0 implementation complete (6/6 tasks). Architecture validated at unit level; live-cluster gate awaits user's Docker environment.
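The edge cases listed above can be illustrated with a small sketch. It is written in Python for brevity and is not the provider's C# SafeDefaultMergeMiddleware; the function names, the `SafeDefaultConflictError` class, and the exact payload layout are assumptions made for illustration only.

```python
import copy


class SafeDefaultConflictError(Exception):
    """Hypothetical stand-in for the provider's SafeDefaultConflictException."""


def merge_create_index(body, inject_dynamic_strict=True):
    """Return a new body with dynamic:strict injected unless the author opted out."""
    merged = copy.deepcopy(body) if body else {}  # never mutate the caller's tree
    if not inject_dynamic_strict or "composed_of" in merged:
        return merged  # composed_of present: skip injection
    mappings = merged.setdefault("mappings", {})
    mappings.setdefault("dynamic", "strict")  # an explicit author value is preserved
    return merged


def merge_reindex(src, dst, body=None, inject_op_type_create=True):
    """Return a new _reindex payload with dest.op_type=create injected."""
    merged = copy.deepcopy(body) if body else {}
    merged.setdefault("source", {}).setdefault("index", src)
    dest = merged.setdefault("dest", {})
    dest.setdefault("index", dst)
    if not inject_op_type_create:
        return merged
    existing = dest.get("op_type")
    if existing is None:
        dest["op_type"] = "create"
    elif existing != "create":
        # Conflict: the author asked for something other than the safe default.
        raise SafeDefaultConflictError(
            f"dest.op_type={existing!r} conflicts with the safe default; use REINDEX UNSAFE"
        )
    return merged
```

Returning a deep copy mirrors the tree-immutability invariant the unit tests check; an explicit `op_type: create` passes through unchanged, so the injection is idempotent.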
Records the decision (raised by maintainer review of Phase 0 Task 0.4) to match house style with the other four providers (Aerospike, Couchbase, MongoDB, Postgres). Env-variation handled via typed OpenSearchMigrationOptions + IConfiguration binding, not via a templating engine. Strikes R-10, amends R-25, and removes Hyperbee.Templating dependency. The Phase 0 Task 0.4 spike code is deleted; validation that the engine works is preserved as a Learnings Ledger entry, not as committed code. Re-introducing templating requires a superseding ADR.
Deletes the Phase 0 Task 0.4 spike code that wired Hyperbee.Templating as a four-scope file-level renderer. Per ADR-0016, the OpenSearch provider matches the house pattern (Aerospike/Couchbase/MongoDB/Postgres): env-variation flows through typed OpenSearchMigrationOptions and per-environment IConfiguration, not a templating engine.
Removed:
- src/.../Templating/OpenSearchResourceTemplateRenderer.cs
- src/.../Templating/SecretMarker.cs
- src/.../Templating/SecretValue.cs
- tests/.../Templating/OpenSearchResourceTemplateRendererTests.cs
- Hyperbee.Templating from Directory.Packages.props
- <PackageReference Include="Hyperbee.Templating" /> from Hyperbee.Migrations.Providers.OpenSearch.csproj
Build clean across net8/9/10. 36 OpenSearch unit tests pass (the 3 templating tests are gone; architectural-core tests for AST + grammar + safe-default merge middleware remain intact).
The Phase 0 Task 0.4 spike validated the engine works (and surfaced 4 real first-contact issues in Hyperbee.Templating 3.4.1; see plan Learnings Ledger). The spike result is preserved as documentation; the code is removed because validation that something is feasible is not justification that it should be adopted (see ADR-0016 Context).
…plating)
Strikes R-10 (Hyperbee.Templating renderer); amends R-25 to drop SecretScrubber routing; updates Constraints to call out the no-templating decision; updates Decided list with the rationale; marks R-24c sub-test (l) as removed.
Plan updates:
- Phase 0 Task 0.4 marked REVERTED with pointers to commits b2febba (added) and 95825f0 (removed); Learnings Ledger preserves the four PM-5 first-contact issues (the engine's actual quirks, useful if the decision is ever revisited)
- Phase 2 Task 2.7: Templating renderer line removed
- R-24c (l) row marked REMOVED
- Status Summary updated: 36 unit tests now (was 39 with the spike), 108 test runs (was 117)
Design updates:
- Architecture diagram strips Templating Renderer block and SecretScrubberSink line; replaces with explanatory note pointing to ADR-0016
- Data-flow steps updated: resource files go directly to Parlot; no rendering step
- Risks-and-Open-Questions: the Hyperbee.Templating + SecretMarker first-contact bug is REMOVED (eliminated by not adopting)
- Key Decisions section now lists all 6 ADRs (0011-0016) with links
No code changes; the code change for templating removal landed in commit 95825f0 (Refactor: Remove Hyperbee.Templating dependency).
State-machine facade over IBootstrapStep[] pipeline. Public contract:
bootstrapper.RunAsync() -> BootstrapResult { Status, Steps[], FailedAt }
The Steps projection lets operators identify the failing step without
parsing log strings (per ADR-0014 design intent).
Components:
- IBootstrapStep interface
- BootstrapContext (immutable shared state passed to steps)
- StepOutcome (per-step result with status, duration, detail, exception)
- BootstrapResult (terminal outcome with all step outcomes + FailedAt)
- OpenSearchBootstrapper (the facade) - sequential execution; halts on
first failure; OperationCanceledException short-circuits the pipeline
- Default steps:
- RestPingStep: cheapest cluster reachability probe
- ClusterHealthStep: blocks server-side via wait_for_status query
(mitigates PA-12 client-side polling storm); honors R-03 threshold
- OpenSearchExceptions: typed hierarchy for callers to pattern-match on
(OpenSearchNotReadyException, OpenSearchLedgerSchemaMismatchException,
MigrationLockExpiredException, AwsSigV4NotConfiguredException)
7 new unit tests (43 total OpenSearch tests, 129 runs across net8/9/10,
0 failures). Tests use stub steps with NSubstitute-mocked
IOpenSearchClient — no Docker dependency.
DI registration deferred to Slice C (after lock + ledger steps land);
the bootstrapper instance is constructed inline in tests until then.
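The sequential, halt-on-first-failure contract described above can be sketched as follows. This is a language-neutral illustration in Python, not the provider's C# OpenSearchBootstrapper; the status strings ("Ready", "Failed", "Succeeded") and the `run_pipeline` name are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StepOutcome:
    name: str
    status: str          # "Succeeded" or "Failed" (assumed labels)
    duration_s: float
    detail: str = ""


@dataclass
class BootstrapResult:
    status: str                                   # "Ready" or "Failed" (assumed labels)
    steps: List[StepOutcome] = field(default_factory=list)
    failed_at: Optional[str] = None


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]]) -> BootstrapResult:
    """Run steps in order; stop at the first failure and record where it happened."""
    outcomes: List[StepOutcome] = []
    for name, step in steps:
        started = time.monotonic()
        try:
            step()
        except Exception as exc:  # cancellation-style BaseExceptions propagate untouched
            outcomes.append(StepOutcome(name, "Failed", time.monotonic() - started, str(exc)))
            return BootstrapResult("Failed", outcomes, failed_at=name)
        outcomes.append(StepOutcome(name, "Succeeded", time.monotonic() - started))
    return BootstrapResult("Ready", outcomes)
```

Because every outcome is kept in order, a caller can read `failed_at` and the per-step list directly instead of parsing log strings.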
Adds the two index-init steps to the bootstrapper pipeline per ADR-0013.
LedgerIndexInitStep:
- Idempotent create with strict mapping per R-06 (forensic fields: id, runOn, direction, status, appliedBy, checksum, error, failedStatementIndex)
- AssumeIndicesExist=true: verify-only path checks all 8 required fields; mismatch surfaces OpenSearchLedgerSchemaMismatchException with explicit field list
LockIndexInitStep:
- Idempotent create with number_of_replicas=0 (PA-2 mitigation: eliminates replica-write coupling on the lock primary shard under N concurrent runners)
- AssumeIndicesExist=true: verify-only; missing index fails with guidance pointing to the required mapping shape
Both steps use IOpenSearchClient.Indices.ExistsAsync for HEAD checks and the LowLevel client for raw-JSON CreateAsync (avoids POCO mapping ergonomics for the small, auditable schemas).
DI wiring (ServiceCollectionExtensions.cs):
- IBootstrapStep[] singletons registered in execution order: RestPingStep -> ClusterHealthStep -> LedgerIndexInitStep -> LockIndexInitStep
- OpenSearchBootstrapper registered as singleton
- IMigrationRecordStore still NOT registered (deferred until LockHandle + RecordStore land)
Init-step internals (HTTP round-trips) are exercised via integration tests, not unit tests, because mocking IOpenSearchClient.Indices fluent descriptors is fragile. Orchestration logic is fully unit-tested at the OpenSearchBootstrapper level via stub steps.
Build clean across net8/9/10. 43 OpenSearch unit tests still pass.
…R-0003)
Auto-renewing distributed lock ported from AerospikeRecordStore with OpenSearch-specific deltas:
LockDocument (POCO):
- Strict-mapped fields: name, owner, acquiredAt, lastHeartbeat
- PropertyName attributes match LockIndexInitStep mapping exactly
LockHandle (IDisposable, internal):
- CAS via if_seq_no + if_primary_term (OpenSearch optimistic concurrency)
- Heartbeat renewal loop using TimeProvider; deadline = now + LockMaxLifetime
- LockExpired CT (R-05 / PM-12) signals when:
  - LockMaxLifetime ceiling is hit
  - Renewal CAS conflicts (another runner has taken over)
- Dispose: cancels renewal, best-effort CAS-guarded DELETE; tolerates 409/404 (lock already gone)
OpenSearchRecordStore (IMigrationRecordStore per ADR-0003):
- ValidateLockTuning at ctor enforces R-05 invariants (LockRenewInterval < LockStaleAfter < LockMaxLifetime AND LockStaleAfter >= 2 * LockRenewInterval)
- InitializeAsync runs the bootstrapper pipeline; failure converts BootstrapResult.FailedAt to OpenSearchNotReadyException
- CreateLockAsync acquires via op_type=create + refresh=wait_for; on 409, realtime-GET path (NF-1) inspects staleness and CAS-overwrites if holder is past LockStaleAfter
- TryTakeOverAsync: realtime: true on GET to defeat refresh-lag false positives (assessment 0002 NF-1)
- RenewLockAsync: verify-then-update pattern; CAS conflict surfaces MigrationLockUnavailableException so LockHandle signals LockExpired
- ReleaseLockAsync: CAS-guarded DELETE; logs gracefully on 409/404
- ExistsAsync / ReadAsync / WriteAsync / DeleteAsync: ledger CRUD with refresh=wait_for on writes (per R-07)
DI: IMigrationRecordStore now registered as singleton (was deferred). The full provider DI surface is now complete for Phase 1 foundation.
7 new unit tests for ValidateLockTuning (50 OpenSearch tests total, 150 runs across net8/9/10, 0 failures). The lock CAS state machine (acquire 409 → realtime GET → takeover, renewal CAS conflict, etc.) is best validated against real OpenSearch in integration tests (R-24b territory), coming in a future commit.
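The R-05 tuning invariants that ValidateLockTuning enforces can be expressed compactly. The sketch below is Python rather than the provider's C#, and the function name, parameter names, and error messages are illustrative assumptions; only the two inequalities come from the commit message.

```python
from datetime import timedelta


def validate_lock_tuning(renew_interval: timedelta,
                         stale_after: timedelta,
                         max_lifetime: timedelta) -> None:
    """Raise ValueError when the lock timings violate the documented invariants."""
    # Invariant 1: LockRenewInterval < LockStaleAfter < LockMaxLifetime
    if not (renew_interval < stale_after < max_lifetime):
        raise ValueError("expected renew_interval < stale_after < max_lifetime")
    # Invariant 2: LockStaleAfter >= 2 * LockRenewInterval, so a single missed
    # heartbeat does not make a live holder look stale.
    if stale_after < 2 * renew_interval:
        raise ValueError("expected stale_after >= 2 * renew_interval")
```

For example, a 10 s renew interval with a 15 s stale threshold satisfies the ordering check but fails the second invariant, because one delayed heartbeat would already expose the lock to takeover.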
Extends the Parlot grammar with all six remaining foundation verbs. AST + parser only (parse-time work per ADR-0011/0015); statement compilers and runtime middleware for these verbs are Phase 2.
Verbs added:
- DROP INDEX <name> [IF EXISTS]
- UPDATE MAPPING ON <idx> [WITH BODY $body]
- UPDATE SETTINGS ON <idx> [CLOSE] [WITH BODY $body] (CLOSE flag opts into close->update->open dance for static settings per R-08a)
- REFRESH <name>
- WAIT FOR <green|yellow> [ON <idx>] [TIMEOUT <duration>] (per-index scoping per NF-3 to avoid stalling on permanently-yellow plugin indices like .opendistro_security)
- WAIT UNTIL TASK <id> COMPLETE [TIMEOUT <duration>] (Tasks API polling per R-11; backticked id for node:task format)
Duration grammar: <integer><s|m|h> with explicit suffix required. Pure integers without a suffix in trailing TIMEOUT clauses currently parse as silently-ignored trailing input (Parlot's ZeroOrOne is lenient); strict EOF matching is a Phase 2 hardening item.
Top-level OneOf order documents the disambiguation pattern (Style Reference Pattern 3): when verbs share prefix tokens (e.g., UPDATE MAPPING vs UPDATE SETTINGS), the more-specific arm comes first.
24 new parser tests (74 OpenSearch tests total, 222 runs across net8/9/10, 0 failures). Tests cover positive paths for every verb + optional clause combinations + 3 negative cases (missing required clauses).
Phase 1 remaining: IF [NOT] EXISTS live HEAD checks (runtime), the ImplicitWaitMiddleware (R-12), parse-time R-18 unsafe-op enumeration, R-24b lock contention integration tests. Statement compilers (AST -> IRequest dispatch) for these verbs are Phase 2.
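As an illustration of the duration grammar with the strict end-of-input matching that the message lists as a Phase 2 hardening item, here is a small Python sketch. It uses a regular expression rather than Parlot combinators, and `parse_duration` plus the lowercase-only suffix handling are assumptions, not the provider's behavior.

```python
import re
from datetime import timedelta

# <integer><s|m|h>; fullmatch() rejects any trailing or missing characters.
_DURATION = re.compile(r"(\d+)([smh])")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse a TIMEOUT value such as '30s', '5m', or '2h'; raise on anything else."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration {text!r}: expected <integer><s|m|h>")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

With full-input matching, a bare `30` is rejected loudly instead of being dropped as trailing input.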
…tion tests remain)
Ran the existing Testcontainers infrastructure (Docker available on
this dev machine) and validated end-to-end against a real OpenSearch
2.18.0 cluster:
- 11 spike tests (Phase 0 kill criterion CLEARED)
* Includes the keystone Reindex_RoundTrip_OpTypeCreate_PreventsDoubleWrite
test: pre-seeded dst, 3 docs in src, op_type:create skips the
pre-existing _id, version_conflicts:1, dst has exactly 3 docs,
pre-seeded doc preserved. ADR-0011 hybrid architecture validated.
- 6 Phase 1 integration tests (new): bootstrapper end-to-end, lock
acquire/release/contention, ledger CRUD, BootstrapResult per-step
inspection (ADR-0014 surface)
Real bugs found and fixed during validation:
1. SafeDefaultMergeMiddleware composed_of skip logic — the assertion
was checking against a body shape OpenSearch CREATE INDEX rejects
("unknown key [composed_of] for create index"). composed_of is a
PUT /_index_template field, not a PUT /<index> field. Test
converted to merge-layer-only assertion; PM-4's risk surface
applies to CREATE TEMPLATE / CREATE COMPONENT verbs (Phase 2),
not direct index creation. Behavior is preserved (defensive code
in middleware) but tested in isolation rather than via cluster.
2. Reindex round-trip needed conflicts:proceed — default
(conflicts:abort) returns 409 from /_reindex on first version
conflict instead of completing with version_conflicts in the body.
Test now sets conflicts:proceed explicitly. (Whether the safe-
default merge should also inject this is a Phase 2 design
question — for migrations, proceed is the right default.)
3. CreateLockAsync / TryTakeOverAsync / RenewLockAsync /
ReleaseLockAsync now catch OpenSearchClientException with status
409 — the harness uses ConnectionSettings.ThrowExceptions() (so
spike tests can assert on response.Success). Production code
shouldn't depend on whether ThrowExceptions is on; both paths
(non-throwing 409 response, throwing 409 exception) are now
handled identically.
Test files use //#define INTEGRATIONS commented-out per house
pattern (matches AerospikeRunnerTest etc.). To run locally:
uncomment the #define at file top and `dotnet test`.
74 unit tests still pass on net8/9/10 (build clean, 0 errors).
…inst real cluster)
Bridges parsed AST nodes to actual HTTP dispatch via the OpenSearchClient.
Per ADR-0011 hybrid: parser owns intent; dispatcher applies safe-default
merge then dispatches via low-level client.
Components:
- StatementResult: typed outcome (Executed | Skipped | Failed) + verb +
detail + HTTP status + exception
- StatementContext: per-call execution context (client, options, time
provider, logger, resolved body, cancellation)
- StatementDispatcher: switch-on-AST handler for all 8 verbs:
* CREATE INDEX - HEAD probe for IF NOT EXISTS, then merge + create
* DROP INDEX - HEAD probe for IF EXISTS, then delete
* UPDATE MAPPING - PUT /<idx>/_mapping
* UPDATE SETTINGS [CLOSE] - close->update->open dance for static settings
* REFRESH - POST /<idx>/_refresh
* WAIT FOR <yellow|green> [ON <idx>] - high-level Cluster.HealthAsync
(low-level DoRequestAsync rejects embedded query strings; bug found
via integration test)
* WAIT UNTIL TASK <id> COMPLETE - Tasks API polling with exp backoff
(500ms -> 30s ceiling)
* REINDEX - merge op_type:create + dispatch via _reindex
Uses low-level client (StringResponse) for body-bearing verbs to avoid
ThrowExceptions divergence found during Phase 1 validation.
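The Tasks API polling schedule mentioned for WAIT UNTIL TASK (500ms growing to a 30s ceiling) can be sketched as a delay generator. The doubling factor is an assumption; the commit message only states the start value and the ceiling, and the sketch is Python rather than the dispatcher's C#.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial: float = 0.5,
                   ceiling: float = 30.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield poll delays in seconds, growing geometrically up to the ceiling."""
    delay = initial
    while True:
        yield min(delay, ceiling)
        delay = min(delay * factor, ceiling)  # clamp so the value never overshoots


# First eight delays under the assumed doubling factor.
first_eight = list(islice(backoff_delays(), 8))
```

Under these assumptions `first_eight` is `[0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]`, so a long-running task settles into one poll every 30 seconds.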
Validated end-to-end against real OpenSearch 2.18.0 (Testcontainers):
- 11 spike tests (Phase 0 kill criterion)
- 6 RecordStore tests (Phase 1 lock+ledger+bootstrapper)
- 10 dispatcher tests (this slice)
= 27 of 27 pass.
Real bugs found and fixed during integration:
- Cluster.Health LowLevel API rejects embedded query strings; switched
to high-level Cluster.HealthAsync with selectors
- Reindex round-trip test now pre-declares schema (the dispatcher's
dynamic:strict default correctly rejects undeclared fields — this
validates the safe-default works at the cluster level!)
74 unit tests still pass on net8/9/10. House pattern preserved
(//#define INTEGRATIONS commented; uncomment locally to run).
…ration runs)
Closes the bridge from "infrastructure exists" to "writing a migration actually runs it." Authors can now write a Migration class with a sibling statements.json resource and have the provider parse, merge safe-defaults, and dispatch each statement against OpenSearch.
OpenSearchResourceRunner<TMigration>:
- StatementsFromAsync(resourceName): embedded-resource path matching AerospikeResourceRunner / Couchbase house pattern (ADR-0002)
- RunStatementsFromJsonAsync(json): public test-friendly entry point for callers that have a JSON string in hand
- Loop: load -> parse via OpenSearchStatementParser -> resolve $body sibling reference (R-09) -> dispatch via StatementDispatcher
- Failed statements throw MigrationException with statement index + verb in the message (so authors can identify which one failed)
DI: registers OpenSearchStatementParser, SafeDefaultMergeMiddleware, StatementDispatcher (singletons) and OpenSearchResourceRunner<> (transient, per-migration logger).
Validated end-to-end against real OpenSearch (Testcontainers): 4 new integration tests (now 31/31 across all OpenSearch integration suites). Tests:
- Multi-statement migration (CREATE INDEX with body + REFRESH + WAIT FOR YELLOW) runs all statements in order
- Safe defaults applied: dynamic:strict gets injected by middleware, cluster correctly rejects undeclared field after pipeline runs
- Failed statement (UPDATE MAPPING with no body) wraps in MigrationException with statement index + verb in message
- Missing $body sibling property surfaces a clear error naming the ref
Phase 1 is now end-to-end functional: an author writing a migration can dispatch a complete `statements.json` against OpenSearch. Remaining Phase 1 polish: ImplicitWaitMiddleware (R-12), parse-time R-18 unsafe-op detection, R-24b lock contention/crash recovery tests.
74 unit tests still pass on net8/9/10. House pattern preserved (//#define INTEGRATIONS commented).
Closes Phase 1 with the three remaining items.
ImplicitWaitMiddleware (R-12, NF-3):
- Wired into StatementDispatcher for mutating verbs (CREATE INDEX, REINDEX, UPDATE SETTINGS); fires _cluster/health after success
- Scoped to the mutated index per NF-3 (avoids stalling on permanently-yellow plugin indices like .opendistro_security)
- Honors WaitMode: PerStatement (SDK default) is fully implemented; PerMigration is a no-op stub with a Phase 6 hook (requires resource-runner-level dirty-index tracking + consolidated end-of-migration wait); Off skips the wait entirely
- Best-effort: failures log a warning and don't fail the statement result. Stronger guarantees come from explicit WAIT FOR statements
R-24b lock contention/crash recovery integration tests (3 tests with FakeTimeProvider for fast deterministic time control):
- ConcurrentAcquire: two RecordStore instances racing; loser surfaces MigrationLockUnavailableException (standard CAS path)
- LockMaxLifetime: uses FakeTimeProvider to fast-forward past the deadline; verifies LockHandle.LockExpired CT fires per R-05/PM-12. Loop yields between Advance calls so heartbeat continuation runs
- StaleLock takeover: plants a stale lock document directly via the low-level client (avoids race with the lock holder's own heartbeat), then store2 acquires via realtime-GET CAS overwrite per NF-1
Adds Microsoft.Extensions.TimeProvider.Testing reference to the integration tests project (already in Directory.Packages.props).
R-18 syntactic body-content enumeration: DEFERRED to Phase 2 with documented note. Requires body-content inspection (mapping field-type changes, static-settings detection) that violates ADR-0015 offline-pure parser. Existing parse-time enforcement (UNSAFE/NO WAIT justification tokens, missing-name rejection) covers the pure-syntactic cases.
Phase 1 totals:
- 74 unit tests pass on net8/9/10 (222 runs, 0 failures)
- 34 integration tests pass against real OpenSearch 2.18.0:
  * 11 spike (Phase 0 kill criterion CLEARED)
  * 6 RecordStore (bootstrapper, lock acquire/release, ledger CRUD)
  * 10 dispatcher (every verb end-to-end)
  * 4 resource runner (multi-statement migrations)
  * 3 R-24b (concurrent acquire, max-lifetime, stale-takeover)
- House pattern preserved (//#define INTEGRATIONS commented)
- Build clean: 0 errors, only pre-existing CS0618 warnings on Testcontainers parameterless ctors
Phase 1 architecture and runtime are validated end-to-end against a real cluster. Phase 2 work (templates, ISM, MIGRATE INDEX composite, WHEN VERSION semver, R-18 semantic body inspection, full SigV4 endpoint detection) builds on this foundation.
Adds the three alias verbs that complete the zero-downtime cutover
pattern. ALIAS SWAP is the headline value-add per R-16/NF-2 — single
atomic _aliases POST with both remove + add actions, no separate-
GET-then-POST TOCTOU window.
Components:
- AliasSwapAst (alias, oldIndex, newIndex)
- AliasAddAst (alias, indexName)
- AliasRemoveAst (alias, indexName)
- Parser grammar: ALIAS [SWAP|ADD|REMOVE] sub-verb dispatch
- StatementDispatcher handlers for each verb — all use POST /_aliases
via DoRequestAsync (the LowLevel Indices namespace doesn't expose
BulkAlias on this OpenSearch.Net version)
ALIAS SWAP body shape:
{
"actions": [
{ "remove": { "index": "<old>", "alias": "<a>", "must_exist": true } },
{ "add": { "index": "<new>", "alias": "<a>" } }
]
}
`must_exist: true` is the R-16 atomic-precondition signal — without
it, OpenSearch would silently no-op a remove of a non-existent alias.
With it, the cluster atomically rejects the whole multi-action body
when the precondition fails. (Note: OpenSearch 2.18 is permissive
about this in some cases; the integration test asserts the actual
correctness guarantee — alias never points at both indices
simultaneously after a swap — which IS guaranteed by the atomic
multi-action body.)
7 new unit tests (81 OpenSearch unit tests total, 243 runs across
net8/9/10, 0 failures): positive parse cases for all three verbs +
backtick handling + case-insensitive keywords + 2 negative cases.
4 new integration tests against real OpenSearch:
- AliasAdd points alias at index
- AliasRemove detaches alias
- AliasSwap atomically moves alias from old to new
- AliasSwap atomic post-condition: alias never on both indices
(R-16 atomicity guarantee)
ALIAS SWAP wires through ImplicitWaitMiddleware (per R-12) to gate
subsequent statements on cluster health post-swap.
House pattern preserved (//#define INTEGRATIONS commented). Build
clean across net8/9/10.
CREATE/DROP TEMPLATE -> _index_template (composable index templates)
CREATE/DROP COMPONENT -> _component_template (reusable building blocks)
CREATE POLICY -> _plugins/_ism/policies (ISM policy definition)
APPLY POLICY -> _plugins/_ism/add (attach policy to existing indices)
Grammar:
- 4 new keywords (TEMPLATE, COMPONENT, POLICY, APPLY) and 6 productions.
- Top-level OneOf reordered so CREATE/DROP TEMPLATE/COMPONENT/POLICY take priority over CREATE/DROP INDEX (more-specific second keyword wins).
- New indexPattern parser allows '*' for APPLY POLICY's pattern argument.
Dispatcher:
- DROP TEMPLATE/COMPONENT honor IF EXISTS via HEAD probe.
- APPLY POLICY inspects the ISM add response body and surfaces logical failures (updated_indices == 0 or failures: true) as Failed outcomes. ISM returns HTTP 200 even on zero-match, so this is required to avoid false-positive migration records.
Resource runner:
- ExtractBodyRefName extended for CREATE TEMPLATE/COMPONENT/POLICY.
Tests:
- 14 new parser unit tests (44 total foundation parser tests pass).
- 10 new integration tests against real OpenSearch (Testcontainers 2.18.0). Covers PUT/DELETE round-trips, IF EXISTS skip semantics on absent templates/components, ISM policy create + apply, and the zero-match failure contract for APPLY POLICY. Class is [DoNotParallelize] because ISM operations bootstrap the shared .opendistro-ism-config index on first use and parallel creates race that single-create.
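The APPLY POLICY zero-match contract described above can be sketched as a response classifier. This is Python for illustration only, not the dispatcher's C#; the function name and the returned tuple shape are assumptions, while the `updated_indices` and `failures` fields are the ones the commit message names.

```python
import json
from typing import Tuple


def classify_apply_policy(http_status: int, response_text: str) -> Tuple[str, str]:
    """Map an ISM add response to ("Executed" | "Failed", detail)."""
    if http_status >= 400:
        return "Failed", f"HTTP {http_status}"
    payload = json.loads(response_text)
    # HTTP 200 is not enough: the body can still report a logical failure.
    if payload.get("failures") is True:
        return "Failed", "ISM reported failures: true"
    if payload.get("updated_indices", 0) == 0:
        return "Failed", "pattern matched zero indices"
    return "Executed", f"updated_indices={payload['updated_indices']}"
```

Treating a zero-match 200 as a failure keeps the ledger from recording a migration that changed nothing.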
MIGRATE INDEX <old> TO <new> [WITH TEMPLATE <id> | WITH BODY $body] [VIA ALIAS <alias>] [TIMEOUT <duration>]
The headline value-add: encodes the canonical zero-downtime reindex-and-swap pattern as one verb. Decomposes at parse time into a CompositeStatementAst whose children are CREATE INDEX + REINDEX + (optional) ALIAS SWAP. The author explicitly names src and dst - no convention is imposed on the data store.
AST shapes:
- CompositeStatementAst: ordered children, dispatched sequentially, halts on first failure with a per-child detail trail.
- TemplateBodyRef: opaque template-name reference carried unresolved through parsing (ADR-0015 keeps the parser offline-pure).
- CreateIndexAst: extended with optional TemplateBody field; mutually exclusive with the existing inline Body field.
Grammar:
- New keywords MIGRATE, VIA. Same-src/dst rejected at parse time (purely syntactic per R-30 Otherwise clause). WITH TEMPLATE and WITH BODY are mutually exclusive (OneOf alternation).
Runtime:
- TemplateResolutionMiddleware fetches GET /_index_template/<name> and extracts the inner `template` block. Runs in DispatchCreateIndexAsync immediately before the create request is built, so dynamic:strict injection (R-17) and composed_of-aware skipping still apply against the live template body.
- Composite dispatch loops children, halts on Failed, returns a combined detail string identifying the halting child for diagnostics. Skipped children (IF [NOT] EXISTS guards) do not halt the chain.
Scope notes:
- Synchronous REINDEX (Phase 1 path); R-11 async polling + Tasks API is plan task 2.1 and lands as a separate slice. TIMEOUT is parsed for forward-compat but not threaded through here.
- R-19 partial-rollback ledger semantics (which child failed for --force-resume) lands in plan task 2.10.
Tests:
- 8 new parser unit tests (with-template+alias, with-body+alias, no-alias-skips-swap, no-body-default-create, timeout, same-src-dst rejection, case-insensitive).
- 6 new TemplateResolutionMiddleware unit tests on response-shape extraction (standard, composed_of-only template, empty-array, missing-key, invalid-json, empty-body).
- 4 new integration tests against real OpenSearch including the R-24c (o) keystone: composite vs hand-composed end-state equivalence (doc count, mappings, alias resolution all match).
239 unit tests pass (was 226). 4/4 MIGRATE INDEX integration tests pass against Testcontainers OpenSearch 2.18.0.
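The MIGRATE INDEX decomposition and the composite halt rule can be sketched together. The tuple-based child representation, the function names, and the detail-string format are assumptions for illustration in Python; the provider's actual AST types are C# records.

```python
from typing import Callable, List, Optional, Tuple

Child = Tuple[str, ...]  # (verb, *args), a hypothetical stand-in for the AST nodes


def migrate_index_children(old: str, new: str, alias: Optional[str] = None) -> List[Child]:
    """Decompose MIGRATE INDEX into CREATE INDEX + REINDEX + optional ALIAS SWAP."""
    if old == new:
        raise ValueError("MIGRATE INDEX source and destination must differ")
    children: List[Child] = [("CREATE INDEX", new), ("REINDEX", old, new)]
    if alias is not None:
        children.append(("ALIAS SWAP", alias, old, new))
    return children


def dispatch_composite(children: List[Child],
                       dispatch_one: Callable[[Child], Tuple[str, str]]) -> Tuple[str, str]:
    """Dispatch children in order; halt on Failed, continue past Skipped."""
    trail: List[str] = []
    for position, child in enumerate(children):
        status, detail = dispatch_one(child)
        trail.append(f"#{position} {child[0]}: {status} ({detail})")
        if status == "Failed":
            return "Failed", "; ".join(trail)  # trail ends at the halting child
    return "Executed", "; ".join(trail)
```

The combined detail string ends at the child that halted the chain, which is the information an operator needs to decide how to resume.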
Two production-correctness fixes that share infrastructure:
(1) WHEN VERSION <op> '<version>' <statement> (R-15a)
Statement-level prefix that gates child execution on the live cluster's
reported version. Closes a real production failure mode: lexical sort
treats '2.9' > '2.10' as TRUE, silently inverting a guarded statement
on a normal point-release bump. The AST's Evaluate normalizes both
sides to .0.0 before comparing so '2.10' = '2.10.0' (R-15a metric).
v1 supports MAJOR.MINOR[.PATCH] only. -SNAPSHOT, -rc<N>, and AWS
OpenSearch_<x> prefix/suffix forms are rejected at parse time with a
remediation message; partial-suffix support is worse than loud
rejection in production. The cluster-side version probe tolerates a
trailing -SNAPSHOT in the cluster's reported number (deploys do report
that) by stripping for comparison.
Cluster version is fetched lazily once per dispatcher via Lazy<Task<>>
(serializes the first fetch under contention without explicit locking).
Skipped statements report the actual cluster version in the detail so
ops can distinguish "cluster older than expected" from "predicate is
wrong".
(2) Component-template-aware dynamic:strict refinement (R-17)
Closes the gap MIGRATE INDEX opened: when the source template uses
composed_of, the resolved body alone does NOT carry the component
mappings (CREATE INDEX with an explicit body bypasses cluster-side
template-matching). Injecting dynamic:strict over an incomplete body
would surprise authors whose components define their own dynamic
behavior. Production templates use composed_of widely.
TemplateResolutionMiddleware.ResolveAsync now returns
TemplateResolution(Body, HasComposedOf). The dispatcher's CREATE INDEX
path uses `record with` to clone the AST with InjectDynamicStrict=false
when HasComposedOf is true. Same semantics as the existing inline-body
composed_of skip in SafeDefaultMergeMiddleware, lifted to the
runtime-resolved path.
A WARN log surfaces the gap visibly: the destination index will not
inherit component mappings via this path; authors should consider
creating the destination by name and letting cluster-side
template-matching apply.
Tests:
- 17 new WHEN VERSION unit tests (parser variants, all six comparators,
case-insensitivity, suffix/prefix rejection with remediation, AST
evaluation including the load-bearing 2.9 < 2.10 case and patch-level
comparisons).
- 4 new TemplateResolutionMiddleware unit tests (composed_of-true,
composed_of-false, empty-array-treated-as-false, pure-composed_of
template with null body).
- 5 new WHEN VERSION integration tests (predicate-true dispatches,
predicate-false skips, R-15a live 2.9<2.10 against 2.18 cluster,
cluster-version cache lifecycle, skip-detail includes cluster version).
- 1 new MIGRATE INDEX integration test verifying composed_of detection
skips dynamic:strict (writes an unmapped doc post-migrate; passes only
if dynamic:strict was correctly skipped).
260 unit tests pass (was 239). 10 OpenSearch integration tests pass
against Testcontainers OpenSearch 2.18.0.
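The 2.9 vs 2.10 failure mode described for WHEN VERSION can be illustrated with a minimal sketch. `SemverGate` and its members are hypothetical names, not the provider's AST types, and the comparator spellings are assumed; only the rules (MAJOR.MINOR[.PATCH], missing PATCH treated as 0, trailing -SNAPSHOT tolerated on the cluster side only) come from the message above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the provider's actual AST.
public static class SemverGate
{
    // v1 grammar: MAJOR.MINOR[.PATCH] only. Suffix and prefix forms do not match.
    private static readonly Regex Strict =
        new(@"^([0-9]+)\.([0-9]+)(?:\.([0-9]+))?\z");

    public static (int Major, int Minor, int Patch) Parse(string text)
    {
        var m = Strict.Match(text);
        if (!m.Success)
            throw new FormatException(
                $"Unsupported version '{text}'. Use MAJOR.MINOR[.PATCH].");

        // A missing PATCH normalizes to 0, so '2.10' equals '2.10.0'.
        var patch = m.Groups[3].Success ? int.Parse(m.Groups[3].Value) : 0;
        return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value), patch);
    }

    // The cluster may report a trailing -SNAPSHOT; strip it before parsing.
    public static (int Major, int Minor, int Patch) ParseClusterVersion(string reported)
    {
        const string suffix = "-SNAPSHOT";
        var trimmed = reported.EndsWith(suffix, StringComparison.Ordinal)
            ? reported[..^suffix.Length]
            : reported;
        return Parse(trimmed);
    }

    // Comparator spellings here are an assumption about the grammar.
    public static bool Evaluate(string clusterVersion, string op, string literal)
    {
        // Tuples compare component by component, numerically.
        var cmp = ParseClusterVersion(clusterVersion).CompareTo(Parse(literal));
        return op switch
        {
            "<" => cmp < 0,
            "<=" => cmp <= 0,
            "=" => cmp == 0,
            "!=" => cmp != 0,
            ">=" => cmp >= 0,
            ">" => cmp > 0,
            _ => throw new ArgumentException($"Unknown comparator '{op}'.")
        };
    }
}
```

An ordinal string comparison ranks "2.9" above "2.10" because '9' sorts after '1'; comparing integer components avoids that inversion.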
…edger
Closes the production-readiness gap surfaced by Slice 2.3's composite
halt: a partial migration leaves the cluster mid-state with no
operator-visible signal. R-19 makes that state explicit, recoverable,
and refuses silent retry.
Down direction (R-19):
- OpenSearchResourceRunner.RollbackStatementsFromAsync(migration, resourceName,
...) parses the per-statement `rollback` field and dispatches in REVERSE
declaration order (LIFO).
- Pre-flight validation: the FULL list is checked for missing `rollback`
fields BEFORE any dispatch. A missing rollback aborts Down with
RollbackNotSupportedException(StatementIndex) and changes nothing.
Otherwise we'd half-roll-back before discovering the next statement
is irreversible.
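A minimal sketch of the two-pass shape described above: validate the full list first, then dispatch in reverse. `RollbackPlanner` is a hypothetical name, and the exception class here is a stand-in with the same name and accessor as the one mentioned, not the provider's definition.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Stand-in for the exception named above; the real type may differ.
public sealed class RollbackNotSupportedException : Exception
{
    public int StatementIndex { get; }

    public RollbackNotSupportedException(int statementIndex)
        : base($"Statement {statementIndex} has no rollback; Down aborted before any dispatch.")
        => StatementIndex = statementIndex;
}

// Hypothetical helper for illustration.
public static class RollbackPlanner
{
    // rollbacks[i] is the `rollback` text of statement i, or null when the field is missing.
    public static IReadOnlyList<string> Plan(IReadOnlyList<string?> rollbacks)
    {
        // Pass 1: check every statement before dispatching anything,
        // so a missing rollback cannot leave the cluster half rolled back.
        for (var i = 0; i < rollbacks.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(rollbacks[i]))
                throw new RollbackNotSupportedException(i);
        }

        // Pass 2: reverse declaration order (LIFO).
        var plan = new List<string>(rollbacks.Count);
        for (var i = rollbacks.Count - 1; i >= 0; i--)
            plan.Add(rollbacks[i]!);

        return plan;
    }
}
```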
Partial-rollback ledger (R-19, R-24c (n) keystone):
- When a rollback statement N fails after N+1..M succeeded, the ledger
entry is overwritten with `status: partially_rolled_back`,
`direction: Down`, `failedStatementIndex: N`, and the error message.
- Subsequent ExistsAsync calls on a partially_rolled_back record THROW
OpenSearchPartialRollbackException with a remediation pointing to
ForceResume. The exception bubbles through MigrationRunner.RunAsync
(which only catches MigrationLockUnavailable + OperationCanceled), so
the operator sees the full message and stops.
- ForceResume = true bypasses the lockout for operators who have
manually reconciled cluster state. Surfaces in OpenSearchMigrationOptions;
the runner project (R-26) will expose it as --force-resume when it
lands in plan task 3.4.
Forensic ledger fields (R-06):
- New OpenSearchMigrationRecord extends MigrationRecord with Direction,
Status, AppliedBy, Checksum, Error, FailedStatementIndex.
- Standard WriteAsync(recordId) for successful Up writes now populates
direction=Up, status=succeeded, appliedBy={machine}/{pid}, matching
the strict ledger schema declared by LedgerIndexInitStep.
- Status keyword constants (`succeeded`, `failed`, `partially_rolled_back`)
pinned as public constants on OpenSearchMigrationRecord so writers,
readers, and tests cannot drift.
Best-effort ledger write resilience:
- If WritePartialRollbackAsync itself fails (cluster down, ledger
schema mismatch, etc.), the runner logs at ERROR but DOES NOT mask
the original rollback exception. Two problems are still better
diagnosed visibly than one obscured.
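The "log but do not mask" behavior can be sketched as a nested try/catch. All names below are hypothetical; the delegates stand in for the rollback dispatch, WritePartialRollbackAsync, and the ERROR logger.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration.
public static class BestEffortLedger
{
    public static async Task RollbackWithLedgerAsync(
        Func<Task> rollback,
        Func<Exception, Task> writePartialRollback,
        Action<string> logError)
    {
        try
        {
            await rollback();
        }
        catch (Exception rollbackError)
        {
            try
            {
                // Best effort: record the partial rollback in the ledger.
                await writePartialRollback(rollbackError);
            }
            catch (Exception ledgerError)
            {
                // The ledger failure is logged, never thrown in place of the rollback failure.
                logError($"Partial-rollback ledger write failed: {ledgerError.Message}");
            }

            // Rethrow the original rollback exception with its stack trace intact.
            throw;
        }
    }
}
```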
Tests:
- 8 new unit tests covering: rollback validation pass-through, missing
rollback at first/last index, missing-statements-array, empty-JSON,
status-constant pinning, exception accessors.
- 5 new integration tests against real OpenSearch: full rollback in
reverse order succeeds, partial-rollback ledger correctly writes
status=partially_rolled_back + failedStatementIndex (R-24c (n)),
ExistsAsync throws on lockout, ForceResume bypasses lockout, normal
WriteAsync populates direction/status/appliedBy.
268 unit tests pass (was 260; +8). 5/5 R-19 integration tests pass
against Testcontainers OpenSearch 2.18.0.
Two slices in one commit because they're a packaged unit: the runner's
default appsettings.json points Migrations:FromPaths at the samples
assembly, so the runner is unusable without the samples and the samples
are inert without the runner.
Runner (runners/Hyperbee.MigrationRunner.OpenSearch, R-26):
Mirrors the Aerospike/Couchbase/MongoDB/Postgres runner pattern exactly
so operator muscle memory transfers verbatim across providers. Generic
Host + BackgroundService MainService that resolves MigrationRunner from
DI and invokes RunAsync; configuration layered as command-line > env
> appsettings.<ENV>.json > appsettings.json; Serilog with structured
JSON file output for log aggregation.
Switch mappings include the standard --connection / --user /
--password / --ledger / --lock / --lock-name / --profile / --file /
--assembly. Adds:
- --force-resume binds OpenSearchMigrationOptions.ForceResume.
Closes the R-19 UX gap from Slice 2.5: the
partially_rolled_back lockout was previously only
bypassable via internal-API config; ops now have
the on-call-friendly CLI flag the requirement
document called for.
The README documents the recovery procedure end-to-end (inspect ledger
-> reconcile cluster state manually -> re-run with --force-resume) so
operators have a runbook at the same time as the feature.
Samples (runners/samples/Hyperbee.Migrations.OpenSearch.Samples, R-27):
Eight sample migrations covering every v1 verb shipped to date. Each
is self-contained, idempotent against a fresh cluster (CREATE ... IF
NOT EXISTS where idempotence is meaningful), and uses unique sample_*
index names so authors can run the whole suite without conflicts.
1000 CreateInitialIndex CREATE INDEX with body, WAIT FOR
2000 AliasSwapReindexHandComposed long-form reindex-and-swap
3000 ComponentAndIndexTemplate composed_of pattern
4000 IsmPolicyAndApply CREATE POLICY + APPLY POLICY
5000 ConditionalVersion WHEN VERSION semver gating
6000 MigrateIndexComposite FEATURED: R-30 canonical answer to
'how do I propagate template
changes to existing data?'
7000 ReversibleAlias R-19 rollback shape with per-
statement rollback fields
8000 UnsafeReindex REINDEX UNSAFE("...") opt-out idiom
Sample 2 (long form) and sample 6 (MIGRATE INDEX) are paired
intentionally — read together they make explicit what the composite
collapses, and sample 6's README block calls out that contrast for
adopters comparing the two approaches.
Verification:
- Solution builds clean across all projects (warnings are pre-existing
Testcontainers obsolete-API noise, not introduced by this slice).
- 268 unit tests pass.
- Runner end-to-end smoke test: launched against a deliberately-bad
connection string, the runner correctly loads the samples assembly,
resolves the full DI graph, runs the bootstrapper pipeline, and
fails at rest-ping with the unreachable host as the failure detail
(OpenSearchNotReadyException) — proving the full host -> DI ->
bootstrapper chain wires correctly.
Deferred to follow-up slices:
- Authentication beyond basic auth (API key, mTLS, SigV4) — plan tasks
3.1/3.2 still ahead of us.
- BulkAllObservable wrapper — plan task 3.3; sample for bulk-seed
intentionally omitted until that lands.
- NO WAIT("...") modifier — not implemented yet (lands with
WaitMode.PerMigration in plan task 2.9).
Replaces the placeholder README with a comprehensive provider reference
that covers every verb shipped to date. Statement syntax is the
load-bearing section per the user's request; the rest of the document
fills in the surrounding context (DI setup, configuration, lock and
ledger semantics, rollback procedure, production deployment).
Statement-syntax coverage (one section per verb family):
- Index lifecycle: CREATE / DROP / UPDATE MAPPING / UPDATE SETTINGS
[CLOSE] / REFRESH
- Aliases: ALIAS SWAP (R-16 atomic in-body precondition explained),
ALIAS ADD, ALIAS REMOVE
- REINDEX with the UNSAFE("<reason>") opt-out idiom
- MIGRATE INDEX (R-30) - featured: explains the parse-time
decomposition, runtime template resolution, the same-src/dst
parse-time check, and the composed_of-aware dynamic:strict skip
- Templates and components: CREATE/DROP TEMPLATE/COMPONENT
- ISM: CREATE POLICY + APPLY POLICY (with the zero-match logical-failure
contract surfaced)
- Cluster waits: WAIT FOR + WAIT UNTIL TASK
- WHEN VERSION (R-15a) - semver comparison, suffix rejection rationale,
cached cluster-version probe
Surrounding sections:
- Quick start with a working migration class + statements.json
- Body references (R-09) with the sibling-property semantics spelled out
- Rollback (R-19): validation pass + per-statement rollback shape +
partial-rollback ledger semantics + recovery procedure
- Configuration table for OpenSearchMigrationOptions
- Distributed lock + ledger semantics (R-04, R-05, R-06)
- Production deployment pointing at the runner project
- Forbidden-behavior trust boundary as documented in the requirements
Cross-references resolved through ADR / requirement IDs (R-08a, R-09,
R-15a, R-16, R-17, R-19, R-26, R-27, R-30, ADR-0011, ADR-0014, ADR-0015).
… R-21 Adds first-class auth support for the three core modes the provider package owns. SigV4 stays out of this slice deliberately; it ships in the optional OpenSearch.Net.Auth.AwsSigV4 package via a separate opt-in extension (plan task 3.2) so this package keeps the AWS-SDK transitive dependency tree off non-AWS deployments. Provider package: - OpenSearchAuthenticationOptions carries a Mode enum (Anonymous | Basic | ApiKey | ClientCertificate) plus the mode-relevant fields. Validate() runs at client-build time so missing required fields fail at startup with the configuration key to set, not at first wire request. - AddOpenSearchClient(IServiceCollection, Uri, Action<...>?) — the authoritative client-registration extension. Wires the right ConnectionSettings auth method per mode (BasicAuthentication, ApiKeyAuthentication, ClientCertificate). Anonymous mode emits a startup WARN that names the production-ready alternatives. - AddOpenSearchClient(IServiceCollection, IConfiguration) — the config-driven overload the runner uses. Reads OpenSearch:Authentication:* with case-insensitive Mode parsing; preserves back-compat with the legacy flat OpenSearch:UserName / Password (treated as Basic when Mode is unset). - mTLS uses X509CertificateLoader on net9+ with a SYSLIB0057-suppressed X509Certificate2 fallback for net8.0 — both targets work, neither emits a warning on its native API surface. Runner: - StartupExtensions.AddOpenSearchProvider delegates to the new config-driven AddOpenSearchClient extension, removing the manual ConnectionSettings building. - New CLI flags: --auth-mode, --api-key-id, --api-key, --client-cert, --client-cert-password. Existing --user / --password reroute to the new OpenSearch:Authentication:UserName / Password keys. - appsettings.json now declares OpenSearch:Authentication.Mode = "Anonymous" by default with a comment field naming the available modes. 
Smoke-tested: - ApiKey mode missing fields aborts at startup with the exact config key to set: "Authentication.Mode = ApiKey requires Authentication.ApiKeyId. Set OpenSearch:Authentication:ApiKeyId in configuration." - ApiKey mode with credentials wires through to the live client and the bootstrapper takes over (correctly fails on connect against an unreachable host). - Anonymous mode emits the WARN naming the production alternatives. Tests: - 14 new unit tests covering: Anonymous default; Basic UserName-required; Basic empty-password tolerance; ApiKey both-fields-required; ApiKey remediation message naming user-secrets; ClientCertificate either-or; ClientCertificate path-not-found; ClientCertificate path+instance mutual exclusion; client registration smoke; legacy-flat-keys back-compat; unknown-mode remediation; case-insensitive mode parsing; unknown-enum-value handling. 282 unit tests pass (was 268; +14). Docs updated: provider README has a full Authentication section with the four-mode table, configuration schema, and code samples; runner README has the expanded CLI table.
…on forms (ADR-0017)
Resolves the design smell flagged on review: heterogeneous statements.json
entries (one well-known field plus arbitrary other-named keys interpreted
by the parser) and no graceful path for large or reusable bodies. Three
forms now coexist, ranked by ceremony, with the original ADR-0009 sibling
form preserved as silent back-compat.
Forms:
1. WITH BODY @path/to/file.json - direct file reference
Best for any body large enough to dominate statements.json:
production OpenSearch mappings (200+ lines), ISM policies (100+),
reusable templates. The path loads an embedded resource relative
to the migration's own resource folder. Path validation is
parse-time: absolute paths and `..` traversal rejected.
2. WITH BODY $name + bodies.<name> = inline JSON
Best for tiny bodies tightly coupled to a single statement.
Atomic versioning + single-screen view of the migration. Replaces
form-0 sibling-property as the recommended inline pattern because
the structured `bodies` section is describable to JSON Schema
and tooling.
3. WITH BODY $name + bodies.<name> = "@path/to/file.json"
Less common - addresses bodies by name AND keeps them in their
own files. Useful for clarity in PR review when multiple bodies
in one statement want uniform addressing.
Back-compat: WITH BODY $name resolves to a top-level sibling
property when bodies.<name> is missing. Preserves the
ADR-0009/R-09 shape so existing migrations don't need rewriting;
the fallback is silent (no warning) because the form was the
original documented contract.
Resolution priority: BodyFileRef -> bodies.<name> -> sibling -> throw
with remediation naming both preferred and back-compat forms.
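The resolution order can be sketched with System.Text.Json as follows. The record names match those used in this message, but the definitions, the `BodyResolver` class, and the `loadResource` delegate are illustrative assumptions rather than the provider's code.

```csharp
using System;
using System.Text.Json;

// Illustrative stand-ins for the AST variants named in this message.
public abstract record BodySource;
public sealed record BodyRef(string Name) : BodySource;
public sealed record BodyFileRef(string Path) : BodySource;

// Hypothetical helper for illustration.
public static class BodyResolver
{
    // root is the parsed statements.json object; loadResource maps a path to file content.
    public static string Resolve(BodySource source, JsonElement root, Func<string, string> loadResource)
    {
        // 1. Direct file reference (WITH BODY @path).
        if (source is BodyFileRef file)
            return loadResource(file.Path);

        var name = ((BodyRef)source).Name;

        // 2. bodies.<name>: inline JSON, or a "@path" string pointing at a file.
        if (root.TryGetProperty("bodies", out var bodies)
            && bodies.ValueKind == JsonValueKind.Object
            && bodies.TryGetProperty(name, out var entry))
        {
            if (entry.ValueKind == JsonValueKind.String)
            {
                var text = entry.GetString()!;
                if (text.StartsWith('@'))
                    return loadResource(text[1..]);
            }
            return entry.GetRawText();
        }

        // 3. Back-compat: top-level sibling property with the same name.
        if (root.TryGetProperty(name, out var sibling))
            return sibling.GetRawText();

        // 4. Nothing matched: fail with a message naming both forms.
        throw new InvalidOperationException(
            $"Body '{name}' not found. Add bodies.{name} (preferred) or a top-level '{name}' property.");
    }
}
```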
Implementation:
- AST: new abstract BodySource record with BodyRef(Name) and
BodyFileRef(Path) variants. All seven body-bearing AST records
(CreateIndexAst, ReindexAst, UpdateMappingAst, UpdateSettingsAst,
CreateTemplateAst, CreateComponentAst, CreatePolicyAst) carry
BodySource? Body.
- Grammar: bodyRef parser is OneOf(siblingBodyRef, fileBodyRef) with
parse-time path validation in the fileBodyRef callback. Allowed path
characters [a-zA-Z0-9_\-./\\]; `..` segments and absolute paths
rejected with remediation messages.
- Resource runner: ResolveBody is the single resolution helper, called
from both RunStatementsFromJsonAsync (Up) and
RollbackStatementsFromJsonAsync (Down). LoadBodyFromResource converts
path separators to embedded-resource dot notation and surfaces
loading failures with the path name in the error.
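A sketch of the path rules listed above, under the assumption that traversal is detected per segment (so a filename containing dots is not mistaken for `..`). `BodyPathValidator` and both method names are hypothetical, and the error messages are placeholders.

```csharp
using System;

// Hypothetical helper for illustration.
public static class BodyPathValidator
{
    public static void Validate(string path)
    {
        if (string.IsNullOrEmpty(path))
            throw new FormatException("Body path is empty.");

        // Absolute paths: leading separator, or a drive letter such as C:.
        if (path[0] == '/' || path[0] == '\\' || (path.Length > 1 && path[1] == ':'))
            throw new FormatException($"Absolute body path '{path}' is not allowed; use a relative path.");

        // Allowed characters: letters, digits, underscore, hyphen, dot, and both separators.
        foreach (var c in path)
        {
            var ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                     || c == '_' || c == '-' || c == '.' || c == '/' || c == '\\';
            if (!ok)
                throw new FormatException($"Character '{c}' is not allowed in body path '{path}'.");
        }

        // Traversal is checked per segment, so 'a..b.json' is a legal filename.
        foreach (var segment in path.Split('/', '\\'))
        {
            if (segment == "..")
                throw new FormatException($"Body path '{path}' must not contain '..' segments.");
        }
    }

    // Converts separators to dot notation, as embedded-resource names use dots.
    public static string ToResourceSuffix(string path)
    {
        Validate(path);
        return path.Replace('/', '.').Replace('\\', '.');
    }
}
```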
Sample migrations now demonstrate all three forms:
- Sample 4 (IsmPolicyAndApply) - Form 1: direct WITH BODY @path. The
policy body lives in bodies/hot-warm-cold-policy.json. Demonstrates
the recommended pattern for any production-sized body.
- Sample 3 (ComponentAndIndexTemplate) - mixed Form 3 (bodies.body =
"@bodies/common-mappings-component.json") + Form 2 (inline). Shows
that the structured form can mix file refs with inline values in a
single bodies section.
- Samples 1, 2, 5, 6, 8 - Form 2 inline bodies under the bodies
section. The original sibling-property shape is gone from the
shipped samples but still resolves for any consumers inheriting
pre-3.5 migrations.
Tests:
- 14 new BodySourceParserTests covering: $name parses to BodyRef;
@path parses to BodyFileRef; nested directories OK; backslash
separators accepted (runtime normalizes); applies uniformly across
all body-bearing verbs; absolute paths rejected (Unix and Windows
forms); `..` traversal rejected; filenames with dots NOT mistaken
for traversal; mutual exclusion at the syntax level.
- 5 new OpenSearchBodySourceIntegrationTests against real OpenSearch:
bodies-section inline resolves; ADR-0009 sibling fallback still
resolves; bodies-section beats sibling when both present; missing
body ref throws with remediation naming both forms; missing file
ref throws.
- All 282 prior unit tests continue to pass after the BodyRef ->
BodySource AST migration. Existing test assertions updated from
`Body!.Name` to `Body.Should().BeOfType<BodyRef>().Which.Name`.
296 unit tests pass.
Docs:
- ADR-0017 documents the three forms, resolution order, path
validation rules, and the relation to ADR-0009/R-09.
- Provider README's "Body references" section rewritten to cover
all three forms with side-by-side examples and a "which form to
use" decision table.
- Samples README's verb table now includes a "Body-source form" column
pointing readers at the demonstrating sample for each.
…keystone tests
Surfaces production-correctness behaviors single-node Testcontainers
masks. Covers the four assessment-0002 / R-28b concerns that single-
node fundamentally cannot exercise:
- GREEN-threshold reachability (single-node can never go GREEN since
replicas have nowhere to allocate; WithProductionDefaults() flips
the threshold to GREEN, and that path was previously untestable)
- PA-2 lock-index number_of_replicas:0 invariant (single-node has no
replicas to couple with, so the constraint is vacuous; on
multi-node, the cluster would otherwise default to replicas:1 and
the constraint becomes load-bearing under concurrent acquire)
- Replica allocation across distinct nodes (1 primary + 1 replica =
GREEN status only when they land on different nodes; single-node
YELLOWs out with unassigned replicas, exactly the production
failure single-node masks)
- ALIAS SWAP atomicity under concurrent background writes
(R-24c (a)) - alias never on both indices simultaneously even
while a writer pumps documents into the source
Harness:
- MultiNodeOpenSearchTestContainer spins up 3 OpenSearch nodes on a
private Docker network with stable DNS aliases for
discovery.seed_hosts and cluster.initial_master_nodes. Conservative
512MB heap per node (1.5GB total + JVM overhead) to stay within
typical CI runner budgets.
- Opt-in via [ClassInitialize] in the test class (not wired into
assembly-level InitializeTestContainers) so tests that don't need
multi-node pay zero startup cost. The fixture's ~30s cluster
formation is amortized across all tests in the class.
- No per-node HTTP wait strategy. With initial_master_nodes listing
all 3 nodes, none can reach YELLOW until all 3 are up - so a
per-node wait_for_status=yellow strategy deadlocks Testcontainers'
StartAsync on node1 before node2/3 start. The harness skips per-
node strategies (relying on default process-alive readiness) and
does a harness-level WaitForFullClusterAsync that polls
_cluster/health for number_of_nodes==3 once all containers are up.
This was caught during initial validation - 26-minute timeout on
the deadlocked first attempt before the fix.
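The harness-level wait can be sketched as a polling loop over the _cluster/health response. This is not the harness code: `ClusterWait` is a hypothetical name, and `fetchHealthJson` is an assumed delegate standing in for the HTTP GET so the loop itself stays testable.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration.
public static class ClusterWait
{
    // Polls until number_of_nodes equals expectedNodes.
    // Throws OperationCanceledException when the timeout elapses.
    public static async Task WaitForFullClusterAsync(
        Func<CancellationToken, Task<string>> fetchHealthJson,
        int expectedNodes,
        TimeSpan pollInterval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);

        while (true)
        {
            try
            {
                var json = await fetchHealthJson(cts.Token);
                using var doc = JsonDocument.Parse(json);
                if (doc.RootElement.TryGetProperty("number_of_nodes", out var nodes)
                    && nodes.GetInt32() == expectedNodes)
                {
                    return;
                }
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Nodes may still be starting (connection refused, partial JSON); retry.
            }

            await Task.Delay(pollInterval, cts.Token);
        }
    }
}
```

Because the check runs only after every container has been started, no single node's readiness probe can block the start of the others.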
Tests (4/4 pass in 29s against local Docker after the fix):
Cluster_ReachesGreenStatus_OnceAllNodesJoined
LockIndex_BootstrappedWithReplicasZero_PreventsReplicaWriteCoupling
UserIndex_WithReplicasOne_AllocatesShardsOnMultipleNodes
AliasSwap_DuringBackgroundWrites_AllPreSwapDocsReachable
Tests are tagged [TestCategory("MultiNode")] so CI runners can include
or exclude them as a group:
dotnet test --filter "TestCategory=MultiNode" (only multi-node)
dotnet test --filter "TestCategory!=MultiNode" (skip multi-node)
Documentation:
- MULTINODE.md alongside the harness explains when to use it,
lifecycle wiring, resource cost, and the per-node-wait-strategy
pitfall so future test authors don't re-discover it.
Out of scope for this slice (deferred to plan task 3.6 / 2.12):
- Multi-node CI integration (this slice ships the harness, not the
CI workflow that runs it on every PR)
- The full R-24c 15-test production scenario suite (this slice
ships 4 keystone tests; 2.12 expands to the full 15)
… + ISM capability detection (R-21)
Closes the AWS Managed OpenSearch deployment story per R-21. Four
threads in this slice, each addressing a distinct R-21 sub-clause:
R-21 #1 (SigV4 in optional package)
R-21 #2 (AWS endpoint loud-fail in core)
R-21 #3 (ISM endpoint capability detection)
R-21 #4 (per-request credential resolution)
Architectural shape (option-E from the design discussion):
Two completely separate registration paths, split by what the auth
mode actually does to the HTTP layer:
- Core's services.AddOpenSearchClient handles header-based auth (Basic,
ApiKey, ClientCertificate, Anonymous), all of which set credentials on
ConnectionSettings without changing the HTTP transport.
- The new Hyperbee.Migrations.Providers.OpenSearch.Aws extension's
services.AddOpenSearchAwsClient handles SigV4, which REPLACES the HTTP
transport with AwsSigV4HttpConnection that signs every request with
AWS-fresh credentials.
The boundary follows the actual technical seam, not arbitrary
categorization. Each path's validation is local: no DI introspection
across packages, no shared markers, no implicit override semantics. The
two are mutually exclusive: calling both throws with a remediation
message naming the alternative.
R-21 #1: AWS extension package (new)
src/Hyperbee.Migrations.Providers.OpenSearch.Aws:
- OpenSearchAwsAuthenticationOptions: Region (required, validated
against AWSSDK known-region list at registration time so typos like
us-east1 fail fast); Service ("es" default, "aoss" for Serverless);
Credentials (default chain via FallbackCredentialsFactory unless set
explicitly).
- AddOpenSearchAwsClient(IServiceCollection, Uri, Action<...>) and
IConfiguration overload.
- Builds AwsSigV4HttpConnection, attaches to ConnectionSettings,
registers IOpenSearchClient as singleton.
- Throws if an IOpenSearchClient is already registered (mutual
exclusion guard).
- WARNs at client-build time if endpoint isn't *.amazonaws.com (the inverse-mismatch case — usually a misconfiguration but legitimate for sigv4-compatible proxies and custom-domain fronting). R-21 #2 — AWS endpoint loud-fail in core ServiceCollectionExtensions.AddOpenSearchClient gains two pre-build guards: ThrowIfAwsEndpoint - pure URL string check; if Host EndsWith ".amazonaws.com" (case-insensitive), throws AwsSigV4NotConfiguredException with the EXACT services.AddOpenSearchAwsClient(...) snippet to add. No DI introspection, no marker dance, no cross-package conditional flow — just a string suffix match against a typed exception. Substring-match attacks like amazonaws.com.attacker.test correctly resolve to non-AWS (the EndsWith check covers this). ThrowIfClientAlreadyRegistered - mutual exclusion with the AWS extension, symmetric with the AWS extension's own guard. R-21 #4 — Per-request credential resolution AwsSigV4HttpConnection calls AWSCredentials.GetCredentials() per request internally. With FallbackCredentialsFactory or any of the standard implementations (InstanceProfile, ECS, IRSA), credentials re-resolve per request — IRSA and instance-profile rotation work without runner restart. No client-construction-time caching. No extra plumbing required at the provider layer; the AWSSDK design already does what R-21 #4 wants. R-21 #3 — ISM endpoint capability detection Modern OpenSearch exposes ISM under /_plugins/_ism/...; older AWS Managed domains expose it under /_opendistro/_ism/.... The dispatcher cannot hard-code either path without breaking deployments using the other. IsmEndpointCapability (Internal): singleton service holding the resolved prefix. SetPrefix is idempotent for the same value but throws if asked to re-set with a different value (signals a bootstrap-logic bug). IsmEndpointDetectStep (Internal/Bootstrap/Steps): probes the modern path first via GET /_plugins/_ism/policies. On 404, retries the legacy /_opendistro/_ism/policies. 
On any non-404 failure (network, auth, 5xx), surfaces the failure as Failed bootstrap so the operator sees actual cluster issues rather than a silent fallback. On both probes failing, the remediation names the required IAM action for AWS Managed (es:ESHttp* against the ISM resource ARN). StatementDispatcher consults IsmEndpointCapability for the CREATE POLICY and APPLY POLICY paths. When unresolved (e.g., a test that bypasses bootstrap), falls back to the modern prefix so non-AWS single-node tests work without explicit setup. Tests: - 13 new unit tests for the AWS registration surface (URL guard fires on *.amazonaws.com, doesn't fire on substring matches in the middle of a host, mutual exclusion in both directions, region validation rejects typos at registration time, IConfiguration overload reads keys, etc.). - 7 new unit tests for IsmEndpointCapability semantics (default unresolved, idempotent re-set, divergent re-set throws, constants pinned). - 1 new integration test confirming IsmEndpointDetectStep resolves to the modern prefix against the OpenSearch 2.18 Testcontainers image; the existing 10 OpenSearchTemplatePolicyIntegrationTests continue to pass with ISM detection wired through, proving CREATE POLICY and APPLY POLICY use the resolved path correctly. 316 unit tests pass (was 296; +20 net). Solution builds clean across all targets. Docs: - src/Hyperbee.Migrations.Providers.OpenSearch.Aws/README.md spelled out: install, usage, mutual exclusion, credential resolution per R-21 #4, AWS endpoint loud-fail, service codes (es vs aoss), region validation. - Provider README's Authentication section now lists 5 modes across 2 packages with the technical-seam rationale, points at the AWS extension README for SigV4, and explains the mutual-exclusion guards and the URL-guard remediation flow. 
Deferred to a follow-up slice (3.2 was already wide): - Multi-node integration test that spins up a 3-node cluster and verifies SigV4 against a real AWS Managed domain — that needs an actual AWS account and is the subject of R-28c (scheduled validation runbook, plan task 3.7).
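Two of the R-21 behaviors in this commit are small enough to sketch: the host-suffix check behind the AWS endpoint guard, and the set-once semantics described for the ISM prefix holder. Class and member names below are hypothetical; only the suffix, the two ISM prefixes, and the described behaviors come from the message.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration.
public static class AwsEndpointGuard
{
    // Suffix match on the host only: 'amazonaws.com.attacker.test' does not end
    // with '.amazonaws.com', so it is treated as a non-AWS endpoint.
    public static bool IsAwsEndpoint(Uri endpoint) =>
        endpoint.Host.EndsWith(".amazonaws.com", StringComparison.OrdinalIgnoreCase);
}

// Hypothetical stand-in for the capability holder; not thread-safe as written.
public sealed class IsmPrefixHolder
{
    public const string Modern = "/_plugins/_ism";
    public const string Legacy = "/_opendistro/_ism";

    private string? _prefix;

    // Unresolved (for example, bootstrap bypassed in a test) falls back to the modern prefix.
    public string PrefixOrDefault => _prefix ?? Modern;

    public void SetPrefix(string prefix)
    {
        if (_prefix is null)
        {
            _prefix = prefix;
            return;
        }

        // Same value again is a no-op; a different value signals a bootstrap-logic bug.
        if (!string.Equals(_prefix, prefix, StringComparison.Ordinal))
            throw new InvalidOperationException(
                $"ISM prefix already resolved to '{_prefix}'; refusing to change it to '{prefix}'.");
    }
}
```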
…R-28b)
R-28b mandates multi-node CI as Must, not Should: the four production
behaviors single-node fundamentally masks (GREEN-threshold, replica
allocation, shard relocation under load, PA-2 lock-index replicas:0
invariant) need to be exercised on every PR or they regress silently.
Workflow (.github/workflows/multi_node_tests.yml):
- Triggers on PR + workflow_dispatch.
- Runs on ubuntu-latest (Docker available by default).
- concurrency: cancels in-flight runs on the same ref so rapid pushes
don't pile up 90-second cluster-formation runs.
- Builds the integration tests assembly with -p:EnableIntegrationTests=true.
- Runs `dotnet test --filter "TestCategory=MultiNode"` so only the
4 MultiNode-tagged tests fire — other tests in the assembly stay
off (they require providers we don't initialize on this run).
- Sets HYPERBEE_TESTS_SKIP_SINGLE_NODE=true in env so the assembly-level
InitializeTestContainers becomes a no-op for single-node providers.
The MultiNode test class's own [ClassInitialize] handles the 3-node
cluster setup. Net cost: 3 OpenSearch containers, no Mongo / Postgres
/ Couchbase / Aerospike / single-node OpenSearch.
- Uploads the .trx test result artifact for every run.
Property-driven INTEGRATIONS gate
(tests/.../Hyperbee.Migrations.Integration.Tests.csproj):
The integration tests use `#if INTEGRATIONS` at the file level so a
plain `dotnet test` skips them. The new <DefineConstants> conditional
appends INTEGRATIONS to the compiler's symbol set when
EnableIntegrationTests=true is passed:
<DefineConstants Condition="'$(EnableIntegrationTests)' == 'true'">
$(DefineConstants);INTEGRATIONS
</DefineConstants>
This keeps the source-level `//#define INTEGRATIONS` pattern working
for local iteration (uncomment to run a single test class) while
giving CI a property-driven way to flip the symbol without touching
source. CI is reproducible without per-file edits; local-dev workflow
unchanged.
Per-provider opt-out for single-node assembly init:
InitializeTestContainers.Initialize now early-returns when
HYPERBEE_TESTS_SKIP_SINGLE_NODE=true. Default behavior unchanged
(env var unset → all 5 single-node providers spin up as before).
This is the simplest way to bypass the assembly-level container
startup without restructuring the provider-agnostic
[AssemblyInitialize] contract.
Local-dev verification: `dotnet build -p:EnableIntegrationTests=true`
succeeds across all targets (net8/net9/net10), confirming the
property-driven define flips correctly. The actual 4/4 test
correctness was validated in commit 8d9b5b2 (Slice 2.11) against
local Docker; this commit only adds the CI plumbing around them.
…e runner ActiveContext + ContextResolutionPolicy were declared on OpenSearchMigrationOptions in earlier slices but never consumed. R-15 specifies the wiring at the resource-file level, gated through ContextResolutionPolicy semantics that fail loud in production. Wiring: - OpenSearchResourceRunner.RunStatementsFromJsonAsync and RollbackStatementsFromJsonAsync both gate on ShouldRunForActiveContext(root) before any work happens. Skipped files return cleanly with an INFO log naming the file's contexts and the active runtime context. No statements dispatch, no ledger writes, no rollbacks. - The gate reads an optional top-level `context: [...]` array on the statements.json wrapper. No context block = always run (the lazy path stays unaffected). Empty array = also always run (degenerate case must not lock everyone out). - ActiveContext is comma-separated (e.g., "canary,prod") so a single runner can claim membership in multiple contexts. Matching is case-sensitive — context tags are identifiers, not free-form text. Any-tag-intersects = run. - Under ContextResolutionPolicy.RequireExplicit (the production default set by WithProductionDefaults), file-has-context AND ActiveContext-null throws MissingActiveContextException (new typed exception in OpenSearchExceptions.cs) with the configuration key to set. Trust boundary forbids silent prod-everywhere; the only legal outcomes when context is declared are run-because-matched, skip- because-mismatched, or fail-because-unset. RunIfUnset is intentionally not exposed. - Under SkipIfUnset (SDK default), ActiveContext-null produces a silent skip with INFO log so dev iteration is friction-free. Tests: - 9 new OpenSearchContextFilterTests covering the full table: no context block, single-tag match, comma-separated match, mismatch (silent skip), case-sensitive non-match, ActiveContext-null under both policies (skip vs throw), empty `context: []` is degenerate (no lockout), rollback path uses the same gate. 
- 325 unit tests pass (was 316; +9 new). Docs: - Provider README's Statement-syntax section gains a "Context filter (R-15)" subsection with the resolution table and explicit note that WithProductionDefaults() flips to RequireExplicit. Combine with WHEN VERSION for statement-level gating inside an admitted file.
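The context-filter resolution table can be sketched as one function. The enum member names and the exception name follow the message; the `ContextFilter` class, the exception message, and the treatment of a non-array `context` value as "no context block" are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

public enum ContextResolutionPolicy { SkipIfUnset, RequireExplicit }

// Stand-in for the typed exception named above.
public sealed class MissingActiveContextException : Exception
{
    public MissingActiveContextException()
        : base("The resource declares contexts but ActiveContext is not set.") { }
}

// Hypothetical helper for illustration.
public static class ContextFilter
{
    public static bool ShouldRun(JsonElement root, string? activeContext, ContextResolutionPolicy policy)
    {
        // No context block or an empty array: always run.
        // Assumption: a non-array value is treated like a missing block.
        if (!root.TryGetProperty("context", out var declared)
            || declared.ValueKind != JsonValueKind.Array
            || declared.GetArrayLength() == 0)
        {
            return true;
        }

        if (string.IsNullOrWhiteSpace(activeContext))
        {
            if (policy == ContextResolutionPolicy.RequireExplicit)
                throw new MissingActiveContextException();
            return false; // SkipIfUnset: skip the file
        }

        // ActiveContext is comma-separated; matching is case-sensitive.
        var active = new HashSet<string>(
            activeContext.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries),
            StringComparer.Ordinal);

        // Any tag in common means the file runs.
        foreach (var tag in declared.EnumerateArray())
        {
            if (tag.ValueKind == JsonValueKind.String && active.Contains(tag.GetString()!))
                return true;
        }

        return false;
    }
}
```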
OpenSearchResourceRunner.BulkLoadAsync<T>(indexName, documents, options)
wraps OpenSearch.Client's BulkAllObservable with the R-20 production-safe
defaults and surfaces retried 429s as structured WARN logs.
Defaults (BulkLoadOptions, all overridable):
BatchSize               1000 docs (~5MB at typical shapes)
MaxDegreeOfParallelism  8
BackOffRetries          5
InitialBackOff          1s (-> 2s -> 4s -> 8s -> 16s)
RefreshOnCompleted      true (single _refresh at end)
Per-batch refresh stays off: refreshing per request under 8x parallelism
is the documented anti-pattern that triggers segment-merge storms (PA-6
from assessment 0002).
Implementation notes:
- BulkAllObservable is reactive; the helper subscribes via a small
inline IObserver<BulkAllResponse> wrapper rather than pulling in
System.Reactive for one method. OnNext logs WARN for any page whose
response.Retries > 0; OnCompleted resolves the TaskCompletionSource
that the await chain hangs on; OnError rethrows the exception through
the same TCS.
- ContinueAfterDroppedDocuments(false): bulk operations failing
permanently after the retry budget should surface as the migration
failing, not as silent partial success that breaks downstream reads.
- R-20 spec calls for "5MB batches" but BulkAllDescriptor.Size is a
document count, not a byte size. The default value targets
approximately 5MB at typical document shapes; authors with very large
or very small documents override BatchSize explicitly.
Tests:
- 2 new BulkLoadOptionsTests pinning the R-20 spec values
(BatchSize=1000, parallelism=8, retries=5, backoff=1s,
RefreshOnCompleted=true) AND verifying every option is genuinely
settable (R-20: "All defaults are overridable via options").
- Live-cluster bulk-load semantics belong in the integration tests;
this slice ships the in-process default-pinning tests.
- 327 unit tests pass (was 325; +2).
Docs:
- Provider README gains a "Bulk document loading (R-20)" section with
usage example, options table, and the segment-merge-storm rationale
for why per-batch refresh stays off.
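The observer-to-task bridge from the implementation notes can be modeled outside .NET. What follows is a minimal Python sketch, not the provider's C# code: `PageResponse`, `BulkObserver`, `backoff_schedule`, and `await_bulk` are hypothetical names, and doubling the delay on every retry is an assumption read off the 1s -> 2s -> 4s -> 8s -> 16s sequence.

```python
import asyncio
import logging
from dataclasses import dataclass, field

log = logging.getLogger("bulk_load_sketch")


def backoff_schedule(initial_seconds, retries):
    """Delay before each retry, doubling every time (assumed policy)."""
    return [initial_seconds * (2 ** i) for i in range(retries)]


@dataclass
class PageResponse:
    """Hypothetical stand-in for one bulk page result."""
    page: int
    retries: int


@dataclass
class BulkObserver:
    """Bridges push-style callbacks onto a single awaitable future."""
    done: asyncio.Future
    warned_pages: list = field(default_factory=list)

    def on_next(self, response):
        # Surface retried pages as a warning instead of hiding them.
        if response.retries > 0:
            log.warning("bulk page %d needed %d retries",
                        response.page, response.retries)
            self.warned_pages.append(response.page)

    def on_completed(self):
        if not self.done.done():
            self.done.set_result(None)

    def on_error(self, exc):
        # The failure travels through the same future the caller awaits.
        if not self.done.done():
            self.done.set_exception(exc)


async def await_bulk(pages, error=None):
    """Drive the observer with synthetic pages, then await completion."""
    observer = BulkObserver(done=asyncio.get_running_loop().create_future())
    for page in pages:
        observer.on_next(page)
    if error is not None:
        observer.on_error(error)
    else:
        observer.on_completed()
    await observer.done
    return observer.warned_pages
```

Awaiting one future keeps the caller's code linear: a permanent failure raises at the `await`, which mirrors the "migration fails, no silent partial success" intent described above.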
… (R-12)
R-12 was partially shipped: WaitMode.PerStatement (default) and Off
were honored, but PerMigration was a no-op stub with a "Phase 6
deferred" comment. The NO WAIT("<reason>") modifier wasn't implemented
at all. This slice closes both gaps.
PerMigration tracking + flush:
- StatementDispatcher gains a HashSet<string> _dirtyIndices field that
accumulates mutated index names across statements. Under PerMigration
the per-statement implicit wait records the index and returns
immediately; the resource runner calls dispatcher.FlushImplicitWaitsAsync
at end of resource pass for a single consolidated _cluster/health
call across all dirty indices. PerStatement and Off paths are
unchanged.
- Both up (RunStatementsFromJsonAsync) and down (RollbackStatementsFromJsonAsync)
call FlushImplicitWaitsAsync at the end. Down is symmetric because
rollback statements (CREATE / DROP / REINDEX / ALIAS SWAP) are
themselves mutating.
- Sequential dispatch within a resource runner means HashSet without
locking is correct.
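The record-then-flush behavior above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the StatementDispatcher itself: `DispatcherSketch` and the `wait_for_health` callback are hypothetical, and skipping the health call when nothing is dirty is an assumption.

```python
from enum import Enum


class WaitMode(Enum):
    PER_STATEMENT = "PerStatement"
    PER_MIGRATION = "PerMigration"
    OFF = "Off"


class DispatcherSketch:
    """Models per-statement waits versus one consolidated flush."""

    def __init__(self, wait_mode, wait_for_health):
        self.wait_mode = wait_mode
        # wait_for_health(list_of_index_names) stands in for the
        # _cluster/health call; it is injected so the sketch needs no cluster.
        self._wait_for_health = wait_for_health
        # Plain set, no lock: statements are dispatched sequentially.
        self._dirty_indices = set()

    def implicit_wait(self, index):
        if self.wait_mode is WaitMode.OFF:
            return
        if self.wait_mode is WaitMode.PER_MIGRATION:
            # Record the index and return immediately.
            self._dirty_indices.add(index)
            return
        self._wait_for_health([index])

    def flush_implicit_waits(self):
        # Only PerMigration defers work; other modes have nothing to flush.
        if self.wait_mode is not WaitMode.PER_MIGRATION:
            return
        if not self._dirty_indices:
            return
        self._wait_for_health(sorted(self._dirty_indices))
        self._dirty_indices.clear()
```

Three mutations across two indices under `PER_MIGRATION` produce no health call until the flush, then exactly one call naming both indices.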
NO WAIT("<reason>") modifier:
- Grammar — new `noWaitWithJustification` parser fragment shared
alongside the existing UNSAFE one (both reuse `quotedString` which
rejects empty/whitespace-only). Wired into all five mutating verbs
per R-12: CREATE INDEX, REINDEX, ALIAS SWAP, UPDATE SETTINGS,
APPLY POLICY. Modifier is the trailing clause so it never conflicts
with WITH BODY / VIA ALIAS / etc.
- AST — five mutating records gain an optional NoWaitJustification
string field. Records use parameterless defaults so existing call
sites in tests (and the MIGRATE INDEX expansion grammar) continue
to compile without changes.
- Dispatcher — ImplicitWaitIfMutatingAsync now takes (verb, justification)
and emits a structured WARN log under PerStatement when a
justification is present (the `migration.no_wait{reason, idx, verb}`
spec event). Under PerMigration the per-statement wait is already a
no-op until the end-of-migration flush, so NO WAIT degrades to a
DEBUG-level acknowledgement on that path.
- ApplyPolicy now also participates in the implicit wait per R-12's
enumeration; previously the dispatcher omitted it.
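To make the modifier shape concrete, here is a hedged Python sketch of capturing a trailing NO WAIT("<reason>") clause into an optional AST field. It uses a regular expression rather than the Parlot grammar, and `CreateIndex`, `split_no_wait`, and the `CREATE INDEX users` statement text are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CreateIndex:
    """Hypothetical AST record; the new field defaults to None so
    call sites that predate the modifier keep working unchanged."""
    index: str
    no_wait_justification: Optional[str] = None


# Trailing clause only: NO WAIT("reason") at the very end of the statement.
_NO_WAIT = re.compile(r'\s+NO\s+WAIT\s*\(\s*"(?P<reason>[^"]*)"\s*\)\s*$')
_BARE_NO_WAIT = re.compile(r'\bNO\s+WAIT\s*$')


def split_no_wait(statement):
    """Return (statement_without_modifier, justification_or_None)."""
    match = _NO_WAIT.search(statement)
    if match is None:
        if _BARE_NO_WAIT.search(statement):
            raise ValueError("bare NO WAIT: a quoted justification is required")
        return statement, None
    reason = match.group("reason")
    if not reason.strip():
        # Empty and whitespace-only justifications are both rejected.
        raise ValueError("NO WAIT requires a non-empty justification")
    return statement[: match.start()], reason
```

Because the clause is matched only at the end of the statement, it cannot collide with earlier clauses, which is the same reason the grammar places the modifier last.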
Tests:
- 7 new NoWaitParserTests covering the modifier shape on each of the
five mutating verbs plus a stacking test (REINDEX UNSAFE + NO WAIT
together — they're independent opt-outs of different safe-defaults
and capture cleanly into separate AST fields).
- 4 spec'd parse-time-rejection tests (bare NO WAIT, empty
justification, whitespace-only, DROP-INDEX-doesn't-accept) are
blocked on a wider parser-hygiene issue (Parlot's TryParse doesn't
anchor to EOF; trailing tokens after a successful prefix-match are
silently dropped). Tracked as a known limitation in a code comment;
fixing it requires `.Eof()` on the top-level OneOf which affects
every verb's accept criteria — separate hardening slice.
- 334 unit tests pass (was 327; +7).
Docs:
- Provider README's Cluster-waits section gains a "WaitMode and the
NO WAIT modifier (R-12)" subsection with the three-mode table and
the bare-NO-WAIT-fails-at-parse-time spec note.
Maintainer review on Slice 3.5: the `bodies/` subfolder was a sample-style
choice, not a grammar requirement. The resolver accepts any relative path
under the migration's resource folder: `@foo.json`, `@bodies/foo.json`,
and `@configs/v2/foo.json` are all equally valid. Imposing a folder
convention via the samples implies a constraint that doesn't exist.
Sample 4 (single body) flattened: hot-warm-cold-policy.json now lives at
the migration root and the statement reads
`CREATE POLICY ... WITH BODY @hot-warm-cold-policy.json`.
Demonstrates that the simplest path works without ceremony.
Sample 3 (multiple bodies) keeps `bodies/` because grouping is the
legitimate case for a subfolder when a single migration has more than one
body file.
Provider README's "Form 1" example updated to use a flat path
(`@users-mapping.json`) and a new sentence makes the policy explicit:
"Subfolders are optional. ... Group bodies into subfolders when a single
migration has many of them; otherwise leave them flat at the migration
root."
The `bodies` keyword in the JSON wrapper stays: the keyword/section name
mirror is the cognitive payoff of the design (author writes
`WITH BODY $foo` and looks up `bodies.foo`); replacing it with `data` or
`content` would decouple the vocabulary for negligible benefit.
No grammar changes. Samples csproj's EmbeddedResource path updated to
match the flattened layout. No tests affected.
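The "any relative path under the resource folder" rule can be sketched as a pure path computation. This Python example is illustrative only: `resolve_body_ref` is a hypothetical name, forward-slash resource paths are assumed, and rejecting references that climb out of the folder is an assumption about what "under" means, not confirmed resolver behavior.

```python
import posixpath


def resolve_body_ref(resource_root, ref):
    """Map an @-reference to a path below resource_root.

    resource_root is assumed to be a non-empty folder path such as
    "migrations/0004"; ref is the text after WITH BODY, e.g. "@foo.json".
    """
    if not ref.startswith("@"):
        raise ValueError("body reference must start with '@'")
    relative = ref[1:]
    if not relative or relative.startswith("/"):
        raise ValueError("body reference must be a relative path")
    root = posixpath.normpath(resource_root)
    full = posixpath.normpath(posixpath.join(root, relative))
    # Flat and nested paths are treated alike; only escaping the root fails.
    if not full.startswith(root + "/"):
        raise ValueError("body reference escapes the resource folder")
    return full
```

For example, `resolve_body_ref("migrations/0004", "@hot-warm-cold-policy.json")` and `resolve_body_ref("migrations/0003", "@bodies/users.json")` both resolve without any folder convention being enforced.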
…/i/k/m)
Closes the R-24c production-scenario suite gaps that earlier slices
hadn't covered. R-24c is the "production-capable" gate per the
requirements doc; six scenarios remained:
(c) Mapping update on existing index produces "no reindex" gotcha
diagnostic
(d) Static settings update fails clearly without CLOSE, succeeds with
it
(g) dynamic:strict rejects unmapped fields with the documented error
(i) Reindex op_type:create skips partial-prior-run docs (no double-
write after a crashed prior run)
(k) Lock primary-shard contention on multi-node — N concurrent
acquires, one winner, bounded tail latency under PA-2 replicas:0
(m) Ledger refresh budget at scale — 100 writes complete within
budget on multi-node
(a)/(b)/(h)/(j)/(n)/(o) covered by earlier slices; (e) defers to plan
task 2.1 (Tasks API); (f) defers (toxiproxy infrastructure); (l)
REMOVED per ADR-0016.
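Scenario (i) rests on create-only writes being idempotent across a crashed and resumed reindex. The following in-memory Python sketch models that property; `reindex_create_only` is a hypothetical helper operating on plain dicts, not a call into OpenSearch.

```python
def reindex_create_only(source_docs, dest_docs):
    """Copy documents into dest_docs, never overwriting an existing id.

    Models op_type=create semantics: a document already present in the
    destination (for example from a partial prior run) is skipped.
    """
    created, skipped = [], []
    for doc_id, doc in source_docs.items():
        if doc_id in dest_docs:
            skipped.append(doc_id)
        else:
            dest_docs[doc_id] = doc
            created.append(doc_id)
    return created, skipped
```

Running it twice against the same destination creates nothing on the second pass, which is the "no double-write after a crashed prior run" assertion in miniature.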
R-24c (c) — UPDATE MAPPING diagnostic
Adds an INFO-level log to DispatchUpdateMappingAsync naming the
"mapping changes don't reindex existing data" gotcha and pointing
at MIGRATE INDEX (R-30) as the canonical propagation pattern. The
diagnostic surfaces the silent-wrong-state class without blocking the
operation; the test pins its presence so a refactor that drops the
log fails the gate.
Tests:
OpenSearchR24cGapFillIntegrationTests: 5 single-node scenarios (c; d
twice, once for the failure path and once for the CLOSE-succeeds path;
g; i), all on single-node Testcontainers.
OpenSearchR24cMultiNodeIntegrationTests — 2 multi-node scenarios (k
concurrent-lock-acquire with bounded tail-latency assertion, m
100-migration ledger-write budget at 60s). [TestCategory("MultiNode")]
so the existing multi_node_tests.yml CI workflow picks them up
alongside the 4 keystone tests from Slice 2.11.
All tests use [TestCategory("R-24c")] so the production-capable suite
can be filtered and reported as a unit. Integration tests stay gated
behind the EnableIntegrationTests MSBuild property; CI activates them
on PRs.
Build clean across all targets. 334 unit tests still pass (no unit-
test changes in this slice; all R-24c work is integration-tier).
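The "N concurrent acquires, one winner" shape of scenario (k) can be illustrated without a cluster. This Python sketch assumes the lock behaves like a create-if-absent document; `InMemoryLockIndex` and `race_for_lock` are hypothetical, and the tail-latency bound from the real test is not modeled.

```python
import threading


class InMemoryLockIndex:
    """Stand-in for a lock index with create-if-absent semantics."""

    def __init__(self):
        self._docs = {}
        self._guard = threading.Lock()

    def try_create(self, doc_id, owner):
        # Atomic check-and-set: only the first creator succeeds.
        with self._guard:
            if doc_id in self._docs:
                return False
            self._docs[doc_id] = owner
            return True


def race_for_lock(contenders):
    """Start all contenders together and return the list of winners."""
    index = InMemoryLockIndex()
    barrier = threading.Barrier(contenders)
    winners = []

    def worker(owner):
        barrier.wait()  # release every contender at the same moment
        if index.try_create("migration-lock", owner):
            winners.append(owner)

    threads = [
        threading.Thread(target=worker, args=(f"runner-{i}",))
        for i in range(contenders)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners
```

Whatever the interleaving, the returned list has exactly one element, which is the invariant the multi-node test asserts against a real cluster.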
… runbook
R-28c calls for a runbook covering AWS-specific behaviors that
single-node and 3-node Testcontainers fundamentally cannot exercise:
SigV4 request signing, the AWS endpoint loud-fail at startup, ISM
endpoint capability detection against real AWS domains (which
historically have both modern `/_plugins/_ism` and legacy
`/_opendistro/_ism` surfaces depending on age), and IRSA / instance-
profile credential rotation across long-running migrations.
docs/runbooks/opensearch-aws-validation.md:
- Prerequisites: domain choice, IAM permissions naming the exact
`es:ESHttp*` actions required, credential resolution chain.
- Runner configuration showing AwsSigV4 mode in appsettings shape.
- Four validation steps:
(1) Loud-fail negative test — pointing core's AddOpenSearchClient at
an *.amazonaws.com endpoint without the .Aws extension. Pass
criterion: AwsSigV4NotConfiguredException at startup with the
exact AddOpenSearchAwsClient remediation snippet.
(2) Smoke test — runs all 8 samples against the AWS domain;
verifies ledger forensic fields (R-06) populated correctly,
including appliedBy for credential-identity confirmation.
(3) ISM endpoint detection — examines bootstrapper's log for the
ism-detect resolution line. Documents the exact remediation
(IAM action) when neither prefix probe succeeds.
(4) Credential rotation (optional, long-running) — exercises R-21
#4 per-request credential resolution by running >1 hour with
IRSA / instance-profile credentials.
- Reporting protocol: every release MUST add either a PASS or DEFERRED
line to the release checklist. Silent skipping is forbidden by the
process.
- Failure-mode triage section pointing each step's failure at the
likely cause and the code path to investigate.
- Out-of-scope explicitly: full CI automation of the runbook (v1.1
per requirements doc Open Questions); ISM step against
OpenSearch Serverless (Serverless doesn't expose ISM); cross-region
failover.
docs/runbooks/INDEX.md:
- New top-level index for the runbooks subtree, matching the docs/
convention used elsewhere (decisions/INDEX.md, etc.).
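Validation steps (1) and (3) each reduce to a small decision that can be sketched offline. The Python below is a hedged illustration, not the provider's code: `check_endpoint`, `detect_ism_prefix`, and the injected `probe` callback are hypothetical, and trying the modern prefix before the legacy one is an assumption.

```python
from urllib.parse import urlparse


class AwsSigV4NotConfiguredError(Exception):
    """Stand-in for the AwsSigV4NotConfiguredException named above."""


def check_endpoint(url, sigv4_configured):
    """Fail loudly at startup for an AWS host without SigV4 signing."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".amazonaws.com") and not sigv4_configured:
        raise AwsSigV4NotConfiguredError(
            f"{host} looks like an AWS endpoint; "
            "register the client with AddOpenSearchAwsClient"
        )


# Modern surface first, legacy second (ordering is an assumption).
ISM_PREFIXES = ("/_plugins/_ism", "/_opendistro/_ism")


def detect_ism_prefix(probe):
    """Return the first ISM prefix for which probe(prefix) is true.

    probe is injected so the decision can be exercised without a domain;
    None means neither surface answered and remediation is needed.
    """
    for prefix in ISM_PREFIXES:
        if probe(prefix):
            return prefix
    return None
```

A `None` result corresponds to the runbook case where neither prefix probe succeeds and the IAM remediation applies.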
…e-propagation FAQ
Brings the public docs site and the top-level repo README in line with
the OpenSearch provider that's been shipped over Phase 1-3. R-27
explicitly calls for the template-propagation FAQ "featured prominently
in the README as the answer to 'how do I apply template changes to
existing data?'"; this slice delivers it.
Top-level repo README:
- Supported-providers list now includes OpenSearch.
- Resource-migrations bullet mentions OpenSearch DDL alongside SQL /
N1QL / AQL / MongoDB commands.
docs/site/index.md:
- Same supported-providers correction.
docs/site/getting-started.md:
- Install command list adds the OpenSearch provider package.
- Notes the optional .Aws extension for AWS Managed OpenSearch.
docs/site/opensearch.md (new):
- Mirrors the existing per-provider page shape (couchbase.md /
postgresql.md / etc.) but tailored to OpenSearch's distinctives:
the two registration paths (mutually exclusive: AddOpenSearchClient
for Basic/ApiKey/mTLS/Anonymous OR AddOpenSearchAwsClient for
SigV4); options table with the full surface; statement-grammar
pointer at the package README for the deep reference; MIGRATE INDEX
as the headline mapping-propagation pattern; lock semantics with
PA-2 replicas:0 rationale; ledger forensic fields per R-06; R-19
partial-rollback recovery via --force-resume; multi-topology
testing pointers (single-node CI, multi-node CI per R-28b, AWS
Managed scheduled validation per R-28c).
docs/site/opensearch-template-propagation-faq.md (new):
- The featured FAQ R-27 calls for. Walks through:
- Why mapping/template changes don't propagate (the OpenSearch
indexing model)
- The canonical answer: MIGRATE INDEX <old> TO <new> WITH TEMPLATE
<id> VIA ALIAS <alias>
- Step-by-step before/during/after walkthrough of the composite
- Common variations (inline body vs template; without alias swap;
write-during-migration considerations)
- When UPDATE MAPPING is sufficient (additive only) vs when reindex
is required (type changes, removals, analyzer changes,
dynamic-mapping changes to historic data)
- Why MIGRATE INDEX over hand-composing CREATE+REINDEX+ALIAS SWAP
(safe defaults baked in, atomicity explicit, intent readable,
template resolution offline-pure per ADR-0015)
- Cross-links to opensearch.md, resource-migrations.md, concepts.md,
and the working sample 6.
ASCII-only verified per the docs/site/*.md just-the-docs constraint.
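For readers who want to see what the composite replaces, here is a hedged Python sketch that expands one migrate-index intent into the three hand-composed requests. `expand_migrate_index` is hypothetical; the REST paths are the standard OpenSearch create-index, `_reindex`, and `_aliases` endpoints, while the provider's actual calls, waits, and safe defaults are not shown in this message and are not modeled.

```python
def expand_migrate_index(old, new, alias, index_body):
    """Return (method, path, body) tuples for CREATE + REINDEX + ALIAS SWAP.

    index_body is the already-resolved mapping/settings for the new index;
    how WITH TEMPLATE <id> resolves to that body is outside this sketch.
    """
    return [
        # 1. Create the new index with the updated mapping.
        ("PUT", f"/{new}", index_body),
        # 2. Copy existing documents so they are indexed under the new mapping.
        ("POST", "/_reindex", {
            "source": {"index": old},
            "dest": {"index": new},
        }),
        # 3. Move the alias in one request so readers never see a gap.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old, "alias": alias}},
                {"add": {"index": new, "alias": alias}},
            ]
        }),
    ]
```

Putting both alias actions in a single `_aliases` request is what makes the swap atomic; issuing them as two requests would leave a window with no alias at all.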
Cross-cutting audit per phase DoD item "ADRs touched by this phase
verified against acceptance criteria" (B1 / NF-5). For each Accepted ADR,
locates the implementing code path and the verifying test or doc artifact.
Result: 17/17 honored. Three soft spots noted, none blocking:
- ADR-0012 (WithProductionDefaults): marker registration only;
options-factory wiring deferred per ADR's own consequences.
- ADR-0009 (Convention-Based Record IDs): verified indirectly through
ledger-bearing tests rather than a focused unit test.
- ADR-0016 (No File-Level Templating): verified through absence (no
Hyperbee.Templating reference in csproj).
Release-readiness gate: PASS. Plan Status Summary updated to reflect
Phase 0/1/2/3 all Done.
All 4 phases delivered. ADR compliance audit (0001-0017) PASS. Plan moved to docs/plans/archive/2026-05-opensearch-provider.md. Build clean across net8/9/10; 334 unit tests pass (1,002 executions, 0 failures).
Three hardening items from the ADR compliance audit follow-ups:
1. EOF-anchor the OpenSearch statement parser
Apply .Eof() to the top-level Parlot parser so trailing tokens after
a successful prefix-match are reported as parse errors rather than
silently dropped. Restores the four NO WAIT parse-time-rejection
tests previously deferred:
- bare NO WAIT (no parens, no justification)
- NO WAIT("") with empty justification
- NO WAIT(" ") with whitespace-only justification
- DROP INDEX ... NO WAIT (NO WAIT is accepted only on the five R-12 verbs)
Wraps grammar-level InvalidOperationException (from quotedString
non-empty validation, ParseVersionLiteral, etc.) into
OpenSearchParseException so callers handle one exception type.
2. ADR-0009 focused convention test
New DefaultMigrationConventionsTests asserts the documented record-id
format (record.<version>.<kebab-cased-name>), tightening the
regression net beyond indirect ledger-bearing test coverage.
3. ADR-0016 dependency-scan test
New OpenSearchProviderDependencyTests asserts the OpenSearch provider
assembly does not reference Hyperbee.Templating. If a future
contributor adds the package, CI fails before merge.
Verification: 343 unit tests pass on net8/9/10 (1,029 executions, 0
failures). Build clean, no new warnings.
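The prefix-match problem behind item 1 is easy to reproduce in miniature. This Python sketch uses a regular expression in place of the Parlot grammar; `parse_prefix`, `parse_anchored`, and `ParseError` are hypothetical names chosen for illustration.

```python
import re


class ParseError(Exception):
    """Single exception type that callers handle."""


_DROP_INDEX = re.compile(r"DROP\s+INDEX\s+(?P<index>[A-Za-z0-9_.-]+)")


def parse_prefix(text):
    """Unanchored: accepts any input that merely starts with a statement."""
    match = _DROP_INDEX.match(text)
    if match is None:
        raise ParseError(f"not a DROP INDEX statement: {text!r}")
    return match.group("index")


def parse_anchored(text):
    """Anchored: the statement must consume the whole input."""
    match = _DROP_INDEX.match(text)
    if match is None:
        raise ParseError(f"not a DROP INDEX statement: {text!r}")
    rest = text[match.end():].strip()
    if rest:
        # Trailing tokens are reported instead of silently dropped.
        raise ParseError(f"unexpected trailing input: {rest!r}")
    return match.group("index")
```

With the input `DROP INDEX users NO WAIT("x")`, `parse_prefix` returns `users` and discards the modifier, while `parse_anchored` raises, which is the behavior the restored rejection tests depend on.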
ADR-0012: WithProductionDefaults() is now a behavioral forcing function,
not just a marker. The OpenSearchMigrationOptions factory checks for the
UseProductionDefaultsMarker singleton and, when present, flips:
- ClusterHealthThreshold = Green
- WaitMode = PerMigration
- RequireUnsafeJustification = true
- ContextResolutionPolicy = RequireExplicit
BEFORE invoking the user's configuration callback, so explicit per-option
settings still win. Coverage: WithProductionDefaultsTests (3 tests).
R-24c (f): bulk-load 429 retry surfacing. The OpenSearch.Net library owns
the retry mechanism; the provider's BulkAllObserver owns the WARN log when
response.Retries > 0. BulkAllObserverRetryTests drives the observer with
synthetic BulkAllResponses (4 tests). Joint cluster-level chaos validation
added as Step 4 of the AWS validation runbook.
Audit doc updated: all original soft spots are now closed.
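The ordering rule (production defaults first, user callback second) is what lets explicit settings win. A minimal Python sketch of that ordering follows; `MigrationOptions` and `build_options` are hypothetical, and every baseline value except the PerStatement wait mode is a placeholder rather than the provider's real default.

```python
from dataclasses import dataclass


@dataclass
class MigrationOptions:
    cluster_health_threshold: str = "Yellow"        # placeholder baseline
    wait_mode: str = "PerStatement"                 # default named in this log
    require_unsafe_justification: bool = False      # placeholder baseline
    context_resolution_policy: str = "Permissive"   # placeholder baseline


def build_options(production_defaults, configure=None):
    """Apply production defaults BEFORE the user's callback runs."""
    options = MigrationOptions()
    if production_defaults:
        options.cluster_health_threshold = "Green"
        options.wait_mode = "PerMigration"
        options.require_unsafe_justification = True
        options.context_resolution_policy = "RequireExplicit"
    if configure is not None:
        # Runs last, so anything set here overrides the flipped defaults.
        configure(options)
    return options
```

If the two steps were swapped, a user who deliberately set `wait_mode` back to PerStatement would have that choice overwritten without notice.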
…' into devs/bfarmer/provider-opensearch
OpenSearch site doc now includes per-verb reference for every v1
statement (CREATE/DROP INDEX, UPDATE MAPPING/SETTINGS, REFRESH, ALIAS
SWAP/ADD/REMOVE, REINDEX, MIGRATE INDEX, CREATE/DROP TEMPLATE,
CREATE/DROP COMPONENT, CREATE/APPLY POLICY, WAIT FOR, WAIT UNTIL TASK,
WHEN VERSION) with worked JSON examples, the three body-source
resolution forms, NO WAIT/UNSAFE justification semantics, the context
filter, rollback, and bulk-loading. Provider options table and
WithProductionDefaults table are now self-contained on the site (no
longer redirects to the package README).
Aerospike site doc expanded from a single CREATE INDEX example to
a full statement reference: CREATE INDEX with all flags
(IF NOT EXISTS / RECREATE / WAIT / index types), DROP INDEX,
CREATE SET (intent-only), INSERT/DELETE (intent-only with pointer
to DocumentsFromAsync / IAsyncClient). Resource layout, csproj
EmbeddedResource pattern, and seed-document conventions documented.
Verified ASCII-only across docs/site/*.{md,html,yml,yaml}.
…ing shallow-clone error)
WaitForFullClusterAsync now waits for status=green (not just 3 nodes joined) and uses a 180s deadline. Three-nodes-joined isn't a stable signal: replicas may still be allocating, which is exactly when an immediate REINDEX gets a connection reset (the AliasSwap failure mode seen on shared GitHub runners). The 60s deadline was tuned for local Docker (10-20s typical) and was too tight on CI (image pull + JVM warm-up + election push past 60s under runner load).
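The wait-for-green-with-deadline loop can be sketched as follows. This is an illustrative Python model, not the harness code: `wait_for_green` is hypothetical, the status probe, clock, and sleep are injected so the loop can be exercised without a cluster, and the one-second poll interval is an assumption.

```python
import time


def wait_for_green(get_status, deadline_seconds=180.0, poll_seconds=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the cluster reports green or the deadline passes.

    get_status() returns the current health string ("red", "yellow",
    "green"). Node count alone is deliberately not treated as ready.
    """
    give_up_at = clock() + deadline_seconds
    while True:
        status = get_status()
        if status == "green":
            return
        if clock() >= give_up_at:
            raise TimeoutError(
                f"cluster status is {status!r} after {deadline_seconds}s"
            )
        sleep(poll_seconds)
```

A yellow status still means replicas are allocating, so the loop keeps polling through it instead of returning early.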
Three OpenSearch JVMs on a shared ubuntu-latest runner hit resource pressure (connection resets mid-operation; second test class fails to bring its cluster up after the first tears down). Tests pass locally and the harness changes from this branch (180s deadline, wait-for-green) remain in place. Nightly run catches regressions without gating PRs while we work out the shared-runner stability issues.
Summary
Test plan