-
Notifications
You must be signed in to change notification settings - Fork 1
docs: ADR bundle (0001-0008) — engine and CoW architectural decisions #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
brunota20
wants to merge
5
commits into
nullislabs:main
Choose a base branch
from
bleu:docs/adr-bundle
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
8af08bd
docs(adr): add 0001-0007 capturing engine and CoW architecture decisions
brunota20 3f1dbf8
docs(adr): unwrap hard-wrapped paragraphs to single line each
brunota20 7e05190
docs(adr): revise CoW design and reorder ADRs (0001-0008)
brunota20 67a0be7
fix(docs): reviewed ADRs by bleu
brunota20 e5579a3
fix(docs): revised ADRs and diagrams
brunota20 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| --- | ||
| status: proposed | ||
| implemented-in: nullislabs/shepherd#8, nullislabs/shepherd#9 | ||
| --- | ||
|
|
||
| # Operator config (`engine.toml`) is separate from module manifest (`module.toml`) | ||
|
|
||
| ## Context | ||
|
|
||
| The engine needs two distinct kinds of configuration: what the **operator** decides at deployment time (which chains to connect to, where the local-store database lives, which modules to boot) and what the **module developer** declares at build time (required and optional capabilities, HTTP allowlist, module-specific config keys). These have different reviewers, different threat models, and change on different cadences. | ||
|
|
||
| The filenames need to signal who owns each file directly. An operator opening a config file should know without prior context whether the file is their concern or the module developer's. A name like `nexum.toml` requires the reader to know that "nexum" refers to the runtime that hosts the module, which is one indirection too many; `module.toml` reads as "the module's manifest" with no prior context. | ||
|
|
||
| ## Decision | ||
|
|
||
| Two distinct files, distinct schemas, distinct loaders: | ||
|
|
||
| - **`engine.toml`** — operator-owned, lives next to the engine binary or pointed to by `--engine-config`. Defines `[engine]` (state_dir, log_level), `[chains.<id>]` (rpc_url), and `[[modules]]` (path, manifest). Loaded by `engine_config::EngineConfig::load`. | ||
| - **`module.toml`** — module-developer-owned, ships in the module's bundle alongside its `.wasm` component. Defines `[module]`, `[capabilities]` (required, optional, http allowlist), `[config]`. Loaded by `manifest::load`. | ||
|
|
||
| The engine config carries the path to each module's manifest; the two never collapse into one file. The names `engine.toml` and `module.toml` map directly onto the two distinct roles, so a reader reaching either file knows whose concerns it covers. | ||
|
|
||
| ## Considered options | ||
|
|
||
| - **Single `shepherd.toml` with `[engine]`, `[chains]`, `[[modules]]` *and* nested `[modules.<n>.capabilities]` per module.** Rejected: conflates operator and developer concerns. A module's capability declaration is a property of the build, not the deployment — it belongs in the artifact, not in the operator's local file. Auditing a module's capabilities also becomes a per-deployment exercise instead of a property visible in the published bundle. | ||
| - **Keep the `nexum.toml` filename for the module manifest.** Rejected: the name does not signal who owns the file (engine vs module). `module.toml` reads as "the module's manifest" without prior context. | ||
| - **`module.toml` inside the engine config (module entries embed it inline).** Rejected for the same reason as the single-file proposal; also bloats `engine.toml`. | ||
| - **Drop `engine.toml` entirely; pass everything as CLI flags or env vars.** Rejected: per-chain RPC URLs and module lists are awkward as flags, and `RUST_LOG` already covers the only thing that env vars naturally express. | ||
|
|
||
| ## Consequences | ||
|
|
||
| - A deployment needs both files. A missing `engine.toml` falls back to "no chains, default state_dir" — the example logging module still runs; cow-api / chain backends report `unsupported`. | ||
| - A missing `module.toml` triggers the 0.1-compat deprecation warning in `manifest::fallback_manifest()` (defined in `crates/nexum-engine/src/manifest.rs`) and treats every linked capability as required. This fallback is scheduled for removal in 0.3 per `docs/migration/0.1-to-0.2.md`. | ||
| - Module-bundle redistribution carries `module.toml` with the artifact; engines do not need to ship templates. | ||
| - Future content-addressed module distribution (0.3) embeds `module.toml` in the bundle hash; `engine.toml` references modules by content address rather than filesystem path. The split survives that migration unchanged. | ||
| - Implementation impact: `crates/nexum-engine/src/manifest.rs` and `engine_config.rs` need to update the filename lookup from `nexum.toml` to `module.toml`. The 0.1-compat fallback in `manifest::fallback_manifest()` should accept both names during the transition; after 0.3 only `module.toml` is recognised. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| --- | ||
| status: proposed | ||
| implemented-in: nullislabs/shepherd#8, nullislabs/shepherd#9 | ||
| --- | ||
|
|
||
| # Per-chain alloy provider transport selected by URL scheme | ||
|
|
||
| ## Context | ||
|
|
||
| `nexum:host/chain` covers both generic JSON-RPC dispatch (`request`) and event subscriptions (`subscribe-blocks`, `subscribe-logs`). Subscriptions require a duplex transport (`eth_subscribe` is push-only over a long-lived connection); request/response works on either HTTP or WebSocket. The operator configures one RPC endpoint per chain in `engine.toml`; the engine has to decide which alloy transport to use. | ||
|
|
||
| ## Decision | ||
|
|
||
| The `ProviderPool::from_config` constructor reads each chain's `rpc_url` and switches by URL scheme prefix: | ||
|
|
||
| - `ws://` or `wss://` → `ProviderBuilder::new().connect_ws(WsConnect::new(url))`. Pubsub transport. Subscriptions and request/response both work. **This is the recommended configuration for any chain a module subscribes to.** | ||
| - `http://` or `https://` → `ProviderBuilder::new().connect_http(parsed)`. HTTP transport. Request/response only; `subscribe-blocks` and `subscribe-logs` surface as `host-error.unsupported` to the guest. | ||
|
|
||
| Both transports erase to `DynProvider` so the rest of the engine is transport-agnostic. | ||
|
|
||
| Alloy is capable of emulating `eth_subscribe` on HTTP via polling, but this is intentionally **not** enabled. The engine takes an opinionated stance favouring WebSockets for subscriptions; operators who want push-based events configure WSS endpoints. HTTP-only chains are supported for `request` traffic but not for subscriptions. | ||
|
|
||
| ## Non-goals | ||
|
|
||
| - **RPC failover, load balancing, and retry policies are explicitly out of scope for the engine.** This logic lives in upstream crates (alloy ships tower-style middleware for timeout / retry / rate-limit / fallback endpoint). The engine does not roll its own. Operators wanting failover configure it via alloy provider builders before passing them through, or rely on the provider's own fallback (Alchemy, Infura, etc. handle it server-side). | ||
| - Re-routing requests across chains, rebalancing across pools within a chain, and similar provider-management concerns are likewise alloy's responsibility. | ||
|
|
||
| ## Considered options | ||
|
|
||
| - **Force WSS everywhere.** Rejected: many providers (Alchemy, Infura, self-hosted RPC) expose HTTP-only on free tiers, and modules that only need `request` (no subscriptions) shouldn't be blocked by a WSS requirement. | ||
| - **Explicit `transport = "ws" | "http"` field per chain in `engine.toml`.** Rejected for 0.2: redundant with the URL scheme, and operators already distinguish `wss://` from `https://` endpoints when copying them from their RPC provider's dashboard. Revisit if we add IPC (`/path/to/geth.ipc`) — scheme alone won't carry that. | ||
| - **Open both an HTTP and a WSS connection per chain.** Rejected: doubles connection count for the common case where one endpoint serves both, and forces operators to provide two URLs even when their provider returns identical data on both. | ||
|
|
||
| ## Consequences | ||
|
|
||
| - Operators that need subscriptions must supply WSS URLs; HTTP-only chains downgrade to request-only mode at the host call boundary. | ||
| - Connection failures at boot are fatal (the engine refuses to start with a broken chain). This is intentional — silent fall-back to a half-functioning state masks misconfiguration that a module then rediscovers at first event. | ||
|
brunota20 marked this conversation as resolved.
|
||
| - Adding IPC support is additive: extend the scheme match with `/` / `file://` and call `connect_ipc`. | ||
| - The `DynProvider` erasure costs a virtual dispatch per call — a measurable concern at scale, deferred to M4 if profiling shows it. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| --- | ||
| status: proposed | ||
| implemented-in: nullislabs/shepherd#8 | ||
| --- | ||
|
|
||
| # Per-module namespacing in `local-store` via 32-byte deterministic hash prefix | ||
|
|
||
| ## Context | ||
|
|
||
| `nexum:host/local-store` is a key-value store shared across all modules the engine runs. Two modules using the same key string (e.g. `"last-block"`) must see disjoint values; one module must never read or overwrite another's data. The engine knows each module's identity at instantiation time, so namespacing is a host-side concern. | ||
|
|
||
| Two properties matter for the namespace prefix: | ||
|
|
||
| 1. **Deterministic and unspoofable.** An arbitrary `module_name` string read out of `module.toml` lets a malicious or careless operator give two modules the same name and have one read the other's state. A fixed-size hash derived from the module's canonical identity is harder to collide and removes the operator-supplied-text attack surface. | ||
| 2. **Composes with ENS-based module discovery** (per `docs/03-module-discovery.md`): when a module is identified by an ENS name (e.g. `twap-monitor.shepherd.eth`), the ENS namehash is a natural prefix. ENS TXT records pinning the `.wasm` content hash provide a separate verification path against the loaded bundle. | ||
|
|
||
| ## Decision | ||
|
|
||
| Single redb database file at `EngineConfig.engine.state_dir`, single shared table `nexum:local-store`. Every key handed to redb is composed host-side as: | ||
|
|
||
| ``` | ||
| [32-byte namespace prefix][raw key bytes] | ||
| ``` | ||
|
brunota20 marked this conversation as resolved.
|
||
|
|
||
| The 32-byte prefix is computed deterministically from the module's canonical identity: | ||
|
|
||
| - **ENS-identified modules** (M3+, per `docs/03`): prefix is `ens_namehash(name)` (EIP-137), e.g. `namehash("twap-monitor.shepherd.eth")`. | ||
| - **Locally-loaded modules** (current 0.2 scope, no ENS): prefix is `keccak256(module_name)` where `module_name` comes from `module.toml`'s `[module].name` field. | ||
|
|
||
| Both produce a 32-byte digest with the same domain, so a module loaded locally during development and later published under an ENS name can keep its existing state by registering an alias (`alias = keccak256(name)`) the engine recognises during the migration window. The exact alias mechanism is out of scope for this ADR. | ||
|
|
||
| Modules see plain key strings on both the read and write paths; the prefix is invisible to the WIT-facing API. | ||
|
|
||
| ## Considered options | ||
|
|
||
| - **Separator string** (`{module}:{key}`). Rejected: any module name containing `:` collides with another module's `:`-bearing key. A fixed-size hash is unambiguous regardless of payload bytes. | ||
| - **`[len:u8][module_name][key]` length-prefixed string.** Rejected: spoofable (the name is operator-supplied text), and does not align with the ENS-based discovery path that 0.3 will introduce. The 32-byte hash is deterministic and namespace-uniform. | ||
| - **One redb database file per module.** Rejected: multiplies open file handles linearly in modules, blocks any future cross-module atomic operations (not currently planned but cheap to keep on the table), and complicates backup tooling (N files vs 1). | ||
| - **One redb *table* per module within a single file.** Rejected: redb `TableDefinition` lifetimes are `'static`, so table names must be known at compile time. Dynamic table opening per module would force string-leak workarounds and exposes the same name-collision question as separator-based keys. | ||
| - **Engine-allocated incrementing module id.** Rejected: stable across reboots only if the engine persists the allocation table, which adds a chicken-and-egg dependency on the local-store itself. Determinism from the name avoids the dependency entirely. | ||
|
|
||
| ## Consequences | ||
|
|
||
| - The prefix is fixed-size (32 bytes) and independent of module name length. Range scans over a single module's keys are O(log n + module-key-count) — fine for our workload. | ||
| - Migrations changing the prefix derivation (e.g., switching the local-mode hash function or the ENS resolver) would orphan every existing module's persisted state. The derivation must stay stable through 0.x; ENS-mode introduction in 0.3 happens additively via the alias mechanism, not by changing existing prefixes. | ||
| - A module's `list-keys` iterates over the namespace range (32-byte prefix scan); the host strips the prefix before returning to the guest. | ||
| - Module data versioning (schema migrations across module versions) is the module's responsibility. The local-store does not version values; modules MAY embed a `schema_version` byte in their stored payloads and migrate on `init` when the read value's version differs from the current code's expectation. | ||
| - ENS-based discovery (per docs/03) integrates without a prefix-format change: when a module is loaded by ENS name, the prefix is `namehash(name)`. The corresponding `.wasm` content hash is verified via ENS TXT records before loading, separately from the local-store prefix derivation. | ||
| - Spoofing protection: an operator cannot make module A read module B's state by renaming, because the prefix is the hash of the canonical name. Renaming a module to match another's name produces a name conflict the engine refuses at boot, rather than silent state takeover. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| --- | ||
| status: proposed | ||
| implemented-in: nullislabs/shepherd#10 | ||
| --- | ||
|
|
||
| # Patch `cowprotocol` crate to the head of upstream PR #5 | ||
|
|
||
| ## Context | ||
|
|
||
| `cowprotocol` v1.0.0-alpha.3 (the version on crates.io) was cut from an early snapshot of `cowdao-grants/cow-rs` PR #5 at commit `1742ffa`. That PR is still open and is the canonical upstream channel for landing additions to the Rust SDK. Its head branch is `bleu/cow-rs:main`, currently at commit `c012404`, carrying 18 follow-up commits the engine materially depends on: | ||
|
|
||
| - `composable::Proof` byte-width fix (consumed by the TWAP poll path). | ||
| - `OrderCreation` zero-`from` fast-fail (closes a MEDIUM severity finding in PR #5). | ||
| - `order_book` / `composable` submodule splits (cleaner imports on the engine side). | ||
|
|
||
| ADR-0007 commits us to landing three protocol-level primitives into PR #5 directly (`OrderPostError` rich variants + `retry_hint`, `OrderBookApi::with_base_url`, and `wasm32` feature-gating) by pushing additional commits to its head branch. Each commit advances both PR #5 and the patch rev consumed here. | ||
|
|
||
| There is no published `alpha.4` and no scheduled date for one; the engine cannot wait. | ||
|
|
||
| ## Decision | ||
|
|
||
| Add a workspace-level `[patch.crates-io]` redirecting `cowprotocol` to `https://github.com/bleu/cow-rs` at commit `c012404`. Every crate that declares `cowprotocol = "1.0.0-alpha.3"` (engine, modules, future SDK) silently picks up the patched build with no `Cargo.toml` change at the dependent site. | ||
|
|
||
| This is not a parallel fork. `bleu/cow-rs:main` IS the head branch of upstream PR #5. Pushing to it updates PR #5; the patch rev advances by bumping a single workspace line. | ||
|
|
||
| ## Considered options | ||
|
|
||
| - **Vendor the missing types locally.** Rejected: re-implementing `composable::Proof`, `OrderCreation`, etc. in the engine repo is the AI-duplication anti-pattern that the cow-rs SDK already solves. Reuse over reimplement applies. | ||
| - **Pin every dependent to `cow-rs` git directly.** Works but every new workspace member has to remember the git source. `[patch.crates-io]` centralises the override. | ||
| - **Open a separate PR per primitive against `cowdao-grants/cow-rs`.** Rejected: fragments the change across multiple PRs when one already exists at the appropriate granularity. Stacking commits on PR #5 keeps the change coherent and lets the cumulative diff be tracked in one place. | ||
| - **Wait for `alpha.4` to publish.** No ETA; the TWAP/EthFlow milestone cannot land without `composable::Proof` correct. | ||
|
|
||
| ## Consequences | ||
|
|
||
| - `cargo update` will re-resolve to the same `rev`; the lock pins it. | ||
| - Bumping the rev is a single-line workspace edit; reviewers see one diff per primitive added to PR #5. | ||
| - Drop the patch entirely once a published `cowprotocol` release contains both the alpha.3 follow-ups and the ADR-0007 protocol-primitive additions (`OrderPostError` rich variants + `retry_hint`, `OrderBookApi::with_base_url`, `wasm32` feature-gate). Until then, expect the patch rev to advance with every push to PR #5. | ||
| - Modules built against this workspace inherit the patch transitively; modules built standalone against crates.io will see `alpha.3` and may hit the very bugs the patch closes. Flag this in the SDK README when M3 lands. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| --- | ||
| status: proposed | ||
| implemented-in: nullislabs/shepherd#8 | ||
| --- | ||
|
|
||
| # `cow-api` host backend routes both `request` and `submit-order` through `cowprotocol::OrderBookApi` | ||
|
|
||
| ## Context | ||
|
|
||
| `shepherd:cow/cow-api` exposes two operations: a generic REST passthrough (`request`) and a typed order submission (`submit-order`). Either could be implemented with raw `reqwest` against `api.cow.fi/{slug}/api/v1`, but the published `cowprotocol` crate already ships an `OrderBookApi` client that knows the chain-specific base URL, the canonical paths, and the `post_order` codec. | ||
|
|
||
| ## Decision | ||
|
|
||
| At engine boot, construct one `cowprotocol::OrderBookApi` per `cowprotocol::Chain` variant (currently Mainnet, Gnosis, Sepolia, ArbitrumOne, Base) into a `BTreeMap<u64, OrderBookApi>` keyed by EVM chain id. "Cached" here means built once during boot and reused for the engine's lifetime; clients are not lazy-constructed on each call nor LRU-evicted. The pool implements `Default` so callers instantiate it as `OrderBookPool::default()`; the trait impl populates the map with one entry per `cowprotocol::Chain` variant. | ||
|
|
||
| Both `cow-api` operations consult this pool: | ||
|
|
||
| - `request` resolves the chain's `OrderBookApi`, reads `api.base_url()` for the prefix, joins the module-supplied path, and dispatches via a shared `reqwest::Client`. | ||
| - `submit-order` deserialises the JSON `OrderCreation` and calls `OrderBookApi::post_order` directly. The crate handles signing-scheme encoding, error mapping, and `OrderUid` extraction. | ||
|
|
||
| Chains not in `cowprotocol::Chain` return `HostError { kind: unsupported }` at the host call boundary. | ||
|
|
||
| ## Considered options | ||
|
|
||
| - **Raw `reqwest` for both.** Rejected: forces us to maintain the chain → base-URL table (drifts whenever cowprotocol adds a chain) and reimplement `post_order`'s body codec and error mapping, the exact duplication the cow-rs SDK already eliminates. | ||
| - **`OrderBookApi` for `submit-order`, raw `reqwest` for `request`.** Tempting (request is opaque to the crate) but means two separate chain-resolution paths, two HTTP clients, and a second place to keep the chain set in sync. | ||
| - **Build `OrderBookApi` lazily on first call per chain.** Rejected: hides config errors at runtime. Up-front boot construction surfaces unknown chains immediately and amortises away the per-call cost. | ||
|
|
||
| ## Consequences | ||
|
|
||
| - Operator-supplied custom orderbook URLs (barn, staging, forked deployments) are out of scope for the default constructor and require a follow-on `OrderBookApi::with_base_url(chain_id, base_url)` constructor in the cow-rs crate (ADR-0007 item 2, not vendored locally). | ||
| - Adding a chain means a `cowprotocol::Chain` variant lands in cow-rs first; the engine inherits it on the next patched rev bump. | ||
| - The shared `reqwest::Client` enables connection pooling across both `request` and `submit-order` paths. | ||
| - Guest-side TWAP and EthFlow modules (ADR-0006) submit orders through this `cow-api` interface; no specialised host helpers wrap it. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.