Skip to content

Split :via callbacks into write side and read side#9

Merged
twinn merged 7 commits intomainfrom
via-callbacks-split-read-write
Apr 9, 2026
Merged

Split :via callbacks into write side and read side#9
twinn merged 7 commits intomainfrom
via-callbacks-split-read-write

Conversation

@twinn
Copy link
Copy Markdown
Owner

@twinn twinn commented Apr 8, 2026

Summary

Post-review cleanup for the metadata/Registry-shaped work, plus ergonomics and type-safety work. Five commits:

  1. Split :via callbacks into write side and read side. register_name/unregister_name work in both :duplicate and :unique modes. whereis_name/send (and transitively GenServer.call/cast, Process.whereis, Kernel.send on a via tuple) require keys: :unique and raise ArgumentError otherwise. In :unique mode, whereis_name is strictly local-only — no remote fallback, because per-node uniqueness means each node owns its local holder. This matches the split Elixir's Registry uses — verified against lib/elixir/lib/registry.ex directly this time instead of guessing.

  2. Drop PgRegistry.get_members/2, add PgRegistry.lookup_local/2. get_members/2 was vestigial once lookup/2 existed (the bare-pid view is a one-liner: for {pid, _} <- lookup(scope, key), do: pid). lookup_local/2 is a first-class distributed-registry concern with no direct Registry analog — useful for draining a node, per-node metrics, or preferring a local instance. Layering: Pg owns pg vocabulary (get_members, get_local_members, lookup, lookup_local), PgRegistry owns Registry vocabulary (lookup, lookup_local, no get_members).

  3. README positioning table: add Horde.Registry. Four-column feature comparison (Registry / :global / Horde.Registry / PgRegistry) plus a ""Which should I use?"" decision table below. Makes the split explicit: Horde owns cluster-wide unique keys, PgRegistry owns cluster-wide duplicate keys. They're complementary, not competitors. Horde's match-spec support was verified against hexdocs before claiming ""yes"" in the table.

  4. Add doctests to PgRegistry public API. Spec audit first (both modules are fully spec'd on their public surface — the awk-based scan turned up only second clauses and GenServer callbacks, all covered). Then 13 new doctests across the common API: lookup/2, count/1, update_value/3, keys/2, register/3 (both modes), unregister/2, values/3, match/3, meta/2, register_name/2 (unique), whereis_name/1 (unique). Each uses a unique :doc_* scope atom to avoid cross-test collisions. Enabled via doctest PgRegistry in the test file.

  5. Add dialyzer to the project and CI. dialyxir 1.4 as a dev/test dep, project-local PLT under priv/plts/ (gitignored), flags [:error_handling, :unknown, :unmatched_returns] — the strictest commonly-used set. First run caught exactly two unmatched_return warnings in sync_groups/3 and sync_one_group/4 (side-effect helpers returning lists from a for comprehension that nobody cared about). Both fixed by making them return :ok explicitly and pattern-matching at the call sites. CI workflow gains a cached PLT step and mix dialyzer --format github that runs before mix test.

Design decisions worth calling out

  • :via split matches Registry. Our behaviour is within a hair of Elixir's: Registry restricts register_name to pid == self(); we allow an arbitrary local pid (inherited from Pg.join/3). Registry's send/2 pattern-matches [{pid, _}] and raises CaseClauseError on multiple entries; our send/2 goes through whereis_name and raises ArgumentError explicitly in :duplicate mode. Both are intentional small improvements, not divergences.

  • lookup_local/2 is the one Registry-absent API we're keeping. Registry is single-node, so ""local"" is trivially ""all."" We're distributed, and the local subset is genuinely useful for draining/metrics/preference-for-local patterns. The _local suffix is standard OTP convention (Node.list/0 vs Node.list(:connected), :pg.get_members/2 vs :pg.get_local_members/2, etc.).

  • which_groups/1 stays as-is. It's the one remaining pg-flavored name in PgRegistry. Registry has no equivalent (the idiomatic replacement is Registry.select/2 with a match-spec returning keys) and the operation is genuinely useful as a convenience. Leaving it is honest about the fact that distributed registries sometimes want things single-node registries don't.

  • Dialyzer flags are deliberately strict. :unmatched_returns is the one people usually disable because of false-positive volume — we enabled it and got exactly two warnings, both legitimate. Future contributors writing side-effect-only helpers will get flagged and have to think about whether the helper should return :ok or whether the return value actually matters.

Test plan

  • mix test13 doctests + 145 tests, 0 failures
  • mix credo — clean
  • mix format --check-formatted — clean (styler plugin enabled)
  • mix docs — builds without warnings
  • mix dialyzer — 0 errors with [:error_handling, :unknown, :unmatched_returns] flags
  • CI workflow updated to run mix dialyzer --format github before mix test (format checks + dialyzer fail fast before the slowest step)
  • Register/unregister_name work in both modes (new tests for the duplicate-mode path)
  • Whereis_name/send raise `ArgumentError` in duplicate mode
  • GenServer via integration on :unique scope includes {:already_started, pid} collision
  • Horde.Registry positioning verified against hexdocs before claiming ""yes"" for match-spec support

Release note

This PR was split off main after v0.3.0 was tagged and a GitHub release created (though nothing has been published to Hex). The changes here materially affect the semantics that v0.3.0 currently points at:

  • whereis_name no longer has silent local-preferred-then-remote fallback
  • whereis_name/send now require :unique scopes
  • get_members/2 is gone from PgRegistry
  • lookup_local/2 is a new PgRegistry function

Recommended release workflow after merging: delete the existing v0.3.0 tag and GitHub release, re-tag against the new tip of main, then publish to Hex for the first time. Nothing depends on the old tag because nothing is on Hex. If you'd rather respect tag immutability, ship this as v0.3.1 — but that publishes a v0.3.0 on Hex that we already know is worse, which seems silly for an unshipped version.

twinn added 5 commits April 8, 2026 00:41
Before: all four :via callbacks (register_name, unregister_name,
whereis_name, send) worked in :duplicate mode, and whereis_name had
an undocumented "prefer local, fall back to remote" resolution rule
when a key had multiple registered pids.

The rule was a footgun: callers of `GenServer.call({:via, PgRegistry,
{scope, :key}}, msg)` on different nodes could legitimately reach
different processes, depending on which node had a local entry, and
nothing about the API made that obvious.

After: align with Registry's semantics, which split the :via
protocol in two:

  * Write side — register_name/2 and unregister_name/1. Unambiguous
    in any mode ("put this pid under this key" / "take it out").
    Works in both :duplicate and :unique scopes.

  * Read side — whereis_name/1 and send/2 (and by extension
    GenServer.call, GenServer.cast, Process.whereis, and Kernel.send
    on a via tuple). These need a single pid, which a :duplicate
    scope can't honestly provide. Raise ArgumentError in :duplicate
    mode with a message pointing at lookup/2, get_members/2, and
    dispatch/3 as the duplicate-mode alternatives.

In :unique mode, whereis_name/1 is now strictly local-only (no
remote fallback). Each node's per-node uniqueness means each node
has at most one holder per key, and the via tuple resolves to that
local holder or :undefined. This is the intended shape of the
"one singleton per node" pattern. Callers that want a specific
peer's holder can use Pg.get_members/2 and filter by node/1.

Implementation: PgRegistry.Pg.init/1 now writes the scope's :keys
mode into :persistent_term under {PgRegistry.Pg, scope, :keys}, and
PgRegistry.Pg.keys_mode/1 reads it back without a GenServer hop.
terminate/2 erases. Writes to :persistent_term are costly but happen
at most once per scope start, which is fine for this use case.

Verified against Registry's actual source in
/lib/elixir/lib/registry.ex — this matches the split Registry uses:
register_name and unregister_name work in both modes, whereis_name
raises ":via is not supported for duplicate registries" in duplicate
mode, and Registry.send/2 pattern-matches a single-entry lookup
(which effectively means it works in duplicate mode only when there
happens to be exactly one entry, raising CaseClauseError otherwise).
Our send/2 is slightly cleaner — it goes through whereis_name and
raises ArgumentError explicitly in duplicate mode rather than the
CaseClauseError Registry would produce for a multi-entry match.

Tests updated: register_name/unregister_name tests now cover both
modes (they work in both). whereis_name/send tests cover unique
mode and assert the new ArgumentError in duplicate mode. The
GenServer :via integration tests now use a :unique scope, and the
"remote fallback" test was deleted since there is no longer a
remote fallback. All 143 tests passing, credo clean.
get_members/2 is vestigial — it was the primary read before lookup/2
existed. Now that lookup/2 returns [{pid, value}] Registry-style,
the pid-only view is a one-liner:

  for {pid, _} <- PgRegistry.lookup(scope, key), do: pid

That's how Registry users would write it, and it mirrors the shape
Registry.lookup/2 itself returns. The `get_members` name is also
pg-vocabulary ("members of a group") leaking into a surface we
want to be Registry-shaped.

In its place, promote lookup_local/2 from PgRegistry.Pg to a
first-class PgRegistry function. This is a distributed-registry
concern with no direct Registry analog — Registry is single-node,
so "local" is trivially "all". Useful for draining a node before
shutdown, per-node metrics, or preferring a local instance over a
remote one. The `_local` suffix is the established OTP convention
for node-local variants (Node.list vs Node.list(:connected),
:pg.get_members vs :pg.get_local_members, etc.).

Layering stays clean:
  * Pg module owns pg vocabulary: get_members, get_local_members,
    lookup, lookup_local.
  * PgRegistry module owns Registry vocabulary: lookup, lookup_local,
    values, keys, count, which_groups — no get_members.

which_groups/1 stays as-is for now. It's the one pg-flavored name
left in PgRegistry, and the operation has no Registry analog other
than "use select/2 with a match-spec that returns keys." Leaving it
is honest: "this is a distributed-registry concept we kept because
it's useful."

Tests updated: get_members tests became lookup tests, one new
describe block for lookup_local/2 added. README examples updated.
145 tests passing, credo clean, mix docs builds without warnings.

Also fixed a README code-fence bug I introduced earlier: the ":via
and :duplicate mode" blockquote was accidentally inserted INSIDE
the "Registering processes" elixir code block, which left the
closing ``` orphaned and broke ex_doc's rendering. Moved the
blockquote below the closed code block.
Four-column comparison now: Registry / :global / Horde.Registry /
PgRegistry. Key rows call out the split cleanly:

- Duplicate keys: Registry yes, :global no, Horde no, PgRegistry
  yes (default)
- Unique keys: Registry per-node, :global cluster-wide, Horde
  cluster-wide, PgRegistry per-node only

Also added rows for underlying model (ETS / lock / CRDT / gossip)
and Horde's net-split behaviour (CRDT merge picks a winner, losing
registrations silently dropped). Horde's match-spec support was
verified against hexdocs (match/4, select/2, count_match/4 are all
present).

New "Which should I use?" decision table below the feature
comparison frames the choice by use case — a group of processes
under one key → PgRegistry, a cluster-wide singleton → Horde.
Punchline made explicit: Horde owns the unique-key problem space,
PgRegistry owns the duplicate-key problem space. They're
complementary, not competitors, and a real distributed app often
wants both.

Kept the scope narrowly on registries — no mentions of
Horde.DynamicSupervisor or process migration, since this README
is comparing registries and mixing in supervision would muddy the
positioning.
Verified specs first: both PgRegistry and PgRegistry.Pg are fully
spec'd on their public surface. Nothing to add there.

Added 13 doctests across the common PgRegistry functions:

  - lookup/2 (registered + empty)
  - count/1
  - update_value/3 (happy path + :not_joined)
  - keys/2
  - register/3 (duplicate happy path + unique collision)
  - unregister/2 (happy path + idempotent)
  - values/3
  - match/3 (value-pattern filter)
  - meta/2 (round-trip with put_meta/delete_meta)
  - register_name/2 (unique mode, :yes then :no)
  - whereis_name/1 (unique mode, resolves local)

Each doctest uses a unique scope atom (:doc_*) to avoid cross-test
collisions; the test process owns the scopes and they die with it.
The examples exercise real code paths against a running scope, so
they double as small integration tests and as hex-doc examples.

Enabled via `doctest PgRegistry` in test/pg_registry_test.exs.
13 doctests + 145 tests = 158 passing.
Wires up dialyxir 1.4 as a dev/test dep with a project-local PLT
under priv/plts/ (gitignored). Flags enabled:

  - :error_handling
  - :unknown
  - :unmatched_returns

The :unmatched_returns flag is the strictest commonly-used dialyzer
check — it flags side-effect-only code that throws away a non-trivial
return value. First run caught exactly two hits:

  - sync_groups/3 (two clauses, both returning the list from a `for`)
  - sync_one_group/4 (returning the list from a `for`)

Both are side-effect helpers that ran for their effect on ETS, with
their return values silently dropped by the caller. Made both
return :ok explicitly, and updated handle_sync/4 and
sync_groups/3's recursive call to pattern-match on the :ok.

With those two fixes, `mix dialyzer` passes cleanly:

    Total errors: 0, Skipped: 0, Unnecessary Skips: 0
    done (passed successfully)

CI integration:

  - .github/workflows/ci.yml gains a `mix dialyzer --format github`
    step after the existing compile/test/format steps
  - priv/plts/ is cached per (runner os, Elixir, OTP, mix.lock) so
    the ~3-minute initial PLT build only happens when deps change

The --format github flag lets dialyzer failures show up inline in
PR reviews as code annotations.

Local perf: first PLT build was ~3 minutes (building core + Elixir
stdlib + dep PLTs). Subsequent runs are ~2 seconds. CI cache should
keep it at ~2s on most runs.
@twinn twinn force-pushed the via-callbacks-split-read-write branch from f1d3a0a to 3b5d972 Compare April 8, 2026 15:48
twinn added 2 commits April 8, 2026 12:06
Elixir's Registry.meta/2, put_meta/3, and delete_meta/2 all guard
on `is_atom(key) or is_tuple(key)`. Ours didn't — we accepted any
term as a meta key, which is more permissive but also surprising
for anyone porting from Registry: typos in atom keys silently become
different keys, there's no shape gut-check, and "put_meta takes any
term" is a lurking footgun.

Tighten to match. Guards added to:

  - Pg.get_scope_meta/2
  - Pg.put_scope_meta/3
  - Pg.delete_scope_meta/2

...all of the form `when is_atom(key) or is_tuple(key)`. Guards run
in the public API layer so callers see a clean FunctionClauseError
rather than a gen_server exit.

Also:

  - New `Pg.meta_key :: atom() | tuple()` typedoc (mirrors Registry's
    `meta_key` type).
  - `PgRegistry.meta/2`, `put_meta/3`, `delete_meta/2` @specs
    reference `Pg.meta_key()` instead of `term()`.
  - `PgRegistry.meta/2` docstring calls out the constraint and
    references Registry's contract.
  - The existing doctest gains a tuple-key round-trip case so both
    allowed shapes are exercised.
  - New test `non-atom, non-tuple keys are rejected` runs a loop
    over strings, integers, lists, and maps for each of the three
    functions.

Why this isn't a breaking change: nothing is on Hex yet, and the
previous permissive behavior was undocumented. The tighter contract
is the one anyone reading the README ("Mirrors Registry.meta/2")
would expect.

13 doctests + 147 tests passing. Credo clean. Dialyzer clean.
Three tweaks to the "Why PgRegistry" section:

1. Trimmed the editorializing that followed the two tables. The
   tables already convey where PgRegistry fits; the "Horde owns
   unique, PgRegistry owns duplicate" framing and the "deliberately
   does not try to do" paragraph were just me restating what the
   rows show. Opening paragraph is now one sentence with no
   comparative prose below the tables.

2. Added :pg as a fifth column in the feature comparison and a
   corresponding row in the decision table. :pg is the OTP
   primitive PgRegistry is built on, and it's a legitimate choice
   when you don't need metadata, listeners, or the Registry-shaped
   API. Showing it in the table makes the progression visible.

3. Added a "Part of OTP" row and an "Interop with other Erlang/OTP
   apps" row. The interop row is the honest differentiator:
   PgRegistry deliberately broke wire compatibility with :pg when
   we added metadata and the protocol-version handshake, so our
   scope can't share groups with another OTP app using :pg. That's
   a real cost and the table should show it. :global has the same
   property at the name level and also gets a yes. Horde has its
   own wire protocol and doesn't interop either.

The decision-table entry for :pg now explicitly mentions interop
as a reason you might reach for it over PgRegistry.
@twinn twinn merged commit 38edc4f into main Apr 9, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant