Port :pg to Elixir, add cluster-replicated metadata, expose Registry-shaped API #8
Merged
Reproduces OTP 27 pg.erl as a self-contained Elixir GenServer so we can extend it with metadata support without forking OTP. Wire format is byte-compatible with :pg, so a node running PgRegistry.Pg can mesh with a node running :pg under the same scope name. Tests cover both single-node behavior and a multi-node setup using :peer + :erpc, with a small helper module under test/support so its beam is loadable on the peer node.
Extends the :pg port with cluster-replicated metadata. Each entry in
the ETS table is now a {pid, meta} pair instead of a bare pid; meta
defaults to nil so the existing API is unchanged. New surface:
    Pg.join(scope, group, pid_or_pids, meta)
    Pg.lookup(scope, group) -> [{pid, meta}]
    Pg.lookup_local(scope, group) -> [{pid, meta}]
    Pg.update_meta(scope, group, pid, new_meta)
Subscription messages also carry entries now, plus a new verb:
    {ref, :join, group, [{pid, meta}]}
    {ref, :leave, group, [{pid, meta}]}
    {ref, :update, group, [{pid, old_meta, new_meta}]}
The wire protocol gossips entries between scope GenServers and a new
{:update_meta, ...} message lets remote nodes apply metadata changes
without leave+rejoin. Sync-driven updates after a netsplit do not yet
emit :update notifications (the diff is ambiguous for multi-join);
state still converges correctly, only notifications during recovery
are imperfect.
Adds credo and styler as dev tooling. Both pass clean.
PgRegistry.Pg now stores one ETS row per entry instead of aggregating
entries into a list per group. Each row is {key, pid, meta, tag} where
tag is an opaque per-node monotonic integer that gives every entry its
own identity. Tags are internal — callers never see them — but they let
ref-counted multi-join semantics survive a flat-row layout, and they
make cross-node leaves precise (each leave names a specific entry).
Wire protocol now carries tags in join/sync messages and uses
{group, pid, tag} triples in leave messages. The 71 existing tests
all pass against the new layout, including the 6 cluster tests.
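The flat-row layout described above can be sketched in a few lines. This is a minimal illustration, not the code from this branch: the module name `TaggedRowsSketch` and its functions are hypothetical, and `:erlang.unique_integer/1` stands in for whatever per-node counter the real implementation uses to produce tags.

```elixir
defmodule TaggedRowsSketch do
  # Hypothetical sketch: one ETS row per entry, shaped {key, pid, meta, tag}.

  def new, do: :ets.new(:tagged_rows_sketch, [:duplicate_bag, :public])

  # Insert one row and return its tag. The tag is what gives two otherwise
  # identical joins (same key, pid and meta) separate identities.
  def join(table, key, pid, meta) do
    tag = :erlang.unique_integer([:monotonic, :positive])
    true = :ets.insert(table, {key, pid, meta, tag})
    tag
  end

  # Remove exactly one entry, identified by its tag.
  def leave(table, key, pid, tag) do
    :ets.match_delete(table, {key, pid, :_, tag})
  end

  # Callers only ever see {pid, meta}; the tag stays internal.
  def lookup(table, key) do
    for {_key, pid, meta, _tag} <- :ets.lookup(table, key), do: {pid, meta}
  end
end
```

Joining the same pid twice under one key yields two rows, and a leave that names the first tag removes only that row, which is the ref-counted multi-join behaviour the commit message refers to.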
PgRegistry now uses PgRegistry.Pg as its backend and exposes the full
Registry-shaped API: register/3, unregister/2, values/3, match/3,
match/4, count_match/3, count_match/4, select/2, count_select/2,
unregister_match/3, unregister_match/4, dispatch/4, meta/2, put_meta/3,
delete_meta/2. The match-spec functions translate user-supplied
{key, pid, value} match-specs to the 4-tuple storage shape by
appending :_ for the tag, and run natively against ETS.
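A rough sketch of that translation, assuming each user-supplied clause has a 3-tuple head. The helper `MatchSpecSketch.widen/1` is hypothetical and only illustrates the idea of appending `:_` for the tag; the tag values in the sample rows are made up.

```elixir
defmodule MatchSpecSketch do
  # Hypothetical helper: widen each user-facing {key, pid, value} head to the
  # stored {key, pid, value, tag} shape by appending :_ for the tag. Guards
  # and body are passed through untouched.
  def widen(spec) do
    for {{key, pid, value}, guards, body} <- spec do
      {{key, pid, value, :_}, guards, body}
    end
  end
end

table = :ets.new(:match_spec_sketch, [:duplicate_bag])
:ets.insert(table, [{:a, self(), 1, 101}, {:a, self(), 5, 102}, {:b, self(), 9, 103}])

# Values under key :a that are greater than 2.
user_spec = [{{:a, :_, :"$1"}, [{:>, :"$1", 2}], [:"$1"]}]
:ets.select(table, MatchSpecSketch.widen(user_spec))
# => [5]

# :ets.select_count/2 only counts rows whose body evaluates to true, so the
# body has to reach ETS as the user wrote it.
:ets.select_count(table, MatchSpecSketch.widen([{{:a, :_, :_}, [], [true]}]))
# => 2
:ets.select_count(table, MatchSpecSketch.widen([{{:a, :_, :_}, [], [false]}]))
# => 0
```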
Scope-level metadata (meta/2 etc) is local-only and lives in a sibling
ETS table named :"#{scope}_meta", created and torn down by Pg's init
and terminate. Reads bypass the GenServer; writes go through it.
86 tests passing, credo clean. lock/3 is intentionally not implemented.
Listeners are configured at scope start-up via `listeners: [Atom]` and
receive raw Registry-shaped messages on register/unregister:
    {:register, scope, key, pid, value}
    {:unregister, scope, key, pid}
They fire for both locally-originated and gossiped events, and on
local DOWN, remote leave, and remote scope crash. Updates do not
fire listener messages (matching Registry).
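For illustration, a listener can be any process registered under a local name that handles those two message shapes. The sketch below is an assumption-laden example rather than code from this branch: `ListenerSketch`, its `loop/1`, and the `deliver/2` helper are hypothetical names. The helper looks the name up before sending, since `send/2` to an unbound registered name raises `ArgumentError`.

```elixir
defmodule ListenerSketch do
  # Deliver to a locally registered name, dropping the message when the
  # name is not currently bound.
  def deliver(name, message) when is_atom(name) do
    case Process.whereis(name) do
      nil ->
        :dropped

      pid ->
        send(pid, message)
        :ok
    end
  end

  # A listener loop that reports every event it sees to `report_to`.
  def loop(report_to) do
    receive do
      {:register, scope, key, pid, value} ->
        send(report_to, {:seen, :register, scope, key, pid, value})
        loop(report_to)

      {:unregister, scope, key, pid} ->
        send(report_to, {:seen, :unregister, scope, key, pid})
        loop(report_to)
    end
  end
end
```

A listener started with `spawn(fn -> ListenerSketch.loop(self_pid) end)` and registered via `Process.register/2` receives the events; delivery to a name that is not registered returns `:dropped` instead of raising.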
Three new sugar surfaces for users porting from Registry:
    PgRegistry.update_value(scope, key, value)         # self() implicit
    PgRegistry.start_link(name: :reg, listeners: [L])  # keyword shape
    {PgRegistry, name: :reg, listeners: [L]}           # child_spec keyword
The keyword form validates :keys and :partitions instead of silently
accepting anything: :duplicate / 1 are no-ops, :unique and partitions > 1
raise ArgumentError with multi-line messages explaining why and
pointing at :global / scope-splitting respectively. The intent is
that users porting from Registry hit a load-bearing error rather
than discovering the semantic gap in production.
100 tests passing, credo clean.
The test process and the local scope GenServer were both monitoring the remote scope pid independently. After killing the remote scope, the test waited only for its own DOWN, then called :sys.get_state on the local scope. But the local scope's DOWN may still be in flight on the dist channel — :sys.get_state only drains messages already in the mailbox, not in-flight ones, so the assertion that members are gone could fire before the local DOWN handler had run. Fix: subscribe to monitor_scope before killing, then assert_receive on the leave notification. The notification is fired from inside the DOWN handler, so receiving it proves the ETS row has been removed.
Elixir 1.17's formatter rewrites the inline `receive do: (pattern -> body)` shorthand to a broken `receive do: ->(pattern, body)` shape that fails to parse. The multi-line form formats consistently across versions.
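As a small illustration of the multi-line form, the hypothetical helper below waits for a `{:ping, n}` message with a timeout. The module and message names are made up for the example.

```elixir
defmodule ReceiveSketch do
  # Multi-line receive: each clause sits on its own line inside do/after/end.
  def await_ping(timeout \\ 100) do
    receive do
      {:ping, n} -> {:pong, n}
    after
      timeout -> :timeout
    end
  end
end
```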
Covers all the new surface: register/3, lookup, match/select, listeners,
scope metadata, the {key, pid, value, tag} storage layout, and the
deliberate gaps (no :unique, no partitions, no sync-driven :update
notifications) with the reasoning behind each.
PgRegistry.Pg now accepts keys: :unique. Each node enforces local
uniqueness — a second join under the same key returns
{:error, {:already_registered, pid}} — but the cluster as a whole
can still have one holder per node, with no consensus or locking.
Pg.join's return shape grows: :ok | {:error, {:already_registered, pid}}.
Multi-pid joins raise ArgumentError in unique mode (the API check
runs in the public Pg.join wrapper so callers see a proper raise
instead of a gen_server exit).
PgRegistry.register/3 propagates the error tuple. PgRegistry.register_name/2
returns :no on collision so the :via tuple integrates correctly with
GenServer.start_link, surfacing {:error, {:already_started, pid}}
naturally. parse_keyword_opts now accepts keys: :unique and threads
it through.
README documents the per-node-only semantics with explicit pointers
to :global for the cluster-wide-unique case and to leader-election
libraries for the singleton case.
13 new tests covering the Pg layer, the PgRegistry layer, the via
tuple integration, and the cluster-level "two nodes, two holders"
property. 112 tests passing, credo clean.
1. unregister_match used to delete the LIFO head of self()'s entries
   under the key, not the entry that matched. Concrete: join :a then
   :b, unregister_match :a → :b was deleted, :a remained. Fix: add
   Pg.unregister_match/5 as a gen-server-side primitive that builds an
   ETS match-spec returning the full row + tag for each match, then
   deletes by exact tag and updates state.local entry-by-entry.
   PgRegistry.unregister_match becomes a thin wrapper.
2. count_select stripped the user's body and substituted [true],
   silently ignoring any filter the user expressed in the body (e.g.
   body returning false → count was wrong). Fix: pass through the
   rewritten spec as-is so :ets.select_count respects the user's body.
3. count/1 walked which_groups + lookup per key, O(N×M) over the whole
   table. Replaced with :ets.info(scope, :size), which is O(1).
4. notify_listeners called send/2 with a registered name as
   destination. If the name wasn't currently bound (listener crashed,
   not yet started, mid-restart), send/2 raised ArgumentError inside
   the gen_server, killing the entire scope. Fix: deliver_listener/2
   uses Process.whereis first and silently drops missing listeners,
   matching what Elixir's Registry does internally.
5. The wire format had no version negotiation. Rolling deploys between
   this branch and any older version (or any future incompatible
   version) would corrupt state silently. Added @protocol_version 1,
   threaded through every discover message; 2-arity (legacy) discovers
   and version-mismatched discovers are rejected with a Logger.warning.
   Bumping the constant on any future wire change is now the migration
   story.
Eight new tests cover all five fixes including the LIFO ordering bug,
the count_select [false] case, the missing-listener crash path, the
crashed-listener path, and both protocol-version reject paths.
120 tests passing, credo clean.
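The version check in item 5 might look roughly like the following. This is a sketch under assumptions: the message shapes `{:discover, peer, version}` and `{:discover, peer}` and the module `DiscoverSketch` are hypothetical stand-ins, not the wire format this branch actually uses.

```elixir
defmodule DiscoverSketch do
  require Logger

  @protocol_version 1

  # Build a discover message carrying this node's protocol version.
  def discover(peer), do: {:discover, peer, @protocol_version}

  # Same version: accept the peer.
  def accept({:discover, peer, @protocol_version}), do: {:ok, peer}

  # Any other version: log and refuse instead of exchanging state.
  def accept({:discover, peer, other}) do
    Logger.warning("rejecting discover from #{inspect(peer)}: protocol version #{inspect(other)}")
    {:error, :version_mismatch}
  end

  # Legacy 2-arity discover with no version field at all.
  def accept({:discover, peer}) do
    Logger.warning("rejecting legacy discover from #{inspect(peer)}")
    {:error, :legacy}
  end
end
```

Because the constant appears in the first function head, bumping it automatically turns every older peer into a mismatch.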
A. down_local_pid no longer fabricates a {pid, nil} leave notification
when the ETS row is missing during cleanup. The row being gone
means some other path already removed it (and notified), so we
skip the entry instead of synthesizing a fake meta.
B. keys: :unique check now ignores dead local holders. Closes the
race where a :DOWN for the previous holder is queued in the
GenServer mailbox but hasn't been processed yet — without this,
a subsequent register would spuriously fail with
{:already_registered, dead_pid}.
C. do_unregister no longer uses && — both branches of Pg.leave's
return are truthy, so the && was decorative and confusing.
D. Pg.leave with an empty pid list now returns :ok instead of
:not_joined. Added a head clause for the empty case.
L. Option validation now runs in the 2-arity start_link/2 form too,
not just the keyword form. parse_keyword_opts and the 2-arity
path both call a shared validate_pg_opts! that rejects unknown
options, validates :keys, :partitions, and :listeners with the
same diagnostics. Refactored into per-concern helpers
(validate_known_opts!, validate_keys!, validate_partitions!,
validate_listeners!) to keep cyclomatic complexity reasonable.
H. PgRegistry now exposes monitor_scope/1, monitor/2, and
demonitor/2 as defdelegates. Users no longer have to drop
into PgRegistry.Pg for runtime subscriptions.
Bumped to 0.3.0 with a CHANGELOG entry covering the metadata
extension, the Registry-shaped API, listeners, per-node :unique
mode, the wire-protocol break, and the migration story.
Eight new tests for the fixes (including a deterministic test for
the dead-holder race that uses :sys.replace_state to inject the
stale row, and a test that proves down_local_pid no longer fires
spurious notifications). 129 tests passing, credo clean.
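The dead-holder check from item B can be sketched against a bare ETS table. The module `UniqueSketch` is hypothetical and leaves out everything except the liveness filter; it assumes rows shaped `{key, pid, meta, tag}` and local pids only, since `Process.alive?/1` does not accept remote pids.

```elixir
defmodule UniqueSketch do
  # Hypothetical local-uniqueness check that ignores holders whose process
  # has already exited but whose row has not been cleaned up yet.
  def register(table, key, pid, meta) do
    live_holders =
      for {_key, holder, _meta, _tag} <- :ets.lookup(table, key),
          Process.alive?(holder),
          do: holder

    case live_holders do
      [] ->
        tag = :erlang.unique_integer([:monotonic])
        true = :ets.insert(table, {key, pid, meta, tag})
        :ok

      [holder | _] ->
        {:error, {:already_registered, holder}}
    end
  end
end
```

In this sketch the stale row is left in place; the assumption is that the pending DOWN handler removes it later, as the commit message describes.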
unmangle pipes, doc clarifications, cluster test for unique-mode
remote DOWN
Functional changes:
- pop_first_group/2 is now tail-recursive with an accumulator and
Enum.reverse(acc, tail). For pids joined to many groups, popping
one entry no longer rebuilds the entire prefix body-recursively.
- dispatch/4 raises ArgumentError on any non-empty option list rather
than silently ignoring them. parallel: true silently going serial
was a footgun; if anyone passes an option, they'll find out
immediately. Updated docstring to point at "spawn tasks from the
callback yourself" as the parallelism story.
- which_groups/1 and all_local_entries/1 no longer have the
styler-mangled "lambda |> :ets.foldl(acc, table)" shape. Both use
named module-level helpers (which_groups_collect/2,
collect_local_entry/2) and call :ets.foldl directly with positional
args. Cleaner to read and stable across re-formats.
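For reference, the accumulator pattern mentioned for pop_first_group/2 looks roughly like this. The module below is a hypothetical stand-in operating on a plain list of group names, not the function from this branch.

```elixir
defmodule PopFirstSketch do
  # Remove the first occurrence of `group` from `groups`.
  # Returns {:ok, rest} or :error when the group is absent.
  def pop_first(groups, group), do: pop_first(groups, group, [])

  # Found it: Enum.reverse/2 restores the skipped prefix and appends the
  # untouched tail in a single pass.
  defp pop_first([group | tail], group, acc), do: {:ok, Enum.reverse(acc, tail)}

  # Not this one: push it onto the accumulator and keep going (tail call).
  defp pop_first([other | tail], group, acc), do: pop_first(tail, group, [other | acc])

  defp pop_first([], _group, _acc), do: :error
end
```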
Doc clarifications (no behavior change):
- Pg.start_link/2: explicitly state listeners must be locally-
registered names ({name, node()} is not supported), and that
missing listener atoms are silently dropped.
- sync_one_group/4: long source comment explaining why no :update
notifications fire during sync, with a pointer to the README's
net-split section. Future contributors won't accidentally "fix"
the deliberate gap.
- Pg.join/4: explicit "Failure modes" section listing both the
{:error, {:already_registered, pid}} return for unique-mode
collisions and the ArgumentError raise for multi-pid joins.
- PgRegistry.update_value/4: warning callout that all of pid's
matching entries are collapsed to new_value, with a note that
this is different from Registry.update_value/3 (which only exists
in :unique mode).
- PgRegistry.register_name/2: explain that :no doesn't carry the
holder pid, and tell direct callers to use whereis_name/1 to
recover it.
- PgRegistry.whereis_name/1: explicit note about the asymmetry
with the 3-tuple via name (value field is consumed only at
registration, ignored on lookup).
- PgRegistry.match/3: note about the single-clause limitation,
pointing at select/2 for multi-clause cases.
- README: comparison table is more honest about :global's
net-split behavior (collisions resolved by user-supplied
resolver, may kill processes) and PgRegistry's per-register cost
(gossip per peer).
New cluster test: in unique mode, killing the holder on a peer
node propagates a leave to other nodes via DOWN, and the slot
becomes available locally. Subscribed via monitor_scope so the
assertion is deterministic.
133 tests passing, credo clean.
The snapshot fold used to run inside handle_call(:monitor, ...), which blocked all other operations on the scope for the duration: multi-millisecond pauses on scopes with thousands of entries, with joins/leaves queueing behind it.

Split into a cheap subscribe call (Process.monitor + Map.put, runs in the GenServer) and a snapshot read (the foldl, runs in the caller's own process via direct ETS access). The GenServer is now free for concurrent operations during the snapshot.

The trade-off is a race window: between the subscription being established and the snapshot read finishing, concurrent activity may result in entries appearing both in the snapshot and as delivered events. Documented in the docstring; consumers should absorb events idempotently (Map/MapSet keyed by (group, pid)) rather than relying on exactly-once delivery. State always converges to correct.

Same fix applied to monitor/2 for the single-key case. Its snapshot is just an :ets.lookup so it's even cheaper, but the structure now matches.

Four new tests covering: pre-subscribe entries appear in snapshot, post-subscribe events fire, snapshot doesn't block concurrent operations on a 2000-entry scope, snapshot is correct across many keys. 137 tests passing, credo clean.
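One way a consumer might absorb the snapshot and the event stream idempotently is a map keyed by `{group, pid}`. The sketch below is illustrative only: `MembershipView` is a hypothetical module, and the snapshot shape (a map of group to `[{pid, meta}]`) is an assumption; the event tuples follow the subscription messages listed earlier in this PR.

```elixir
defmodule MembershipView do
  # View state: %{{group, pid} => meta}. Applying the same join twice, or a
  # join for an entry already present in the snapshot, leaves it unchanged.

  def from_snapshot(snapshot) do
    for {group, entries} <- snapshot, {pid, meta} <- entries, into: %{} do
      {{group, pid}, meta}
    end
  end

  def apply_event(view, {_ref, :join, group, entries}) do
    Enum.reduce(entries, view, fn {pid, meta}, acc ->
      Map.put(acc, {group, pid}, meta)
    end)
  end

  def apply_event(view, {_ref, :leave, group, entries}) do
    Enum.reduce(entries, view, fn {pid, _meta}, acc ->
      Map.delete(acc, {group, pid})
    end)
  end

  def apply_event(view, {_ref, :update, group, entries}) do
    Enum.reduce(entries, view, fn {pid, _old_meta, new_meta}, acc ->
      Map.put(acc, {group, pid}, new_meta)
    end)
  end
end
```

Keying by `{group, pid}` collapses multiple joins of the same pid into one entry, which is the price of tolerating the overlap window.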
- README linked PgRegistry.Pg.sync_one_group/4 which is a private function, so mix docs warned on every build. Rewrote the prose to point at the source file directly and summarise the rationale (multi-join ambiguity) inline.
- mix.exs description still said "backed by Erlang's :pg module". That was true at the start of this branch but we've since moved to PgRegistry.Pg. Updated the blurb to match what actually ships in 0.3.0: distributed, metadata-aware, Registry-shaped API, listeners, ETS match-spec queries.
Version still 0.3.0. 137 tests, 0 failures.
Summary

This branch turns `PgRegistry` into a self-contained, distributed, metadata-aware process registry. Key changes:

- `PgRegistry.Pg`: an Elixir port of OTP-27 `pg.erl` as a single GenServer per scope, owning an ETS `:duplicate_bag` of `{key, pid, meta, tag}` rows. The `tag` is an internal per-entry identity that preserves `:pg`'s ref-counted multi-join semantics in the flat-row layout and makes cross-node leaves precise.
- Per-member metadata: entries carry values that are gossiped between nodes. `Pg.join/4`, `Pg.update_meta/4`, a new `:update` subscription verb, and a `{:update_meta, ...}` wire message.
- Full Registry-shaped API on `PgRegistry`:
  - `register/3`, `unregister/2`, `register_name/2` (incl. 3-tuple via name), `unregister_name/1`
  - `lookup/2`, `lookup_local/2` (via `Pg`), `get_members/2`, `get_local_members/2`, `keys/2`, `count/1`, `values/3`, `which_groups/1`
  - `match/3`, `match/4`, `count_match/3,4`, `select/2`, `count_select/2`, `unregister_match/3,4`: all run as native ETS queries, with the user-supplied `{key, pid, value}` shape transparently rewritten to operate on the 4-tuple storage
  - `update_value/3` (self) and `update_value/4` (any pid)
  - `meta/2`, `put_meta/3`, `delete_meta/2` (local-only, sibling ETS table)
  - `dispatch/3`, `dispatch/4`
  - `monitor_scope/1`, `monitor/2`, `demonitor/2`: runtime, ref-based, non-blocking
- Listeners: Registry-shaped `{:register, scope, key, pid, value}` / `{:unregister, scope, key, pid}` messages to statically-configured registered names. Addressed by atom, silently drops missing listeners (never crashes the scope).
- Per-node `:unique` mode: each node enforces local uniqueness; the cluster as a whole can still have one holder per node. Useful for "one singleton per node" patterns. Deliberately NOT cluster-wide unique (use `:global` for that). Uses `Process.alive?/1` to ignore dead holders and close the DOWN race.
- Registry-shaped keyword `start_link/1`: `PgRegistry.start_link(name: ..., listeners: ..., keys: :unique)`. Validates `:keys`, `:partitions`, `:listeners`, and unknown options in both the keyword form and the 2-arity `start_link/2` with the same diagnostics.
- Wire-protocol version handshake: `@protocol_version 1` threaded through every discover message. Peers with mismatched versions refuse to peer with a `Logger.warning` instead of silently corrupting state. Bumping the constant on any future wire change is the migration story for rolling upgrades.
- Non-blocking `monitor_scope/1`: the snapshot fold runs in the calling process via direct ETS reads, not inside the GenServer. Documented overlap window between subscription and snapshot completion.

Design decisions worth calling out

- `:set` → `:duplicate_bag` with per-entry monotonic tag. One row per entry makes match-specs run natively against ETS; the tag gives every entry its own identity even when `(key, pid, meta)` would otherwise duplicate.
- `meta` isn't clustered either.
- No `lock/3`, no cluster-wide unique keys, no `partitions > 1`. Each is a deliberate "this would require consensus or complicated trade-offs that don't fit the gossip model" decision.
- `:partitions` and `:keys` validate loudly rather than silently shipping surprising behavior.
- Sync-driven `:update` notifications are deliberately unimplemented. ETS state converges correctly across netsplits; only the notification stream during convergence is incomplete. Re-implementing would require entry-level identifiers on the wire and the multi-join diff is ambiguous. There's a long source comment on `sync_one_group/4` explaining the reasoning.

Test plan

- `mix test`: 137 tests, 0 failures
- `mix credo`: clean
- `mix format --check-formatted`: passes (styler plugin enabled)
- `mix docs`: builds without warnings
- Tests for `Pg` (join/leave/update_meta/monitor/listeners/unique mode/down races/etc), 79 for `PgRegistry` (Registry-shaped API surface)
- Cluster tests using `:peer` + `:erpc` covering join/leave visibility, remote DOWN, remote process exit, monitor_scope across nodes, cross-node `update_meta`, remote join with metadata, per-node unique mode under remote DOWN
- Both the keyword `start_link/1` form and the 2-arity `start_link/2` form

What's NOT in this PR

- `lock/3`: explicitly skipped
- `partitions > 1` raises
- Sync-driven `:update` notifications: see design decisions above
- `flush/1` BEAM receive optimization: open investigation item; no action without benchmarks

Migration note

`PgRegistry` no longer uses Erlang `:pg` as its backend. Anyone reaching past the public API into `:pg.get_local_members(scope, key)` directly must switch to `PgRegistry.Pg.get_local_members/2`. The two storage worlds are now disjoint: a scope started by `PgRegistry.start_link/1` lives in `PgRegistry.Pg`'s ETS table, not in `:pg`'s.

Version is bumped to `0.3.0`. See `CHANGELOG.md` for the full list of new features, behaviour changes, and bug fixes.