
Fix otel context propagation in a few more places missed previously #4274

Open
alco wants to merge 3 commits into main from fix-otel-context-effects-move-in

alco (Member) commented May 5, 2026

Summary

Follow-up to #4149, which fixed OTel context propagation at three spawn sites. An audit of the rest of the codebase found three more places where with_child_span calls were silently dropping spans because the parent context was missing.

1. Effects.query_move_in_async/4 (spawned task)

This is the same pattern as the sites fixed in #4149: a Task.Supervisor.start_child call wraps SnapshotQuery.execute_for_shape but does not propagate the OTel context into the spawned task. Fixed by capturing the context before the spawn and attaching it inside the task closure.
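
As a rough illustration of the capture-then-attach pattern described above, here is a minimal, stdlib-only sketch. It is not the code from this PR: the `CtxDemo` module and the `:demo_otel_ctx` process-dictionary key are hypothetical stand-ins for the real OTel context, used only to show that process-local state does not follow a spawned task unless it is carried across explicitly.

```elixir
defmodule CtxDemo do
  # Hypothetical stand-in for the OTel context: a value stored per process.
  @key :demo_otel_ctx

  def set_current(ctx), do: Process.put(@key, ctx)
  def get_current, do: Process.get(@key)

  # Spawns a task without propagating anything; the task reports what it sees.
  def spawn_without_ctx(sup, reply_to) do
    Task.Supervisor.start_child(sup, fn ->
      send(reply_to, {:seen, get_current()})
    end)
  end

  # Captures the context in the caller, then attaches it inside the task closure.
  def spawn_with_ctx(sup, reply_to) do
    ctx = get_current()

    Task.Supervisor.start_child(sup, fn ->
      set_current(ctx)
      send(reply_to, {:seen, get_current()})
    end)
  end
end

{:ok, sup} = Task.Supervisor.start_link()
CtxDemo.set_current(%{trace_id: "abc"})

{:ok, _pid} = CtxDemo.spawn_without_ctx(sup, self())

without =
  receive do
    {:seen, ctx} -> ctx
  after
    1_000 -> :timeout
  end

{:ok, _pid} = CtxDemo.spawn_with_ctx(sup, self())

with_ctx =
  receive do
    {:seen, ctx} -> ctx
  after
    1_000 -> :timeout
  end

IO.inspect({without, with_ctx})
```

In this sketch the first task sees no context at all, while the second sees the value captured by its parent, which is the difference between a dropped span and a correctly parented one.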

2. ShapeCache.handle_call({:create_or_wait_shape_handle, shape, otel_ctx})

The otel_ctx was already being passed in the GenServer call message (from Electric.Shapes.get_or_create_shape_handle/3), but the handler was only forwarding it to the Snapshotter — it never set the context on the ShapeCache GenServer process itself. As a result, the with_child_span calls inside ShapeStatus.fetch_handle_by_shape_critical and ShapeStatus.add_shape were silently dropped.
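
The gap described here can be sketched with a toy GenServer. This is an assumption-laden illustration rather than the actual ShapeCache code: `DemoCache`, its `:create` message, and the `:demo_otel_ctx` key are all hypothetical, and the process dictionary again stands in for the real OTel context. The point is only that receiving a context in the call message is not enough; the handler has to make it current on the server process before calling code that reads it.

```elixir
defmodule DemoCache do
  use GenServer

  # Hypothetical stand-in for the OTel context storage.
  @key :demo_otel_ctx

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, nil)

  # The caller captures its own context and ships it in the call message.
  def create(server, shape) do
    GenServer.call(server, {:create, shape, Process.get(@key)})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:create, shape, ctx}, _from, state) do
    # Make the caller's context current on this process before doing traced work.
    Process.put(@key, ctx)
    {:reply, {shape, traced_work()}, state}
  end

  # Stands in for a function that opens a child span: it looks up its parent
  # in the current process, not in the message.
  defp traced_work, do: Process.get(@key)
end

{:ok, pid} = DemoCache.start_link()
Process.put(:demo_otel_ctx, %{trace_id: "abc"})
result = DemoCache.create(pid, :my_shape)
IO.inspect(result)
```

One design point worth keeping in mind with this approach: a long-lived server keeps whatever context was set last, so a real implementation may want to restore or clear it once the call has been handled.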

3. ShapeCache.handle_call({:start_consumer_for_handle, ...})

This handler had a # TODO: otel ctx from shape log collector? comment acknowledging the gap. The caller, ConsumerRegistry.start_consumer!/2, wraps the GenServer call in OpenTelemetry.with_span(...) but the context wasn't crossing the GenServer call boundary. The public function signature was changed to take an opts keyword list (matching get_or_create_shape_handle/3), with ConsumerRegistry capturing the current context inside its span and passing it through.
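
A small sketch of why the context is captured inside the span, again under stated assumptions: `DemoSpan`, its `with_span/2` helper, and the `:otel_ctx` option name are hypothetical and only mimic the shape of the change described above. They are not the project's or the OpenTelemetry library's API.

```elixir
defmodule DemoSpan do
  # Hypothetical stand-in for the OTel context storage.
  @key :demo_otel_ctx

  def current, do: Process.get(@key)

  # Stand-in for a with_span helper: makes `name` the current span while `fun`
  # runs, then restores whatever was current before.
  def with_span(name, fun) do
    previous = Process.get(@key)
    Process.put(@key, %{span: name, parent: previous})

    try do
      fun.()
    after
      if is_nil(previous), do: Process.delete(@key), else: Process.put(@key, previous)
    end
  end

  # Hypothetical public function taking an opts keyword list;
  # `:otel_ctx` is an assumed key name.
  def start_consumer_message(handle, opts \\ []) do
    {:start_consumer_for_handle, handle, Keyword.get(opts, :otel_ctx)}
  end
end

# Captured outside the span: the message carries no span.
outside = DemoSpan.start_consumer_message("handle-1", otel_ctx: DemoSpan.current())

# Captured inside the span: the span travels with the message.
inside =
  DemoSpan.with_span("start_consumer", fn ->
    DemoSpan.start_consumer_message("handle-1", otel_ctx: DemoSpan.current())
  end)

IO.inspect({outside, inside})
```

Only the message built inside the span carries it, so the receiving process can parent its own spans under the caller's span.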

Test plan

  • mix compile clean
  • mix test test/electric/telemetry/open_telemetry_test.exs — 8 passing
  • After deploy: verify name = shape_status.add_shape and name = shape_status.fetch_handle_by_shape_critical rows show up in Honeycomb under shape-creation traces (previously dropped).

alco and others added 3 commits May 5, 2026 14:59
This is a follow-up to #4149 which fixed OTel context propagation in
PartialModes and Snapshotter. The Effects.query_move_in_async function
has the same pattern: it spawns a task that calls
SnapshotQuery.execute_for_shape, but without OTel context propagation
the child spans inside would be silently dropped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two handlers receive calls that should have OTel context but weren't
setting it before calling ShapeStatus functions that use with_child_span:

1. handle_call({:create_or_wait_shape_handle, shape, otel_ctx}) - the
   otel_ctx was passed to opts but not set in the process context before
   calling ShapeStatus.fetch_handle_by_shape_critical and add_shape.

2. handle_call({:start_consumer_for_handle, ...}) - previously had a
   TODO comment acknowledging the missing context. Now accepts an opts
   keyword list (matching get_or_create_shape_handle/3) and ConsumerRegistry
   passes the context from within its span.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
codecov Bot commented May 5, 2026

❌ 10 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 3538 | 10 | 3528 | 39 |
View the top 3 failed test(s) by shortest run time
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 starts any missing consumers
Stack Traces | 0.0019s run time
13) test publish/2 starts any missing consumers (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:91
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 starts any missing consumers", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-1", {{:span_ctx, 241503344268068633288493546702973767853, "b5afda2c5f11bcb24fc4a6d9f4f110ad", 17006836315777354670, "ec046b85eb1f4bae", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 starts any missing consumers", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(%{"handle-1" => {:txn, %{lsn: 1}}}, ctx.registry_state)
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:94: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:95: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 uses existing consumer when already active
Stack Traces | 0.0031s run time
15) test publish/2 uses existing consumer when already active (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:73
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 uses existing consumer when already active", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-1", {{:span_ctx, 77224420977043798154428680518192734781, "3a18e1ea31a3f808cf073ac414ea223d", 5487899157641916404, "4c28ef352e696ff4", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 uses existing consumer when already active", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(%{"handle-1" => {:txn, %{lsn: 1}}}, ctx.registry_state)
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:94: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:77: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 crashed consumer handling persistently suspending consumer results in shape removal after retry
Stack Traces | 0.0039s run time
11) test publish/2 crashed consumer handling persistently suspending consumer results in shape removal after retry (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:360
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 crashed consumer handling persistently suspending consumer results in shape removal after retry", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-stubborn", {{:span_ctx, 260519641413690370539354972501869030898, "c3fe4202748f847654e301a7c3bb6df2", 11902550394202676227, "a52e5a8567a2dc03", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 crashed consumer handling persistently suspending consumer results in shape removal after retry", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(%{"handle-stubborn" => {:txn, %{lsn: 1}}}, ctx.registry_state)
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:94: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:398: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 starts consumer when receiving a message
Stack Traces | 0.0061s run time
16) test publish/2 starts consumer when receiving a message (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:62
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 starts consumer when receiving a message", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-1", {{:span_ctx, 175090397273624116560877608055441268029, "83b92f149ed55d0dc48d406c8fd91d3d", 7821877404755822776, "6c8ce40ebb2008b8", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 starts consumer when receiving a message", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(%{"handle-1" => {:txn, %{lsn: 1}}}, ctx.registry_state)
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:94: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:66: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 crashed consumer handling suspended consumers are retried but crashed consumers are not
Stack Traces | 0.0065s run time
10) test publish/2 crashed consumer handling suspended consumers are retried but crashed consumers are not (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:293
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 crashed consumer handling suspended consumers are retried but crashed consumers are not", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-suspend", {{:span_ctx, 16135071485636425299362419407021042599, "0c2380719143a0e90f3574d82c9e6ba7", 15546560332228071742, "d7c07a40c1a2113e", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 crashed consumer handling suspended consumers are retried but crashed consumers are not", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:100: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:336: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test publish/2 retries any consumers that suspend
Stack Traces | 0.0081s run time
19) test publish/2 retries any consumers that suspend (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:114
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test publish/2 retries any consumers that suspend", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-1", {{:span_ctx, 320925939418564822823459894305154892516, "f170167142144ce62f7f84ab513782e4", 476025386993413839, "069b2e913fcf6ecf", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test publish/2 retries any consumers that suspend", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:100: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:161: (test)
Elixir.Electric.Shapes.ConsumerRegistryTest::test remove_consumer/3 removes the process from the table
Stack Traces | 0.0123s run time
12) test remove_consumer/3 removes the process from the table (Electric.Shapes.ConsumerRegistryTest)
     .../electric/shapes/consumer_registry_test.exs:451
     ** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:Electric.Shapes.ConsumerRegistryTest test remove_consumer/3 removes the process from the table", {Electric.ShapeCache, nil}}}, {:start_consumer_for_handle, "handle-1", {{:span_ctx, 97080380598494097578448551215075096670, "4909004eb0a7693bf8c0cd470686f85e", 13189587632972413715, "b70ad4121d34bb13", 1, {:tracestate, []}, true, false, true, {:otel_span_ets, #Function<2.60261395/1 in :otel_tracer_server.on_end/1>}}, %{"stack_id" => {"Electric.Shapes.ConsumerRegistryTest test remove_consumer/3 removes the process from the table", []}}}}, 30000)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
     code: ConsumerRegistry.publish(%{"handle-1" => {:txn, %{lsn: 1}}}, ctx.registry_state)
     stacktrace:
       (elixir 1.19.5) lib/gen_server.ex:1135: GenServer.call/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:269: anonymous fn/2 in Electric.Shapes.ConsumerRegistry.start_consumer!/2
       (opentelemetry 1.7.0) .../opentelemetry/src/otel_tracer_default.erl:47: :otel_tracer_default.with_span/5
       (electric 1.6.2) .../electric/telemetry/open_telemetry.ex:92: anonymous fn/5 in Electric.Telemetry.OpenTelemetry.do_with_span/5
       (telemetry 1.4.1) .../telemetry/src/telemetry.erl:359: :telemetry.span/3
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:123: anonymous fn/4 in Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (stdlib 7.3) maps.erl:894: :maps.fold_1/4
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:122: Electric.Shapes.ConsumerRegistry.resolve_and_broadcast/2
       (electric 1.6.2) .../electric/shapes/consumer_registry.ex:94: Electric.Shapes.ConsumerRegistry.publish/2
       .../electric/shapes/consumer_registry_test.exs:474: (test)
Elixir.Electric.Replication.ShapeLogCollectorTest::test lazy consumer initialization consumers are started when receiving a transaction that matches their filter
Stack Traces | 0.408s run time
29) test lazy consumer initialization consumers are started when receiving a transaction that matches their filter (Electric.Replication.ShapeLogCollectorTest)
     .../electric/replication/shape_log_collector_test.exs:254
     Assertion failed, no matching message after 400ms
     The process mailbox is empty.
     code: assert_receive {:start_consumer, @shape_handle, id, pid}
     stacktrace:
       .../electric/replication/shape_log_collector_test.exs:269: (test)
Elixir.Electric.Replication.ShapeLogCollectorTest::test lazy consumer initialization consumer exits remove the filter mapping
Stack Traces | 0.41s run time
2) test lazy consumer initialization consumer exits remove the filter mapping (Electric.Replication.ShapeLogCollectorTest)
     .../electric/replication/shape_log_collector_test.exs:274
     Assertion failed, no matching message after 400ms
     The process mailbox is empty.
     code: assert_receive {:start_consumer, @shape_handle, id, consumer_pid}
     stacktrace:
       .../electric/replication/shape_log_collector_test.exs:291: (test)
Elixir.Electric.ShapeCacheTest::test await_snapshot_start/4 should wait for consumer to come up
Stack Traces | 5.67s run time
18) test await_snapshot_start/4 should wait for consumer to come up (Electric.ShapeCacheTest)
     test/electric/shape_cache_test.exs:1080
     ** (exit) exited in: Task.await(%Task{mfa: {:erlang, :apply, 2}, owner: #PID<0.6900.0>, pid: #PID<0.7022.0>, ref: #Reference<0.0.883203.3149086060.1494286340.12937>}, 5300)
         ** (EXIT) time out
     code: assert :started = Task.await(wait_task, start_consumer_delay + 5_000)
     stacktrace:
       (elixir 1.19.5) lib/task.ex:892: Task.await_receive/3
       test/electric/shape_cache_test.exs:1118: (test)


robacourt (Contributor) left a comment

Nice!
