Serialize axon-subtensor bounds reads under axon_lock #437

Merged
anderdc merged 1 commit into test from fix/axon-substrate-lock-race on May 31, 2026

@LandynDev
Collaborator

The forward loop reads bounds_cache (live-crown snapshot, scoring) on the same axon_subtensor websocket the axon handler threads use, but the two paths took different locks: axon_contract_client._substrate_lock vs axon_lock. Overlapping access puts two threads in recv at once, raising "cannot call recv while another thread is already running recv" (logged every step from scoring.py).

BoundsCache also calls get_block() (a websocket RPC) on every access, outside any lock.

Fix: one reentrant lock governs all axon_subtensor access.

  • axon_lock becomes an RLock (handlers hold it, then nest a bounds read).
  • BoundsCache holds the lock across get_block + the contract read.
  • axon_contract_client shares axon_lock as its substrate lock.

Fail-open behavior unchanged; no extra RPCs (get_block was already per-call).
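
A minimal sketch of the locking shape described above, not the actual patch. FakeSubtensor, its read_bounds method, and handler are hypothetical stand-ins; only axon_lock, BoundsCache, and get_block are names taken from this PR, and their real signatures are assumed. The fail-open path is omitted.

```python
import threading
import time


class FakeSubtensor:
    """Hypothetical stand-in for a client that owns one websocket."""

    def __init__(self):
        self._in_recv = threading.Lock()

    def _rpc(self, value):
        # Mimic the websocket rule: only one thread may be in recv at a time.
        if not self._in_recv.acquire(blocking=False):
            raise RuntimeError(
                "cannot call recv while another thread is already running recv"
            )
        try:
            time.sleep(0.001)  # pretend to wait on the socket
            return value
        finally:
            self._in_recv.release()

    def get_block(self):
        return self._rpc(100)

    def read_bounds(self):
        # Assumed name for the contract read; returns (min, max).
        return self._rpc((1, 10))


class BoundsCache:
    def __init__(self, subtensor, lock):
        self._subtensor = subtensor
        self._lock = lock

    def get(self):
        # Hold the shared lock across get_block and the contract read.
        with self._lock:
            block = self._subtensor.get_block()
            bounds = self._subtensor.read_bounds()
        return block, bounds


axon_lock = threading.RLock()  # reentrant, so a handler can nest a bounds read
subtensor = FakeSubtensor()
bounds_cache = BoundsCache(subtensor, axon_lock)


def handler():
    # A handler already holds axon_lock when it asks for bounds.
    with axon_lock:
        return bounds_cache.get()
```

With a plain threading.Lock in place of the RLock, handler() would deadlock on the nested acquire inside BoundsCache.get, which is why the reentrant variant is needed once handlers and the cache share one lock.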

@LandynDev force-pushed the fix/axon-substrate-lock-race branch from 81cfacf to 2c2fc92 on May 31, 2026 16:40
@anderdc merged commit 3c8998b into test on May 31, 2026
3 checks passed
@anderdc deleted the fix/axon-substrate-lock-race branch on May 31, 2026 16:50
anderdc pushed a commit that referenced this pull request May 31, 2026
#437 moved bounds_cache creation after the bootstrap call, so
bootstrap_miner_rates read a not-yet-set attribute and logged
'no attribute bounds_cache', falling back to unbounded (min/max=0)
commitment reads on cold start. Move the axon/bounds_cache block
above the bootstrap call so the bounds are available.
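
The ordering problem in this follow-up commit can be illustrated with a small sketch. The Validator class, the bounds_first flag, and the dict-shaped cache are hypothetical; only the names bounds_cache and bootstrap_miner_rates come from the commit message, and the real method presumably does more than return a tuple.

```python
class Validator:
    """Hypothetical validator showing why construction order matters."""

    def __init__(self, bounds_first):
        if bounds_first:
            # Fixed order: the cache exists before bootstrap reads it.
            self.bounds_cache = {"min": 1, "max": 10}
            self.rates = self.bootstrap_miner_rates()
        else:
            # Regressed order: bootstrap runs before the attribute is set.
            self.rates = self.bootstrap_miner_rates()
            self.bounds_cache = {"min": 1, "max": 10}

    def bootstrap_miner_rates(self):
        try:
            bounds = self.bounds_cache
        except AttributeError:
            # Attribute not set yet: fall back to unbounded (min/max=0).
            return (0, 0)
        return (bounds["min"], bounds["max"])


cold_start = Validator(bounds_first=False)
fixed = Validator(bounds_first=True)
print(cold_start.rates, fixed.rates)
```

Because the fallback swallows the AttributeError, the regressed order does not crash; it silently reads commitments without bounds, which matches the cold-start symptom described above.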
LandynDev pushed a commit that referenced this pull request May 31, 2026
* ci(docker): tag pushed image with git sha as well as latest

Mirrors gittensor's docker-publish so every main build is pullable by
its exact commit sha (entrius/allways:<sha>), not just :latest. Makes
pinning/rolling back to a known-good build a direct image pull instead
of a source rebuild.

* fix(scoring): unpack collateral + apply squat gate in live crown snapshot

snapshot_current_crown_holders unpacked reconstruct_window_start_state
into 4 names, but #423 made it return 5 (added collaterals). On a fresh
process the forced first scoring pass hit this and threw 'too many values
to unpack (expected 4)' every forward step, blocking weight-setting.

Unpack collaterals and feed the same can_fund boundary-squat gate the
ledger path uses, so the live crown table no longer credits a holder
whose collateral can't fund their own smallest legal leg. Adds the
first tests for this function (crash regression + squat exclusion).

* fix(validator): construct bounds_cache before bootstrap_miner_rates

#437 moved bounds_cache creation after the bootstrap call, so
bootstrap_miner_rates read a not-yet-set attribute and logged
'no attribute bounds_cache', falling back to unbounded (min/max=0)
commitment reads on cold start. Move the axon/bounds_cache block
above the bootstrap call so the bounds are available.

* style: auto-fix pre-commit hooks

* chore: bump version 1.0.7 -> 1.0.8

* style: auto-fix pre-commit hooks

---------

Co-authored-by: anderdc <me@alexanderdc.com>