Skip to content

engine: use an unlocked version of Get() in processAddrDelete()#3837

Merged
roman-khimov merged 1 commit intomasterfrom
engine-gc-close-deadlock
Feb 19, 2026
Merged

engine: use an unlocked version of Get() in processAddrDelete()#3837
roman-khimov merged 1 commit intomasterfrom
engine-gc-close-deadlock

Conversation

@roman-khimov
Copy link
Member

processAddrDelete is supposed to be called with RLock taken on e.blockMtx and that's correct for its callers. Then it shouldn't RLock it again which is what it tries to do by calling Get(). This leads to deadlock between Close() and GC routine running in background:

goroutine 787 [sync.RWMutex.Lock, 9 minutes]:
sync.runtime_SemacquireRWMutex(0xc000304cfc?, 0x1?, 0x1002d90?)
	/opt/hostedtoolcache/go/1.24.13/x64/src/runtime/sema.go:105 +0x25
sync.(*RWMutex).Lock(0xc000304ce8)
	/opt/hostedtoolcache/go/1.24.13/x64/src/sync/rwmutex.go:155 +0x89
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).setBlockExecErr(0xc000304cb0, {0x1546240, 0x1c12290})
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/control.go:106 +0x65
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Close(0xc000304cb0)
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/control.go:76 +0xd0
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.TestChildrenExpiration.func1()
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/gc_test.go:54 +0x2f
testing.(*common).Cleanup.func1()
	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1211 +0x170
testing.(*common).runCleanup(0xc0003af6c0, 0x0)
	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1445 +0x2b4
testing.tRunner.func2()
	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1786 +0x4d
testing.tRunner(0xc0003af6c0, 0x14184f8)
	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1798 +0x25f
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1851 +0x8f3

goroutine 753 [sync.RWMutex.RLock, 9 minutes]:
sync.runtime_SemacquireRWMutexR(0xc000304cf8?, 0x1?, 0x1017465?)
	/opt/hostedtoolcache/go/1.24.13/x64/src/runtime/sema.go:100 +0x25
sync.(*RWMutex).RLock(0xc000304ce8)
	/opt/hostedtoolcache/go/1.24.13/x64/src/sync/rwmutex.go:74 +0x5b
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Get(0xc000304cb0, {{0x3c, 0xa9, 0x7, 0xc9, 0x47, 0xd8, 0x5c, 0x4e, 0xb2, ...}, ...})
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/get.go:34 +0x211
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).processAddrDelete(0xc000304cb0, {{0x3c, 0xa9, 0x7, 0xc9, 0x47, 0xd8, 0x5c, 0x4e, 0xb2, ...}, ...}, ...)
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/inhume.go:99 +0x1065
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).processExpiredObjects(0xc000304cb0, {0xc00050ee40, 0x1, 0x1336669?})
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/inhume.go:269 +0x4c6
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).collectExpiredObjects(0xc000035240)
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:238 +0xb2e
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).removeGarbage(0xc000035240)
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:149 +0x125
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).tickRemover(0xc000572960)
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:120 +0x116
created by github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).init in goroutine 787
	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:81 +0xa5

Fixes #3835.

Related to 60d0d3a.

processAddrDelete is supposed to be called with RLock taken on e.blockMtx and
that's correct for its callers. Then it shouldn't RLock it again which is what
it tries to do by calling Get(). This leads to deadlock between Close() and GC
routine running in background:

    goroutine 787 [sync.RWMutex.Lock, 9 minutes]:
    sync.runtime_SemacquireRWMutex(0xc000304cfc?, 0x1?, 0x1002d90?)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/runtime/sema.go:105 +0x25
    sync.(*RWMutex).Lock(0xc000304ce8)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/sync/rwmutex.go:155 +0x89
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).setBlockExecErr(0xc000304cb0, {0x1546240, 0x1c12290})
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/control.go:106 +0x65
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Close(0xc000304cb0)
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/control.go:76 +0xd0
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.TestChildrenExpiration.func1()
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/gc_test.go:54 +0x2f
    testing.(*common).Cleanup.func1()
    	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1211 +0x170
    testing.(*common).runCleanup(0xc0003af6c0, 0x0)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1445 +0x2b4
    testing.tRunner.func2()
    	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1786 +0x4d
    testing.tRunner(0xc0003af6c0, 0x14184f8)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1798 +0x25f
    created by testing.(*T).Run in goroutine 1
    	/opt/hostedtoolcache/go/1.24.13/x64/src/testing/testing.go:1851 +0x8f3

    goroutine 753 [sync.RWMutex.RLock, 9 minutes]:
    sync.runtime_SemacquireRWMutexR(0xc000304cf8?, 0x1?, 0x1017465?)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/runtime/sema.go:100 +0x25
    sync.(*RWMutex).RLock(0xc000304ce8)
    	/opt/hostedtoolcache/go/1.24.13/x64/src/sync/rwmutex.go:74 +0x5b
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Get(0xc000304cb0, {{0x3c, 0xa9, 0x7, 0xc9, 0x47, 0xd8, 0x5c, 0x4e, 0xb2, ...}, ...})
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/get.go:34 +0x211
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).processAddrDelete(0xc000304cb0, {{0x3c, 0xa9, 0x7, 0xc9, 0x47, 0xd8, 0x5c, 0x4e, 0xb2, ...}, ...}, ...)
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/inhume.go:99 +0x1065
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).processExpiredObjects(0xc000304cb0, {0xc00050ee40, 0x1, 0x1336669?})
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/engine/inhume.go:269 +0x4c6
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).collectExpiredObjects(0xc000035240)
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:238 +0xb2e
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).removeGarbage(0xc000035240)
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:149 +0x125
    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).tickRemover(0xc000572960)
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:120 +0x116
    created by github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).init in goroutine 787
    	/home/runner/work/neofs-node/neofs-node/pkg/local_object_storage/shard/gc.go:81 +0xa5

Fixes #3835.

Related to 60d0d3a.

Signed-off-by: Roman Khimov <roman@nspcc.ru>
@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 25.54%. Comparing base (52149c0) to head (8ffbc52).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3837      +/-   ##
==========================================
- Coverage   25.55%   25.54%   -0.01%     
==========================================
  Files         669      669              
  Lines       42941    42943       +2     
==========================================
- Hits        10973    10971       -2     
- Misses      30965    30967       +2     
- Partials     1003     1005       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@roman-khimov roman-khimov merged commit 71a8258 into master Feb 19, 2026
25 of 27 checks passed
@roman-khimov roman-khimov deleted the engine-gc-close-deadlock branch February 19, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestChildrenExpiration lockup

2 participants

Comments