From fe24d80d38cdc840880203a87bb6fa848616ef17 Mon Sep 17 00:00:00 2001
From: tashen
Date: Fri, 5 Jun 2026 15:11:27 +0800
Subject: [PATCH] scst_user: Fix infinite cleanup loop caused by stale SGV pool reference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The device cleanup loop in dev_user_process_cleanup() spins at ~2 million
iterations per second and never exits, ultimately triggering a kernel
soft lockup. The previous workaround panicked the system after 10,000
iterations.

Root cause (confirmed by instrumentation):

A ucmd gets permanently stuck in ucmd_hash with:

  state        = UCMD_STATE_ON_FREE_SKIPPED (7)
  cmd          = NULL
  ref          = 1
  sent_to_user = 0

The stuck ref=1 is the reference taken by dev_user_alloc_pages() via
ucmd_get() for the first scatter-gather page. It is released only by
dev_user_free_sg_entries() → ucmd_put(), which fires when the SGV pool
*evicts* a cached object.

The sequence that prevents this eviction:

1. dev_user_unjam_dev() finds an EXECING command (sent_to_user=1,
   ref=2: alloc + alloc_pages), bumps ref to 3 via ucmd_get_check(),
   then calls dev_user_unjam_cmd().

2. dev_user_unjam_cmd() releases cmd_list_lock and calls
   scst_cmd_done(SCST_CONTEXT_THREAD), which synchronously runs the
   full SCST completion pipeline:

     dev_user_on_free_cmd()
       ucmd->cmd = NULL
       ucmd->state = UCMD_STATE_ON_FREE_SKIPPED   (type == IGNORE)
       dev_user_process_reply_on_free()
         dev_user_free_sgv()
           sgv_pool_free(ucmd->sgv)
             /* SGV cached on pool LRU; dev_user_free_sg_entries()
              * not called; alloc_pages ucmd_get() not balanced */
           ucmd->sgv = NULL
         ucmd_put()   ← ref: 3→2

3. Back in dev_user_unjam_dev(): ucmd_put() ← ref: 2→1.
   ref != 0, so dev_user_free_ucmd() / cmd_remove_hash() are NOT
   called. ucmd remains in ucmd_hash.

4. unjam_cmd also reset sent_to_user=0, so on every subsequent pass
   through dev_user_unjam_dev() the ucmd is counted (res++) but
   skipped (!sent_to_user → continue).
   dev_user_get_next_cmd() returns -EAGAIN (ucmd is not in
   ready_cmd_list). With cleanup_done=1 the while(1) loop has no exit
   condition.

The sgv_pool_flush() calls at the TOP of dev_user_unjam_dev() run BEFORE
any commands are unjammed. SGV objects cached during unjamming are
therefore never flushed; dev_user_free_sg_entries() never fires.

Fix:

Add sgv_pool_flush() for both pools at the BOTTOM of
dev_user_unjam_dev(), after the spinlock is released. This evicts all
SGV objects cached during unjamming, triggering:

  dev_user_free_sg_entries() → ucmd_put() → dev_user_free_ucmd()
    → cmd_remove_hash()

removing the stuck ucmd from the hash. On the next cleanup-loop
iteration dev_user_unjam_dev() returns res=0 and
dev_user_process_cleanup() breaks.

sgv_pool_flush() is fully synchronous (calls sgv_dtor_and_free()
inline); by the time it returns the callbacks have already fired and the
ucmd has already been removed from the hash. No schedule() or sleep is
needed.
---
 scst/src/dev_handlers/scst_user.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/scst/src/dev_handlers/scst_user.c b/scst/src/dev_handlers/scst_user.c
index e11982097..b930921d5 100644
--- a/scst/src/dev_handlers/scst_user.c
+++ b/scst/src/dev_handlers/scst_user.c
@@ -2732,6 +2732,17 @@ static int dev_user_unjam_dev(struct scst_user_dev *dev)
 
 	spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);
 
+	/*
+	 * Flush again after unjamming. Unjamming calls sgv_pool_free(), which
+	 * caches the SGV object on the pool LRU instead of freeing it directly.
+	 * The pre-unjam flush above misses these objects. Without this second
+	 * flush, dev_user_free_sg_entries() never fires, the alloc_pages
+	 * ucmd_get() ref is never balanced, and the ucmd stays in ucmd_hash
+	 * indefinitely, causing dev_user_process_cleanup() to loop forever.
+	 */
+	sgv_pool_flush(dev->pool);
+	sgv_pool_flush(dev->pool_clust);
+
 	TRACE_EXIT_RES(res);
 	return res;
 }
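
Illustration (not part of the patch): the reference-count walk described in
the commit message can be reproduced with a small user-space model. This is
only a sketch under stated assumptions. Every model_* name is hypothetical,
the pool is reduced to a single cached slot, and no real SCST code or locking
is involved. It shows only why a flush placed after the unjam step lets the
count reach zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; names echo the patch but this is not SCST. */
struct model_ucmd {
	int ref;            /* reference count */
	bool in_hash;       /* still reachable through the ucmd hash? */
	bool sent_to_user;
};

struct model_pool {
	struct model_ucmd *cached;  /* one-slot "LRU": owner of the cached SGV */
};

/* Stands in for ucmd_put(): the last put removes the ucmd from the hash. */
static void model_put(struct model_ucmd *u)
{
	assert(u->ref > 0);
	if (--u->ref == 0)
		u->in_hash = false;
}

/* Stands in for sgv_pool_free(): the object is cached, the page ref is kept. */
static void model_pool_free(struct model_pool *p, struct model_ucmd *u)
{
	p->cached = u;
}

/* Stands in for sgv_pool_flush(): evict synchronously and drop the page ref. */
static void model_pool_flush(struct model_pool *p)
{
	if (p->cached != NULL) {
		struct model_ucmd *u = p->cached;

		p->cached = NULL;
		model_put(u);
	}
}

/* One unjam pass; returns the number of ucmds left in the hash (0 or 1). */
static int model_unjam(bool flush_after)
{
	struct model_ucmd u = { .ref = 2, .in_hash = true, .sent_to_user = true };
	struct model_pool pool = { .cached = NULL };

	model_pool_flush(&pool);     /* pre-unjam flush: nothing is cached yet */

	u.ref++;                     /* ucmd_get_check(): 2 -> 3 */
	u.sent_to_user = false;      /* reset by the unjam step */
	model_pool_free(&pool, &u);  /* completion path caches the SGV */
	model_put(&u);               /* completion path: 3 -> 2 */
	model_put(&u);               /* back in the unjam loop: 2 -> 1 */

	if (flush_after)
		model_pool_flush(&pool); /* post-unjam flush: 1 -> 0 */

	return u.in_hash ? 1 : 0;
}
```

In this model, model_unjam(false) leaves the ucmd in the hash with ref=1,
which corresponds to the stuck state above, while model_unjam(true) empties
the hash in the same pass.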