Skip to content

[PW_SID:1079574] [v2] Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion#59

Open
BluezTestBot wants to merge 1 commit intoworkflowfrom
1079574
Open

[PW_SID:1079574] [v2] Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion#59
BluezTestBot wants to merge 1 commit intoworkflowfrom
1079574

Conversation

@BluezTestBot
Copy link
Copy Markdown

When a BLE peripheral sends an L2CAP Connection Parameter Update Request
the processing path is:

process_pending_rx() [takes conn->lock]
l2cap_le_sig_channel()
l2cap_conn_param_update_req()
hci_le_conn_update() [takes hdev->lock]

Meanwhile other code paths take the locks in the opposite order:

l2cap_chan_connect() [takes hdev->lock]
...
mutex_lock(&conn->lock)

l2cap_conn_ready() [hdev->lock via hci_cb_list_lock]
...
mutex_lock(&conn->lock)

This is a classic AB/BA deadlock which lockdep reports as a circular
locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49).

Fix this by introducing l2cap_conn_param_update_sync() which is scheduled
via hci_cmd_sync_queue() to run on the hci_cmd_sync workqueue, outside
any of the involved locks. The necessary connection parameters are
captured into a heap-allocated struct conn_param_update_data and the
sync callback performs the hci_conn_params update, sends the HCI LE
Connection Update command, and notifies mgmt of the new parameters.

To guard against a theoretical handle-reuse race (the connection could
drop and its handle be reassigned between queuing and execution), the
sync callback verifies the connection still exists and its destination
address matches the original request before proceeding.

The allocation is performed before sending the L2CAP_CONN_PARAM_ACCEPTED
response so that on OOM the peer receives a REJECTED response instead of
an inconsistent state where the peer applies new parameters but the local
controller does not.

Fixes: f044eb0 ("Bluetooth: Store latency and supervision timeout in connection params")
Signed-off-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com

Changes in v2 (Paul Menzel, Sashiko/Gemini AI review):

  • Allocate before sending ACCEPTED response to avoid state mismatch on OOM
  • Verify connection handle and address in sync callback against reuse race
  • Expand commit message with implementation details

net/bluetooth/l2cap_core.c | 93 ++++++++++++++++++++++++++++++++++----
1 file changed, 85 insertions(+), 8 deletions(-)

…ock inversion

When a BLE peripheral sends an L2CAP Connection Parameter Update Request
the processing path is:

  process_pending_rx()          [takes conn->lock]
    l2cap_le_sig_channel()
      l2cap_conn_param_update_req()
        hci_le_conn_update()    [takes hdev->lock]

Meanwhile other code paths take the locks in the opposite order:

  l2cap_chan_connect()          [takes hdev->lock]
    ...
      mutex_lock(&conn->lock)

  l2cap_conn_ready()            [hdev->lock via hci_cb_list_lock]
    ...
      mutex_lock(&conn->lock)

This is a classic AB/BA deadlock which lockdep reports as a circular
locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49).

Fix this by introducing l2cap_conn_param_update_sync() which is scheduled
via hci_cmd_sync_queue() to run on the hci_cmd_sync workqueue, outside
any of the involved locks.  The necessary connection parameters are
captured into a heap-allocated struct conn_param_update_data and the
sync callback performs the hci_conn_params update, sends the HCI LE
Connection Update command, and notifies mgmt of the new parameters.

To guard against a theoretical handle-reuse race (the connection could
drop and its handle be reassigned between queuing and execution), the
sync callback verifies the connection still exists and its destination
address matches the original request before proceeding.

The allocation is performed before sending the L2CAP_CONN_PARAM_ACCEPTED
response so that on OOM the peer receives a REJECTED response instead of
an inconsistent state where the peer applies new parameters but the local
controller does not.

Fixes: f044eb0 ("Bluetooth: Store latency and supervision timeout in connection params")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
@github-actions
Copy link
Copy Markdown

CheckPatch
Desc: Run checkpatch.pl script
Duration: 0.49 seconds
Result: PENDING

@github-actions
Copy link
Copy Markdown

GitLint
Desc: Run gitlint
Duration: 0.25 seconds
Result: PENDING

@github-actions
Copy link
Copy Markdown

SubjectPrefix
Desc: Check subject contains "Bluetooth" prefix
Duration: 0.10 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

BuildKernel
Desc: Build Kernel for Bluetooth
Duration: 25.97 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

CheckAllWarning
Desc: Run linux kernel with all warning enabled
Duration: 28.59 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

CheckSparse
Desc: Run sparse tool with linux kernel
Duration: 27.21 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

BuildKernel32
Desc: Build 32bit Kernel for Bluetooth
Duration: 25.14 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunnerSetup
Desc: Setup kernel and bluez for test-runner
Duration: 565.16 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_l2cap-tester
Desc: Run l2cap-tester with test-runner
Duration: 27.91 seconds
Result: FAIL
Output:

Total: 96, Passed: 95 (99.0%), Failed: 1, Not Run: 0

Failed Test Cases
L2CAP BR/EDR Server - Set PHY 3M                     Failed       0.115 seconds

@github-actions
Copy link
Copy Markdown

TestRunner_iso-tester
Desc: Run iso-tester with test-runner
Duration: 37.58 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_bnep-tester
Desc: Run bnep-tester with test-runner
Duration: 6.27 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_mgmt-tester
Desc: Run mgmt-tester with test-runner
Duration: 113.08 seconds
Result: FAIL
Output:

Total: 494, Passed: 489 (99.0%), Failed: 1, Not Run: 4

Failed Test Cases
Read Exp Feature - Success                           Failed       0.105 seconds

@github-actions
Copy link
Copy Markdown

TestRunner_rfcomm-tester
Desc: Run rfcomm-tester with test-runner
Duration: 9.35 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_sco-tester
Desc: Run sco-tester with test-runner
Duration: 14.87 seconds
Result: FAIL
Output:

WARNING: possible circular locking dependency detected
7.0.0-rc2-gc006ce61ce3a #1 Not tainted
------------------------------------------------------
kworker/u5:2/117 is trying to acquire lock:
ffff888001946240 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_connect_cfm+0x358/0x8d0

but task is already holding lock:
ffff8880025fa220 (&conn->lock){+.+.}-{3:3}, at: sco_connect_cfm+0x22d/0x8d0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&conn->lock){+.+.}-{3:3}:
       lock_acquire+0xf7/0x2c0
       _raw_spin_lock+0x2a/0x40
       sco_sock_connect+0x4d7/0x1280
       __sys_connect+0x1a3/0x260
       __x64_sys_connect+0x6e/0xb0
       do_syscall_64+0xa0/0x570
       entry_SYSCALL_64_after_hwframe+0x74/0x7c

-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       check_prev_add+0xe9/0xc70
       __lock_acquire+0x1457/0x1df0
       lock_acquire+0xf7/0x2c0
       lock_sock_nested+0x36/0xd0
       sco_connect_cfm+0x358/0x8d0
       hci_sync_conn_complete_evt+0x3d3/0x8e0
       hci_event_packet+0x74f/0xb10
       hci_rx_work+0x398/0xd00
       process_scheduled_works+0xb16/0x1ac0
       worker_thread+0x4ff/0xba0
       kthread+0x368/0x490
       ret_from_fork+0x498/0x7e0
       ret_from_fork_asm+0x19/0x30

other info that might help us debug this:

...
BUG: sleeping function called from invalid context at net/core/sock.c:3782
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 117, name: kworker/u5:2
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 117 Comm: kworker/u5:2 Not tainted 7.0.0-rc2-gc006ce61ce3a #1 PREEMPT(lazy) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: hci0 hci_rx_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x49/0x60
 __might_resched+0x2ea/0x500
 lock_sock_nested+0x47/0xd0
 ? sco_connect_cfm+0x358/0x8d0
 sco_connect_cfm+0x358/0x8d0
 ? hci_debugfs_create_conn+0x190/0x210
 ? __pfx_sco_connect_cfm+0x10/0x10
 hci_sync_conn_complete_evt+0x3d3/0x8e0
 hci_event_packet+0x74f/0xb10
 ? __pfx_hci_sync_conn_complete_evt+0x10/0x10
 ? __pfx_hci_event_packet+0x10/0x10
 ? mark_held_locks+0x49/0x80
 ? lockdep_hardirqs_on_prepare+0xd4/0x180
 ? _raw_spin_unlock_irqrestore+0x2c/0x50
 hci_rx_work+0x398/0xd00
 process_scheduled_works+0xb16/0x1ac0
 ? __pfx_process_scheduled_works+0x10/0x10
 ? lock_acquire+0xf7/0x2c0
 ? lock_is_held_type+0x9b/0x110
 ? __pfx_hci_rx_work+0x10/0x10
 worker_thread+0x4ff/0xba0
 ? _raw_spin_unlock_irqrestore+0x2c/0x50
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x368/0x490
 ? _raw_spin_unlock_irq+0x23/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x498/0x7e0
 ? __pfx_ret_from_fork+0x10/0x10
 ? __switch_to+0x9e4/0xe50
 ? __switch_to_asm+0x32/0x60
...
Total: 30, Passed: 30 (100.0%), Failed: 0, Not Run: 0

@github-actions
Copy link
Copy Markdown

TestRunner_ioctl-tester
Desc: Run ioctl-tester with test-runner
Duration: 9.99 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_mesh-tester
Desc: Run mesh-tester with test-runner
Duration: 11.55 seconds
Result: FAIL
Output:

Total: 10, Passed: 8 (80.0%), Failed: 2, Not Run: 0

Failed Test Cases
Mesh - Send cancel - 1                               Timed out    1.909 seconds
Mesh - Send cancel - 2                               Timed out    1.996 seconds

@github-actions
Copy link
Copy Markdown

TestRunner_smp-tester
Desc: Run smp-tester with test-runner
Duration: 8.56 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_userchan-tester
Desc: Run userchan-tester with test-runner
Duration: 6.62 seconds
Result: PASS

@github-actions
Copy link
Copy Markdown

TestRunner_6lowpan-tester
Desc: Run 6lowpan-tester with test-runner
Duration: 8.81 seconds
Result: FAIL
Output:

WARNING: possible circular locking dependency detected
7.0.0-rc2-gc006ce61ce3a #1 Not tainted
------------------------------------------------------
kworker/0:0/9 is trying to acquire lock:
ffff888002777140 ((wq_completion)hci0#2){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x75/0x180

but task is already holding lock:
ffffffff9804d720 (rtnl_mutex){+.+.}-{4:4}, at: lowpan_unregister_netdev+0xd/0x30

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (rtnl_mutex){+.+.}-{4:4}:
       lock_acquire+0xf7/0x2c0
       __mutex_lock+0x16b/0x1fc0
       lowpan_register_netdev+0x11/0x30
       chan_ready_cb+0x836/0xd00
       l2cap_recv_frame+0x6ca1/0x8b20
       l2cap_recv_acldata+0x790/0xdf0
       hci_rx_work+0x500/0xd00
       process_scheduled_works+0xb16/0x1ac0
       worker_thread+0x4ff/0xba0
       kthread+0x368/0x490
       ret_from_fork+0x498/0x7e0
       ret_from_fork_asm+0x19/0x30

-> #3 (&chan->lock#3/1){+.+.}-{4:4}:
       lock_acquire+0xf7/0x2c0
       __mutex_lock+0x16b/0x1fc0
       l2cap_chan_connect+0x74e/0x1980
       lowpan_control_write+0x523/0x660
       full_proxy_write+0x10b/0x190
       vfs_write+0x1c0/0xf60
       ksys_write+0xf1/0x1d0
       do_syscall_64+0xa0/0x570
       entry_SYSCALL_64_after_hwframe+0x74/0x7c

-> #2 (&conn->lock){+.+.}-{4:4}:
...
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

@github-actions
Copy link
Copy Markdown

IncrementalBuild
Desc: Incremental build with the patches in the series
Duration: 0.27 seconds
Result: PENDING

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants