fix(mcp): handle SSL connection drop during pre-call session teardown #39917

Open
aminghadersohi wants to merge 3 commits into apache:master from aminghadersohi:fix/mcp-ssl-connection-teardown

@aminghadersohi
Contributor

SUMMARY

MCP tools running in thread-pool workers (sync tools) call db.session.remove() before each invocation to clear any stale thread-local session. If the underlying DBAPI connection died between requests — e.g. RDS dropped it due to SSL idle-timeout or max-connection-age — the implicit rollback() inside session.close() raises OperationalError. This caused the tool call to fail with a stack trace, even when the operation itself had completed successfully.

Root cause: sync_wrapper in auth.py called db.session.remove() bare, with no handling for connection-level errors.

Fix: Extract _remove_session_safe() which:

  1. Catches OperationalError from session.remove()
  2. Calls session.invalidate() to mark the dead connection for pool discard (prevents it being checked out again by another thread)
  3. Retries session.remove() to deregister the scoped session

The tool call proceeds normally; a fresh connection is obtained on the next DB access.
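
The three steps above can be sketched roughly as follows. This is an illustrative sketch, not the code in the PR: `OperationalError` is a local stand-in for `sqlalchemy.exc.OperationalError`, `FakeScopedSession` is a hypothetical test double, and the helper takes the session as an argument so the example runs without a database.

```python
import logging

logger = logging.getLogger(__name__)


class OperationalError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError (assumption: keeps the sketch dependency-free)."""


class FakeScopedSession:
    """Hypothetical test double: remove() fails until the connection is invalidated."""

    def __init__(self) -> None:
        self.remove_calls = 0
        self.invalidated = False

    def remove(self) -> None:
        self.remove_calls += 1
        if not self.invalidated:
            # Mimics the implicit rollback failing on a dead DBAPI connection.
            raise OperationalError("SSL connection has been closed unexpectedly")

    def invalidate(self) -> None:
        self.invalidated = True


def remove_session_safe(session) -> None:
    """Remove the thread-local session, tolerating a dead connection."""
    try:
        session.remove()
    except OperationalError:
        logger.warning("Connection lost during session teardown", exc_info=True)
        # Mark the dead connection so the pool discards it instead of reusing it.
        session.invalidate()
        # Retry so the scoped registry is clean before the tool runs.
        session.remove()


session = FakeScopedSession()
remove_session_safe(session)  # does not raise; the tool call can proceed
```

How `invalidate()` is reached on the real scoped session is not shown here and depends on the actual implementation in `auth.py`.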

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change

TESTING INSTRUCTIONS

  1. Unit test: pytest tests/unit_tests/mcp_service/test_auth_user_resolution.py::test_sync_wrapper_handles_ssl_error_on_pre_call_remove
  2. To reproduce manually: configure a short RDS SSL idle timeout, make an MCP tool call after an idle period — observe no error in logs.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

When RDS drops an SSL connection due to idle timeout or max-connection-age,
`db.session.remove()` in `sync_wrapper` raises `OperationalError` because
the implicit rollback inside `session.close()` fails on the dead DBAPI
connection. This caused the MCP tool call to fail even when the operation
itself completed successfully, and left a dead connection in the pool.

Introduce `_remove_session_safe()` which:
- Catches `OperationalError` from `session.remove()` (SSL/network errors)
- Calls `session.invalidate()` to mark the dead connection for pool discard
- Retries `session.remove()` so the scoped registry is clean before the tool runs

Replace the bare `db.session.remove()` in `sync_wrapper` with `_remove_session_safe()`.
Add a unit test verifying `invalidate()` is called and remove is retried on SSL error.
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 7.69231% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.87%. Comparing base (5b5dd01) to head (f85d927).
⚠️ Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| superset/mcp_service/auth.py | 7.69% | 12 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39917      +/-   ##
==========================================
- Coverage   63.88%   63.87%   -0.02%     
==========================================
  Files        2583     2583              
  Lines      136604   136636      +32     
  Branches    31502    31504       +2     
==========================================
+ Hits        87276    87277       +1     
- Misses      47812    47843      +31     
  Partials     1516     1516              

| Flag | Coverage Δ |
| --- | --- |
| hive | 39.36% <7.69%> (-0.02%) ⬇️ |
| mysql | 59.03% <7.69%> (-0.03%) ⬇️ |
| postgres | 59.11% <7.69%> (-0.03%) ⬇️ |
| presto | 41.06% <7.69%> (-0.02%) ⬇️ |
| python | 60.54% <7.69%> (-0.03%) ⬇️ |
| sqlite | 58.74% <7.69%> (-0.03%) ⬇️ |
| unit | 100.00% <ø> (ø) |


@aminghadersohi aminghadersohi marked this pull request as ready for review May 7, 2026 02:39
@aminghadersohi aminghadersohi requested a review from Copilot May 7, 2026 02:39
@dosubot dosubot Bot added the change:backend Requires changing the backend label May 7, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens MCP sync tool execution against stale/closed DBAPI connections by making the pre-invocation db.session.remove() step tolerant to OperationalError (e.g., SSL idle-timeout connection drops), so successful tool calls aren’t failed by teardown cleanup.

Changes:

  • Added _remove_session_safe() to catch OperationalError from db.session.remove(), invalidate the session’s connection, and retry removal.
  • Updated sync_wrapper in mcp_auth_hook to use _remove_session_safe() instead of calling db.session.remove() directly.
  • Added a unit test covering the SSL/connection-drop failure mode and ensuring the tool call still succeeds.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| superset/mcp_service/auth.py | Introduces safe scoped-session removal logic and wires it into sync MCP tool execution. |
| tests/unit_tests/mcp_service/test_auth_user_resolution.py | Adds regression coverage to ensure sync_wrapper tolerates OperationalError during pre-call session cleanup. |

@bito-code-review
Contributor

bito-code-review Bot commented May 7, 2026

Code Review Agent Run #c82e1f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 166d047..166d047
    • superset/mcp_service/auth.py
    • tests/unit_tests/mcp_service/test_auth_user_resolution.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

- Replace nonlocal counter in SSL error test with MagicMock side_effect list
- Add inline comment on retry db.session.remove() to clarify intent
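
For context, the difference between the two test styles named in the first bullet can be sketched as below. The names (`FakeDisconnect`, `call_with_retry`, `make_counter_stub`) are hypothetical and do not come from the PR's test file; the sketch only illustrates why a `side_effect` list is the shorter way to say "fail once, then succeed".

```python
from unittest.mock import MagicMock


class FakeDisconnect(Exception):
    """Hypothetical stand-in for a driver-level disconnect error."""


def call_with_retry(remove) -> None:
    """Call remove(); retry once if the first attempt hits a disconnect."""
    try:
        remove()
    except FakeDisconnect:
        remove()


def make_counter_stub():
    """Counter-based style: a closure tracks calls with `nonlocal`."""
    calls = 0

    def remove() -> None:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise FakeDisconnect("connection dropped")

    return remove


# side_effect style: each call consumes the next item in the list;
# exception instances are raised, other values are returned.
mock_remove = MagicMock(side_effect=[FakeDisconnect("connection dropped"), None])

call_with_retry(make_counter_stub())  # passes, but needs hand-written bookkeeping
call_with_retry(mock_remove)          # same behavior, call count tracked by the mock
```

With the mock, assertions such as `mock_remove.call_count == 2` come for free, which is presumably why the test was simplified this way.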
@netlify

netlify Bot commented May 7, 2026

Deploy Preview for superset-docs-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 9b57b45 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/superset-docs-preview/deploys/69fcb8b4f1efa20008b04626 |
| 😎 Deploy Preview | https://deploy-preview-39917--superset-docs-preview.netlify.app |
Comment thread superset/mcp_service/auth.py Outdated
…st OperationalError

Some database drivers (e.g. MySQL, SQLite) surface dropped connections as
InterfaceError rather than OperationalError. Both are DBAPIError subclasses.
Widen the catch in _remove_session_safe() so all DBAPI-level disconnect
errors are handled consistently regardless of driver.
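
A minimal sketch of why catching the base class covers both cases. The classes below are local stand-ins that mirror the hierarchy in `sqlalchemy.exc` (where `OperationalError` derives from `DatabaseError`, and both `DatabaseError` and `InterfaceError` derive from `DBAPIError`); `is_handled` is a hypothetical helper for illustration only.

```python
class DBAPIError(Exception):
    """Stand-in for sqlalchemy.exc.DBAPIError."""


class DatabaseError(DBAPIError):
    """Stand-in for sqlalchemy.exc.DatabaseError."""


class OperationalError(DatabaseError):
    """Stand-in for sqlalchemy.exc.OperationalError."""


class InterfaceError(DBAPIError):
    """Stand-in for sqlalchemy.exc.InterfaceError."""


def is_handled(exc: Exception) -> bool:
    """Return True if a handler written as `except DBAPIError` would catch exc."""
    try:
        raise exc
    except DBAPIError:
        return True
    except Exception:
        return False


print(is_handled(OperationalError("SSL connection closed")))  # caught via DatabaseError -> DBAPIError
print(is_handled(InterfaceError("connection already closed")))  # caught directly via DBAPIError
print(is_handled(ValueError("unrelated")))  # not a DBAPI-level error, so not caught
```

The trade-off is that `DBAPIError` also covers non-disconnect errors, so the wider catch treats every DBAPI-level failure during teardown as grounds for invalidating the connection.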
@aminghadersohi
Contributor Author

Addressed the catch: widened from `OperationalError` to `DBAPIError` so drivers that surface disconnects as `InterfaceError` are also covered. (agor claude on Amin's behalf)

@bito-code-review
Contributor

bito-code-review Bot commented May 8, 2026

Code Review Agent Run #a5a6a8

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 166d047..f85d927
    • superset/mcp_service/auth.py
    • tests/unit_tests/mcp_service/test_auth_user_resolution.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Labels

change:backend Requires changing the backend size/M

2 participants