fix(mongodb): add pymongo error handling to classify user vs platform errors#677
fix(mongodb): add pymongo error handling to classify user vs platform errors#677aballman wants to merge 5 commits into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| except AutoReconnect as e: | ||
| raise DestinationConnectionError(f"MongoDB connection lost: {e}") from e | ||
| except Exception as e: | ||
| raise DestinationConnectionError(f"MongoDB error: {e}") from e |
There was a problem hiding this comment.
Catch-all swallows correctly classified errors from delete_by_record_id
High Severity
The except Exception catch-all in run_data re-wraps already-classified exceptions from delete_by_record_id as DestinationConnectionError. When delete_by_record_id raises QuotaError or TimeoutError, those propagate into run_data's try block, skip all the pymongo-specific except handlers, and get caught by the generic except Exception — converting them back to DestinationConnectionError. This directly undermines the PR's goal of correct error classification. The catch-all needs to re-raise UnstructuredIngestError subclasses before falling through to the generic handler.
Additional Locations (1)
brndnblck
left a comment
There was a problem hiding this comment.
Bugbot flagged a real issue that needs addressing:
except Exception swallows correctly classified errors
When delete_by_record_id() raises QuotaError or TimeoutError, those exceptions propagate into run_data()'s try block, skip the pymongo-specific handlers (since they're no longer pymongo exceptions), and get caught by the generic except Exception - converting them back to DestinationConnectionError. This directly undermines the PR's goal.
Fix by re-raising already-classified errors before the catch-all:
except (QuotaError, TimeoutError, WriteError, DestinationConnectionError):
raise # Re-raise already-classified errors
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from eOr if UnstructuredIngestError is the base class:
except UnstructuredIngestError:
raise
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from eAlso worth adding a test that exercises the delete -> error -> propagation path to catch this regression.
… errors Pymongo exceptions in run_data() and delete_by_record_id() were falling through to the generic Exception catch in api_generator.py, defaulting to HTTP 500 and being classified as "platform" errors. This was the largest source of SLI misattribution (933 errors in 7 days). Add explicit try/except for pymongo-specific exceptions: - OperationFailure with "quota" → QuotaError (401) - OperationFailure (other) → DestinationConnectionError (400) - ServerSelectionTimeoutError → TimeoutError (408) - AutoReconnect → DestinationConnectionError (400) - BulkWriteError → WriteError (400)
Fixups after rebasing onto current main to unblock CI:
- test/unit/connectors/test_mongodb_errors.py: wrap pymongo import
with pytest.importorskip("pymongo") so the unit suite skips
gracefully in environments that don't have the mongodb extra
installed (CI was failing with ModuleNotFoundError).
- unstructured_ingest/__version__.py: bump 1.6.2 -> 1.6.3 to satisfy
check-version (previous value of 1.4.15 conflicted with a released
tag from when this branch was originally cut).
- CHANGELOG.md: add a 1.6.3 entry describing the mongodb error
classification change.
No change to the load-bearing mongodb.py error mapping (BulkWriteError
-> WriteError, OperationFailure -> QuotaError,
ServerSelectionTimeoutError -> TimeoutError, AutoReconnect ->
DestinationConnectionError). Exception ordering preserved so subclass
catches fire before parent.
75eb740 to
8543c52
Compare
Three P2 findings from codex on the prior commit: - run_data catch-all was relabeling QuotaError/TimeoutError/WriteError raised by delete_by_record_id as DestinationConnectionError. Add an early `except (QuotaError, TimeoutError, WriteError, DestinationConnectionError): raise` ahead of the broad except blocks so internal types pass through unchanged. - NetworkTimeout is a subclass of AutoReconnect. Without a dedicated catch, in-flight socket timeouts were being labeled as DestinationConnectionError instead of TimeoutError. Add an explicit `except NetworkTimeout` clause before AutoReconnect on both run_data and delete_by_record_id. - Quota detection via `"quota" in str(e).lower()` could false-positive on namespace names (e.g. an auth failure on database 'quota_db'). Replace with an _is_quota_error() helper that inspects the server's structured `errmsg` field via `e.details` before falling back to the bare `_message`; this avoids matching against the full str(e) representation which often includes 'full error: <details>' with the namespace embedded. Test coverage added for all three paths: - NetworkTimeout on run_data and delete_by_record_id -> TimeoutError - OperationFailure with 'quota_db' namespace but unrelated errmsg -> DestinationConnectionError (regression test for the false positive) - OperationFailure with structured details errmsg='user is over quota' -> QuotaError (structured-detection happy path) - delete-time QuotaError surfaces unchanged from run_data (no relabel by the catch-all)
|
Claude finished @aballman's task —— View job Code Review: MongoDB Error HandlingTodo List:
Critical Issue: Exception Re-wrapping Bug🚨 HIGH SEVERITY: Bugbot identified a real issue that undermines the PR's goal. The current except (QuotaError, TimeoutError, WriteError, DestinationConnectionError):
# Internal types raised by delete_by_record_id (or future helpers).
# Must come before the broad except blocks so they aren't relabeled.
raise
# ... other pymongo exception handlers ...
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from eThe Problem: When Your test at line 148 actually validates this exact scenario but doesn't reveal the bug because the test structure masks the issue. Review of Your Focus Areas1. Quota Detection Logic ✅ GOODYour
No action needed - the structured approach is more reliable than Atlas error codes, which can vary by operation type. 2. Exception Tuple in run_data ❌ BROKENYour tuple approach is correct in theory but implemented incorrectly: Current (Broken): except (QuotaError, TimeoutError, WriteError, DestinationConnectionError):
raise
# ... pymongo handlers ...
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from eShould be: # ... pymongo handlers first ...
except UnstructuredIngestError:
raise # Re-raise any already-classified internal errors
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from e3. NetworkTimeout Ordering ✅ CORRECTLines 429-432 properly handle 4. Credential Leak Risk
|
There was a problem hiding this comment.
1 issue found across 4 files
Shadow auto-approve: would not auto-approve because issues were found.
Fix all with cubic | Re-trigger cubic
…r re-raise Address Claude review feedback. Replace the explicit tuple `except (QuotaError, TimeoutError, WriteError, DestinationConnectionError): raise` with `except UnstructuredIngestError: raise`. Behavior is identical for the current set of internal types and future-proofs against new internal types raised from delete_by_record_id or other helpers (SourceConnectionError, ConnectionError, ValueError, etc.). No test changes - all 14 existing tests still validate the same flows.
|
Thanks for the review. I want to push back on the main finding before applying the rest. The "critical bug" claim that Standalone repro to confirm the semantics: class A(Exception): pass
class B(Exception): pass
def inner():
raise A("mapped")
def outer():
try:
inner()
except (A, B):
raise
except Exception as e:
raise RuntimeError(f"wrapped: {e}") from e
try:
outer()
except A as e:
print(f"GOT A (correct): {e}")
except Exception as e:
print(f"GOT WRONG: {type(e).__name__}: {e}")
# Prints: GOT A (correct): mappedThe end-to-end test So the ordering is correct as written. That said, your suggestion to use the base class is a strict improvement, so I applied that in 38c525e: replaced the explicit tuple with On your other points:
Ready for human review when one of the connector folks has a moment. |
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Shadow auto-approve: would not auto-approve. Auto-approval blocked by 1 unresolved issue from previous reviews.
Re-trigger cubic
Address cubic review P2: the broad `except Exception` was relabeling non-pymongo failures (KeyError/TypeError/ValueError from config or input-shape errors) as DestinationConnectionError, masking real programming/config bugs as connection problems. Replace with `except PyMongoError`. Behavior: - Known pymongo exceptions: still handled by the specific clauses above (BulkWriteError/OperationFailure/ServerSelectionTimeoutError/ NetworkTimeout/AutoReconnect). - Unknown pymongo exceptions: still wrapped as DestinationConnectionError (sensible default for the pymongo family). - Non-pymongo exceptions: propagate unchanged so upstream classification isn't poisoned by a misleading destination-connection label. Adds a regression test that a KeyError raised inside the upload path propagates unchanged rather than being relabeled.
There was a problem hiding this comment.
0 issues found across 2 files (changes from recent commits).
Shadow auto-approve: would require human review. This PR changes core error classification logic in the MongoDB uploader, which alters the error types and status codes returned for production failures, directly impacting downstream error handling, retries, and SLO monitoring – a change with non-trivial blast radius that should be reviewed by a te?
Re-trigger cubic


Summary
run_data()anddelete_by_record_id()in the MongoDB uploaderExceptioncatch inapi_generator.py, defaulting to HTTP 500 and being classified as "platform" errors — this was the largest source of SLI misattribution for the DAG Job Completions SLO (933 errors in 7 days)OperationFailurewith quota →QuotaError(401),ServerSelectionTimeoutError→TimeoutError(408),BulkWriteError→WriteError(400),AutoReconnect→DestinationConnectionError(400)Test plan
run_data()anddelete_by_record_id()error_type: "platform"toerror_type: "user"Note
Medium Risk
Changes how MongoDB write/delete failures are surfaced (error type/status code), which may affect downstream error handling and retries. Logic is straightforward and covered by new unit tests, but impacts runtime classification of production failures.
Overview
MongoDB uploader errors are now explicitly classified.
MongoDBUploader.run_data()anddelete_by_record_id()catch keypymongoexceptions and re-raise them as typed ingest errors (QuotaError,TimeoutError,WriteError,DestinationConnectionError) instead of falling through as generic failures.Adds a focused unit test suite (
test_mongodb_errors.py) to assert the exception-to-error mapping for both insert and delete paths.Written by Cursor Bugbot for commit 75eb740. This will update automatically on new commits. Configure here.