feat: add content sanitization and interruptible shutdown #23

Merged: naji247 merged 3 commits into main from feat/content-sanitization on Feb 26, 2026

naji247 commented Feb 26, 2026

Summary

  • Add content sanitization module that strips binary/non-text content (images, audio, base64 blobs) from event payloads before sending to the MCPCat API or telemetry exporters
  • Two-layer approach: Layer 1 replaces non-text response content blocks, Layer 2 recursively detects and redacts large base64 strings (>=10KB) in parameters and structured content
  • Improve event queue shutdown: replace time.sleep with interruptible shutdown_event.wait for retry backoff, add early return on shutdown detection, and cancel pending futures during executor shutdown

Test plan

  • Run sanitization tests: pytest tests/test_sanitization.py -v (19 tests covering response content, parameter scanning, boundary conditions, and immutability)
  • Run event queue tests: pytest tests/test_event_queue.py -v (26 tests including 2 new shutdown-path tests)
  • Run full test suite: pytest tests/ -v to verify no regressions
  • Type check: mypy src/mcpcat/modules/sanitization.py

MCP servers can return non-text content (images, audio, binary resources) and
tool parameters can contain embedded base64 blobs. These bloat event payloads
and aren't useful for analytics.

This adds a two-layer sanitization step to the event processing pipeline:
- Layer 1: replaces image/audio/blob/unknown content blocks in responses
- Layer 2: recursively scans parameters and structured_content for large
  base64 strings (>=10KB) and redacts them
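
The Layer 2 scan described above could look roughly like the following. This is a minimal sketch, not the module's actual code: the names `BASE64_MIN_LENGTH`, `REDACTED_PLACEHOLDER`, `looks_like_large_base64`, and `redact_large_base64` are hypothetical, the placeholder wording is invented, and treating ">=10KB" as 10 * 1024 characters is an assumption.

```python
import re
from typing import Any

# Assumption: ">=10KB" is read here as 10 * 1024 characters.
BASE64_MIN_LENGTH = 10 * 1024
# Hypothetical placeholder; the real wording is not shown in the PR.
REDACTED_PLACEHOLDER = "[binary data redacted]"

# Standard base64 alphabet with up to two trailing padding characters.
_BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_large_base64(value: str) -> bool:
    """Return True for long strings made up only of base64 characters."""
    if len(value) < BASE64_MIN_LENGTH:
        return False
    return _BASE64_RE.fullmatch(value) is not None


def redact_large_base64(value: Any) -> Any:
    """Rebuild dicts and lists recursively, replacing large base64 strings."""
    if isinstance(value, str):
        return REDACTED_PLACEHOLDER if looks_like_large_base64(value) else value
    if isinstance(value, dict):
        return {key: redact_large_base64(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_large_base64(item) for item in value]
    # Numbers, booleans, None, and other types pass through unchanged.
    return value
```

The check is a heuristic: a long run of letters and digits with no whitespace would also match, which seems an acceptable trade-off for payloads meant for analytics. Because the function builds new containers instead of mutating its input, the caller's data is left untouched.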

Sanitization runs after customer redaction and before event ID generation,
using deepcopy to preserve immutability of the original event.
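
To illustrate the immutability guarantee mentioned above, here is a small sketch of the copy-then-sanitize pattern. The `Event` dataclass and the placeholder text are hypothetical stand-ins; the real event type and the real replacement wording are not shown in this PR.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Event:
    """Hypothetical stand-in for the real event type."""
    parameters: dict[str, Any] = field(default_factory=dict)
    response: Optional[dict[str, Any]] = None


def sanitize_event(event: Event) -> Event:
    """Return a sanitized copy; the caller's event is never modified."""
    event = copy.deepcopy(event)
    if event.response is not None:
        # Layer 1 in miniature: keep text blocks, replace everything else.
        event.response["content"] = [
            block
            if block.get("type") == "text"
            else {"type": "text", "text": "[non-text content removed]"}
            for block in event.response.get("content", [])
        ]
    return event
```

Working on a deep copy means any later code that still holds the original event (for example the tool response returned to the client) sees the unmodified content.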

Also improves event queue shutdown behavior:
- Replace time.sleep with shutdown_event.wait for interruptible retry backoff
- Early return in _send_event when shutdown is detected
- Pass cancel_futures=True to executor.shutdown()
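
The three shutdown changes listed above can be sketched as follows. This is an illustration under assumptions, not the event queue's actual code: `send_with_retry`, `stop_queue`, the retry count, and the exponential backoff formula are all hypothetical. The standard-library behavior it relies on is that `threading.Event.wait(timeout)` returns `True` as soon as the event is set, and that `Executor.shutdown(cancel_futures=True)` (Python 3.9+) cancels work that has not started yet.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def send_with_retry(
    send: Callable[[], bool],
    shutdown_event: threading.Event,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> bool:
    """Call send() up to max_retries times, giving up once shutdown is signalled."""
    for attempt in range(max_retries):
        # Early return: do not start new work during shutdown.
        if shutdown_event.is_set():
            return False
        if send():
            return True
        if attempt < max_retries - 1:
            # Unlike time.sleep, wait() returns True immediately when the
            # event is set, so the backoff does not delay shutdown.
            if shutdown_event.wait(timeout=base_delay * 2**attempt):
                return False
    return False


def stop_queue(executor: ThreadPoolExecutor, shutdown_event: threading.Event) -> None:
    """Signal shutdown, then stop the executor and drop queued work."""
    shutdown_event.set()
    # cancel_futures requires Python 3.9 or newer.
    executor.shutdown(wait=True, cancel_futures=True)
```

With `time.sleep`, a worker in the middle of a long backoff would hold up process exit for the full delay; with the event-based wait it wakes as soon as `stop_queue` sets the event.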

        _sanitize_content_block(block) for block in response["content"]
    ]

    if "structured_content" in response:

There's no guarantee the key won't be camelCase (structuredContent), so I'd probably check for that here too, just in case. I searched, and we do check for result.isError in some places, so camelCase isn't out of the question.

    event = copy.deepcopy(event)

    if event.response is not None:
        _sanitize_response(event.response)

Same comment as on the TypeScript side: this should be wrapped in try/except in case sanitization throws.
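
One way to act on this suggestion is a small guard around the sanitization call. The sketch below is hypothetical: `sanitize_safely` is an invented name, and falling back to the unsanitized event (instead of dropping it) is an assumption about the desired policy that the PR does not state.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def sanitize_safely(event: T, sanitize: Callable[[T], T]) -> T:
    """Run sanitize(event); on any error, log it and return the event unchanged."""
    try:
        return sanitize(event)
    except Exception:
        # Assumption: a sanitization bug should never break event publishing,
        # so the original event is passed through instead of being dropped.
        logger.exception("Sanitization failed; sending event unsanitized")
        return event
```

If sending an oversized payload is worse than losing the event, the `except` branch could return `None` and let the caller skip publishing instead.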

…ontent

The sanitizer previously only checked the `structured_content` key,
which would miss base64 data if serialization produced `structuredContent`
(camelCase) or any other response field. It now runs every non-content
response field through the base64 scanner.
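
A rough sketch of the behavior this commit describes: instead of looking up one key by name, every response field except `content` goes through the scanner, so the spelling of the key no longer matters. The names `scan`, `sanitize_response`, and `PLACEHOLDER` are hypothetical, as are the threshold interpretation and the placeholder wording.

```python
import string
from typing import Any

BASE64_MIN_LENGTH = 10 * 1024  # assumed reading of ">=10KB"
PLACEHOLDER = "[binary data redacted]"  # hypothetical wording
_BASE64_ALPHABET = frozenset(string.ascii_letters + string.digits + "+/=")


def scan(value: Any) -> Any:
    """Recursively replace long base64-looking strings with a placeholder."""
    if isinstance(value, str):
        is_blob = len(value) >= BASE64_MIN_LENGTH and _BASE64_ALPHABET.issuperset(value)
        return PLACEHOLDER if is_blob else value
    if isinstance(value, dict):
        return {key: scan(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scan(item) for item in value]
    return value


def sanitize_response(response: dict[str, Any]) -> None:
    """Scan every field except "content" in place (the caller passes a copy)."""
    for key in list(response):
        if key == "content":
            continue  # content blocks are handled separately by Layer 1
        response[key] = scan(response[key])
```

With this shape, `structured_content`, `structuredContent`, and any field added later are all covered without further changes.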
naji247 merged commit 10fec12 into main on Feb 26, 2026
37 checks passed
naji247 deleted the feat/content-sanitization branch on February 26, 2026 19:23