Description
Feature Summary
Queue memory usage on the Python worker side is currently underestimated for data-heavy payloads because memory accounting can fall back to the shallow Python object size of the wrapper (e.g. `sys.getsizeof`) rather than the true size of the underlying data buffers. As a result, reported queue usage often does not reflect actual memory pressure.
This leads to two issues:
- Flow-control decisions and telemetry rely on values that do not accurately represent real queue pressure.
- Edge and backpressure visualization becomes misleading, especially for Arrow-backed frames and large batches of tuples.
Proposed Solution or Design
Introduce deterministic, payload-aware in-memory sizing and use it consistently for queue accounting.
The implementation should centralize queue-item size estimation in a single function. When an item contains a payload and the payload contains a frame, the accounting logic should resolve through that structure first. The estimate should prioritize the actual data-buffer size and use the wrapper-level object size only as the final fallback.
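A minimal sketch of such a centralized estimator, assuming hypothetical `item.payload` and `payload.frame` attributes (the real queue-item structure in the worker may differ):

```python
import sys


def estimate_queue_item_size(item) -> int:
    """Resolve item -> payload -> frame, preferring data-buffer size.

    Attribute names ('payload', 'frame', 'nbytes') are illustrative
    placeholders, not the actual worker API.
    """
    payload = getattr(item, "payload", None)
    frame = getattr(payload, "frame", None) if payload is not None else None
    if frame is not None:
        nbytes = getattr(frame, "nbytes", None)
        if isinstance(nbytes, int):
            return nbytes  # actual data-buffer size
    # Final fallback: shallow Python object size of the wrapper.
    return sys.getsizeof(payload if payload is not None else item)
```

Keeping all call sites on this one function is what makes the accounting deterministic: flow control and telemetry then report the same number for the same item.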
For Arrow-backed frames, the sizing order should be strictly defined:
- Use the frame’s direct byte-count attribute when available.
- Otherwise, use the frame’s total buffer size if that information is exposed.
- Otherwise, convert through the table representation and use the table’s byte-count information if available.
- Only if none of the above exists, fall back to a shallow Python object-size estimate of the payload or the queue item itself.
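The fallback chain above can be sketched as follows. The attribute and method names (`nbytes`, `get_total_buffer_size`, `to_table`) mirror a pyarrow-style API but are assumptions here; the sketch duck-types them so any frame type exposing equivalent information would work:

```python
import sys
from typing import Optional


def arrow_frame_data_size(frame) -> Optional[int]:
    """Return the data-buffer size of a frame, or None if unavailable.

    Probes, in order: direct byte count, total buffer size, and the
    table representation's byte count. Names are assumed, not verified
    against the actual frame implementation.
    """
    # 1. Direct byte-count attribute.
    nbytes = getattr(frame, "nbytes", None)
    if isinstance(nbytes, int):
        return nbytes
    # 2. Total buffer size, if exposed.
    get_total = getattr(frame, "get_total_buffer_size", None)
    if callable(get_total):
        return get_total()
    # 3. Convert through the table representation.
    to_table = getattr(frame, "to_table", None)
    if callable(to_table):
        table_nbytes = getattr(to_table(), "nbytes", None)
        if isinstance(table_nbytes, int):
            return table_nbytes
    # 4. No data-buffer information available.
    return None


def payload_size(payload, frame) -> int:
    size = arrow_frame_data_size(frame)
    # Only here does the shallow object-size fallback apply.
    return size if size is not None else sys.getsizeof(payload)
```

Because each step returns as soon as it succeeds, the order is strict: a frame that exposes `nbytes` is never converted through its table representation, which avoids paying a conversion cost just to measure size.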
This change should apply only to data payloads involved in queue memory accounting. The goal of this feature is to improve memory-accounting accuracy, not to tune or modify flow-control policy semantics.
Impact / Priority
(P2) Medium – useful enhancement
Affected Area
Workflow Engine (Amber)