Problem
The cross-epoch reorder guarantee in §5.3 applies only within a single batch — decryptAndSort orders previous-epoch frames before current-epoch ones, and the cache path uses a pre-decrypted header. A previous-epoch straggler that arrives in a later poll than the one that triggered the ratchet fails tryDecryptHeader because we retain only the current and next receive header keys (§8.3.1(6)).
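As a rough illustration of why the guarantee is batch-scoped, the ordering step can be thought of as a sort over the frames of one poll. The Frame type and the sortBatch name below are hypothetical stand-ins, not the real decryptAndSort signature; the sketch only assumes that each frame's epoch and sequence number are known once its header has been decrypted.

```kotlin
// Hypothetical frame shape; the real code works on decrypted headers.
data class Frame(val epoch: Int, val seq: Int, val payload: String)

// Order the frames of a single poll so that previous-epoch frames are
// processed before the current-epoch frame that triggers the ratchet.
fun sortBatch(batch: List<Frame>): List<Frame> =
    batch.sortedWith(compareBy<Frame>({ it.epoch }, { it.seq }))
```

A sort like this can only reorder frames that are in the same list. A previous-epoch frame delivered in a later poll is never part of the batch that contained the ratchet trigger, so by the time it arrives the old header key is already gone.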
Realistic vectors
- WAL retry on flaky network: Alice's outbox holds a pre-encrypted epoch-N blob across her own ratchet to N+1; when connectivity returns, the WAL retransmits the old blob to its original epoch-N token.
- Server pagination on backlog drain: Bob comes back online after a long absence; page 1 trips the ratchet, page 2's prev-epoch messages arrive next poll. Real path given MAX_MESSAGES_PER_POLL = 500.
- Brief polling-token race during epoch transition.
Impact
- Location samples: invisible — the next update supersedes.
- Sticky state transitions (stop, stationary): a brief UI glitch (friend still shown as moving / sharing) until the sender's next message arrives. If the sender goes quiet after a stop (which is the whole point), the correction never comes.
Proposed enhancement
Retain the previous receive header key for a bounded window:
- Add prevRecvHeaderKey: ByteArray to SessionState (shared/.../Types.kt).
- In Session.performDhRatchet (Session.kt:355), copy the old headerKey into prevRecvHeaderKey before overwriting.
- In E2eeProtocol.tryDecryptHeader and Session.decryptMessage's header-decrypt block, try prevRecvHeaderKey after headerKey and nextHeaderKey.
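The three steps above could look roughly like the following. This is a minimal sketch, not the actual Session.kt code: sealHeader and openHeader are toy stand-ins for the real header AEAD (they are not cryptography), the SessionState here is trimmed to the three header keys, and the key rotation inside performDhRatchet (next key becomes current, a fresh next key is installed) is an assumption about how the ratchet advances. The real tryDecryptHeader also has to trigger the ratchet when nextHeaderKey matches, which is omitted here.

```kotlin
// Toy stand-in for the header AEAD: a sealed header opens only under the
// exact key that sealed it. NOT real cryptography, illustration only.
fun sealHeader(key: ByteArray, header: String): Pair<ByteArray, String> =
    key.copyOf() to header

fun openHeader(key: ByteArray, sealed: Pair<ByteArray, String>): String? =
    if (key.isNotEmpty() && key.contentEquals(sealed.first)) sealed.second else null

// Trimmed-down, hypothetical SessionState with only the header keys.
class SessionState(
    var headerKey: ByteArray,
    var nextHeaderKey: ByteArray,
    var prevRecvHeaderKey: ByteArray = ByteArray(0), // empty = nothing retained
)

// Ratchet step: keep the outgoing receive header key before overwriting it.
fun performDhRatchet(state: SessionState, freshNextHeaderKey: ByteArray) {
    state.prevRecvHeaderKey = state.headerKey
    state.headerKey = state.nextHeaderKey
    state.nextHeaderKey = freshNextHeaderKey
}

// Try the current key, then the next key, then the retained previous key.
// Empty keys are skipped so a missing prevRecvHeaderKey is never tried.
fun tryDecryptHeader(state: SessionState, sealed: Pair<ByteArray, String>): String? =
    sequenceOf(state.headerKey, state.nextHeaderKey, state.prevRecvHeaderKey)
        .filter { it.isNotEmpty() }
        .firstNotNullOfOrNull { key -> openHeader(key, sealed) }
```

In this shape, a header sealed under the epoch-N key still opens after one call to performDhRatchet and stops opening after a second call, because prevRecvHeaderKey is overwritten on every step. That gives a one-DH-step retention window by construction, without a separate expiry mechanism.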
Compatibility
Purely local state — no wire change:
- Envelope, AAD, handshake formats are unchanged.
- SessionState JSON deserializers use ignoreUnknownKeys = true, so rolled-back code ignores the extra field; old persisted blobs deserialize fine because the missing field falls back to its default, an empty ByteArray.
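Both directions of the compatibility claim can be checked with a small sketch. It assumes the deserializers are kotlinx.serialization's Json (which the ignoreUnknownKeys setting suggests, but the source does not state) and needs the kotlinx-serialization-json dependency plus the serialization compiler plugin. OldSessionState and NewSessionState are hypothetical, trimmed-down shapes, not the real SessionState.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Shape before the change (what rolled-back code would deserialize into).
@Serializable
class OldSessionState(val headerKey: ByteArray)

// Shape after the change; the default value is what makes old blobs readable.
@Serializable
class NewSessionState(
    val headerKey: ByteArray,
    val prevRecvHeaderKey: ByteArray = ByteArray(0),
)

val json = Json { ignoreUnknownKeys = true }

// Upgrade path: the old blob has no prevRecvHeaderKey, so the default applies.
fun readOldBlobWithNewCode(blob: String): NewSessionState =
    json.decodeFromString(NewSessionState.serializer(), blob)

// Rollback path: old code skips the unknown prevRecvHeaderKey field.
fun readNewBlobWithOldCode(blob: String): OldSessionState =
    json.decodeFromString(OldSessionState.serializer(), blob)
```

Under these assumptions the two settings do different jobs: the default value on the new field covers the upgrade path, and ignoreUnknownKeys covers the rollback path. If the new field were declared without a default, decoding an old blob would fail on the missing field regardless of ignoreUnknownKeys.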
Out of scope
- Bounded retention window: TBD, but should be short enough that an attacker who somehow learns a stale header key cannot replay arbitrarily far back. A single-ratchet window (one DH step) is likely sufficient.
- A sender-side approach (re-emit stop/stationary periodically in keepalives) is an alternative for the UI-stickiness symptom that doesn't require this state change.
Triggering
Reproducible via PreviousEpochStragglerTest: deliver a previous-epoch seq >= 2 straggler in a separate poll after the ratchet. Currently asserts the drop; flipping the assertion would verify the fix.