Skip to content

E2EE spec review: correctness fixes and clarifications #311

Merged
danmarg merged 13 commits into
mainfrom
fix/docs
Jun 13, 2026
Merged

E2EE spec review: correctness fixes and clarifications #311
danmarg merged 13 commits into
mainfrom
fix/docs

Conversation

@danmarg

@danmarg danmarg commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Summary

Mix of doc-only fixes, one behavioural code change, and two small implementation fixes.

Doc-only corrections

  • EK_B.priv lifetime: §3.1/§4.1/§4.4 incorrectly claimed Bob's ephemeral key is deleted immediately; it survives as localDhPriv until his first DH ratchet. Added bootstrap-window security note to §5.5.
  • prev_recv_token contradiction: implementation polls exactly one token (§5.4.2 was correct); removed ghost prev_recv_token field from §8.2 SessionState and rewrote §9.2 to match.
  • KDF_CK definition: §11 table claimed a single 76-byte HKDF expand; §8.3 (HMAC for MK/CK′, separate HKDF for nonce) is correct. Fixed §11 and the stale Ratchet.kt file-header comment.
  • Discovery mailbox multiple-responder: specified process-all semantics (not first-responder-wins), aligning with FR #233. Impl upgrade tracked there.
  • Header-undecryptable frames: specified in §5.4.4 — not ACKed immediately, starvation bounded by force-ACK after MAX_SILENT_DROP_RETRIES polls; addressed post-ratchet duplicate tension.
  • §12.3 correlation claim: softened "impossible to correlate" to content-layer only; noted same client IP polling T_old then T_new is an observable metadata linkage.
  • Minor corrections: §5.3 step 2 (pn is in header not body), §8.3.1(5) (seq resets to 0, first message is seq=1), §4.2 (EK_B.pub not EK_A.pub), PROTOCOL_VERSION three-encoding explanation, formatSafetyNumber algorithm defined in §3.4, full header-key schedule specified in §4.4, §9.1.2 max plaintext + de-padding rule, §7.4.2 GCM→Poly1305, AGENTS.md AES-256-GCM→ChaCha20-Poly1305, stale cross-references (ToC, §2.3, §5.7.2, §12.5).
  • Stale-pinning: closed as WAI — withholding is indistinguishable from Alice going offline; authenticated ts ensures honest last-seen display. Added §2.3 note including stationary-flag edge case.

Implementation fixes

  • MAX_GAP removed; replaced with MAX_SKIPPED_KEYS directly at the one call site in Session.kt. The two constants were inconsistent (10,000 vs 1,000) and the cache-capacity pre-check was always stricter anyway.
  • Body-fail seq-key caching removed from Session.kt. Caching MK_n on body-AEAD failure provides no robustness (server can just drop instead) and unnecessarily extends key lifetime by up to 7 days. State advancement on body-fail is retained to prevent DH desync. ReceiveRatchetFailureTest updated to verify new behaviour.

Test plan

  • ./gradlew :shared:jvmTest passes (verified locally)
  • Review ReceiveRatchetFailureTest — now asserts recvSeq advances, key is NOT cached, clean copy is rejected as replay
  • Skim docs/e2ee-location-sync.md diff for the header-key schedule (§4.4) and formatSafetyNumber (§3.4) additions — these are the most substantive new prose

🤖 Generated with Claude Code

danmarg and others added 12 commits June 12, 2026 22:54
§3.1, §4.1, §4.4 incorrectly claimed Bob's ephemeral private key is
deleted immediately after SK derivation. In fact it survives as
localDhPriv until his first DH ratchet step, which is required to
process Alice's eager-ratchet message. §5.5 was already correct.

Also adds a security note in §5.5 on the bootstrap window: during the
interval between Bob posting KeyExchangeInit and completing his first
ratchet, recovery of device state plus the public QR payload is
sufficient to reconstruct SK.

Closes issue #2 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The implementation polls exactly one recv_token per cycle (§5.4.2 was
correct). There is no prevRecvToken in SessionState or polling logic.
§9.2 incorrectly required polling a prev_recv_token during epoch
transitions; §8.2 SessionState spec carried a non-existent field.

Fix:
- Remove prev_recv_token from §8.2 SessionState struct
- Rewrite §9.2 to match §5.4.2 and the implementation: single-token
  polling; note the metadata advantage over two-token polling

Closes issue #3 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
§8.3 was correct: KDF_CK uses HMAC-SHA-256 for both the message key
and next chain key, then a separate HKDF-SHA-256 for the nonce. The
§11 primitives table incorrectly described it as a "single 76-byte
HKDF expand". The file-header comment in Ratchet.kt had the same error.

Fix §11 table and Ratchet.kt comment to match §8.3 and the implementation.

Closes issue #4 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace first-responder-wins with process-all: Alice processes every
KeyExchangeInit received during the discovery window, establishing one
independent session per scanner. This eliminates silent hijack-via-
displacement (a rogue init can no longer crowd out a legitimate one)
and aligns the spec with the multi-scan FR (issue #233).

Key spec changes in §4.2:
- Discovery window closes on UI dismissal, not after first init
- Multiple-init UX: surface count, prompt Safety Number per session
- Security note: server ordering no longer determines who gets a session
- Honest implementation status note: current client is still first-
  responder-wins; upgrade tracked in issue #233

Closes issue #5 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MAX_GAP (10,000) was dead code: the cache-capacity pre-check in
Session.decryptMessage always fired first, bounding same-epoch key
derivation to ~MAX_SKIPPED_KEYS (1,000) steps regardless. Remove the
separate constant and use MAX_SKIPPED_KEYS directly at the one call
site in Session.kt.

Update §8.3.1 to:
- State the real gap limit (MAX_SKIPPED_KEYS = 1,000, checked before
  any derivation begins)
- Correctly bound the server-forced HMAC DoS to ~1,000 ops/frame
- Document cross-epoch pn gap-filling silent eviction behavior

Closes issue #6 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Caching MK_n on body-AEAD failure provides no robustness: a server
willing to deliver a corrupted copy can equally just drop the message.
The cache only extended key lifetime by up to 7 days with no benefit.

Remove the caching. On body-fail after a valid header, the receiver
still advances recvSeq and the chain key (required to prevent DH
desync), but does not cache the key. The failed message is lost,
equivalent to a server drop.

Update §8.3.1(4) to require this behaviour and explicitly forbid the
cache. Update ReceiveRatchetFailureTest to verify recvSeq advances,
the key is not cached, and a subsequent clean copy is rejected as a
replay.

Closes issue #7 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add to §5.4.4:
- Header-undecryptable frames are not ACKed immediately; the client
  cannot distinguish garbage from a genuine future-epoch message.
- Starvation prevention: clients MUST force-ACK an entire batch after
  MAX_SILENT_DROP_RETRIES consecutive polls with zero successes
  (~2.5 min at 30s interval with default of 5 retries).
- Clarify the §5.4.4 post-ratchet duplicate tension: a duplicate
  transition message is header-undecryptable after ratchet advance and
  will be cleared by the force-ACK path.

Closes issue #8 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Token rotation prevents content-layer correlation across epochs, but
the same client IP polling T_old then T_new in adjacent cycles is an
observable metadata linkage. Clarify that the "impossible to correlate"
claim applies to content only, consistent with §2.3 and §7.3.

Closes the required sub-item of issue #9 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Message withholding is indistinguishable from the sender going offline
or staying stationary. The authenticated ts field means the receiver
always displays an honest last-seen time. The stationary-flag edge case
("here since X" indefinitely) is also WAI -- indistinguishable from
genuine stationarity. No protocol change warranted.

Added a note to §2.3 capturing this reasoning including the stationary
edge case.

Closes issue #11 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- §5.3 step 2: prev_chain_len is in the encrypted header, not the body
- §8.3.1(5): clarify counter resets to 0 so first message is seq=1;
  state initial recv_msg_num=0
- §4.2: fix EK_A.pub → EK_B.pub in the observer/retroactive sentence
- §9: document all three PROTOCOL_VERSION encodings (JSON int, 1-byte
  header plaintext, 4-byte body AAD) with rationale for each width
- §3.4: define formatSafetyNumber algorithm (5-byte chunks, 40-bit
  big-endian, mod 100,000, zero-padded, 3×4 groups); §4.4 back-ref
- §4.4: specify full header-key schedule across bootstrap and each
  KDF_RK step, including Alice's eager-ratchet promotion
- §9.1.2: add max plaintext (511 bytes) and de-padding rule (0x80
  marker, scan backwards, reject if absent or non-zero before marker)
- §7.4.2: "GCM overhead" → "Poly1305 tag (16 bytes)"
- AGENTS.md: AES-256-GCM → ChaCha20-Poly1305
- ToC: add missing section 12 (Signal comparison), renumber to 13
- §2.3 quantum: §12 → §13
- §5.7.2: 7-day window §7.2 → §10.2
- §12.5: §13 → §13.2

Closes issue #10 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- §3.4: state explicitly that safety-number verification authenticates
  keys only (EK_A.pub, EK_B.pub), not the suggested name or any other
  invite payload field
- §4.2: remove fingerprint field from QR payload spec. QR codes have
  built-in Reed-Solomon error correction; the field is fully derivable
  from ek_pub and provides no integrity an adversary cannot recompute.
  Two-stage impl removal tracked in issues #313 (stop verifying) and
  #314 (stop sending).
- Binding suggested_name into the safety number: won't fix (name is a
  user-confirmed pre-fill with no cryptographic standing per §3.2)
- Token-follow IP linkage (#9 optional): tracked in issue #312

Closes issue #12 in TODO_SPEC_UPDATES.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@danmarg danmarg changed the title E2EE spec review: correctness fixes and clarifications (issues #2–#11) E2EE spec review: correctness fixes and clarifications Jun 13, 2026
Working document; should not be in version control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@danmarg danmarg merged commit 90e9ae6 into main Jun 13, 2026
6 checks passed
@danmarg danmarg deleted the fix/docs branch June 13, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant