fix: incorrect incremental absorption in AVX512VL impl of SHA3-512#2442
Conversation
The existing test absorbed the 200-byte message in a single call, which exercises the multi-permutation path but always aligns segment boundaries with the Keccak rate (72 bytes for SHA3-512). It therefore misses the case where a segment straddles a rate boundary mid-absorption. Add a loop that absorbs in chunks of (RATE - 1) bytes, forcing every permutation trigger to occur in the middle of a segment. Absorbing at exactly RATE does not expose the bug; the error only surfaces when a segment boundary falls inside a rate block. Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>
The partial-fill-and-permute path in the sha3_absorb macro consumed
\`capacity\` bytes via keccak_1600_partial_add (which advances arg2 and
clobbers %r12) but did not decrement the remaining message-length
tracker. The post-permute path at label 3 then treated the original
mlen as still pending, causing the bytes already consumed in the
partial fill to be 're-absorbed' (reading past the end of the input
buffer) into the post-permute state and leaving s[25] off by
\`capacity\`. Symptom: SHA3-{256,384,512} digests are wrong whenever
an incremental absorb call exactly fills, or crosses, the rate
boundary - reproduced via HQC-3 / HQC-5 KAT failures on AVX512
runners (HQC-1 is unaffected because its hash_g input is 65 bytes,
which never crosses the 72-byte SHA3-512 rate).
The matching shake_absorb macro in the same file already has the
correct pattern (subq %r12, arg3 before the partial_add call); this
just mirrors it in sha3_absorb.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>
Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>
|
@dstebila where did liboqs acquire this AVX512VL implementation? I found this Intel white paper which seems to suggest an one-off collaboration. Is it possible to reach the original collaborators? I don't know enough x86 assembly to vet the correctness of the proposed fix from Claude. Even if it passes all the unit tests, including the new ones I added, I still feel hesitant about committing unfamiliar AVX512 assembly into the code base. To me a more conservative approach would be to turn off |
The main contributors from Intel were @mdcornu @tkanteck and @erdincoz. Maybe one of them could take a look at this? |
mdcornu
left a comment
There was a problem hiding this comment.
Nice find! The fix looks good but the new test case seems to be passing only because msg1600 repeats the same byte value for the entire message. It might be a good idea to add another vector to validate the incremental calls.
See open-quantum-safe#2442 (comment) Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>
|
Thanks for the fix and for reviewing this area. |
Yes, I thought about a new test vector, but then apparently the repeated A3 test vector is part of some NIST test vector setup, so I didn't want to go changing it. But if you think it's worth adding another vector, that would be great. |
Sure. I can do that once this PR is merged 👍 |
|
@xuganyu96 I think this is okay to merge. |
…pen-quantum-safe#2442) * tests: SHA3-512 inc absorb at rate boundary [skip ci] The existing test absorbed the 200-byte message in a single call, which exercises the multi-permutation path but always aligns segment boundaries with the Keccak rate (72 bytes for SHA3-512). It therefore misses the case where a segment straddles a rate boundary mid-absorption. Add a loop that absorbs in chunks of (RATE - 1) bytes, forcing every permutation trigger to occur in the middle of a segment. Absorbing at exactly RATE does not expose the bug; the error only surfaces when a segment boundary falls inside a rate block. Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca> * Fix off-by-N in AVX512VL sha3_absorb at rate boundary The partial-fill-and-permute path in the sha3_absorb macro consumed \`capacity\` bytes via keccak_1600_partial_add (which advances arg2 and clobbers %r12) but did not decrement the remaining message-length tracker. The post-permute path at label 3 then treated the original mlen as still pending, causing the bytes already consumed in the partial fill to be 're-absorbed' (reading past the end of the input buffer) into the post-permute state and leaving s[25] off by \`capacity\`. Symptom: SHA3-{256,384,512} digests are wrong whenever an incremental absorb call exactly fills, or crosses, the rate boundary - reproduced via HQC-3 / HQC-5 KAT failures on AVX512 runners (HQC-1 is unaffected because its hash_g input is 65 bytes, which never crosses the 72-byte SHA3-512 rate). The matching shake_absorb macro in the same file already has the correct pattern (subq %r12, arg3 before the partial_add call); this just mirrors it in sha3_absorb. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca> * Code formatting Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca> * Fix test_sha3 absorb length See open-quantum-safe#2442 (comment) Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca> --------- Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca> Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca> Co-authored-by: Douglas Stebila <dstebila@uwaterloo.ca>
Summary
This pull request fixes some incorrect logic in the AVX512VL implementation of SHA3-512 and provides a regression test that would have failed without the fix.
AI/LLM disclosure
The assembly fix was generated by Claude Opus 4.7 using Claude Code.