Skip to content

fix: incorrect incremental absorption in AVX512VL impl of SHA3-512#2442

Merged
xuganyu96 merged 4 commits into
open-quantum-safe:mainfrom
xuganyu96:fix/sha3-512-inc-avx512-rate-boundary
May 27, 2026
Merged

fix: incorrect incremental absorption in AVX512VL impl of SHA3-512#2442
xuganyu96 merged 4 commits into
open-quantum-safe:mainfrom
xuganyu96:fix/sha3-512-inc-avx512-rate-boundary

Conversation

@xuganyu96
Copy link
Copy Markdown
Contributor

Summary

This pull request fixes some incorrect logic in the AVX512VL implementation of SHA3-512 and provides a regression test that would have failed without the fix.

AI/LLM disclosure

The assembly fix was generated by Claude Opus 4.7 using Claude Code.

  • [NO] Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.)
  • [ NO ] Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged. Also, make sure to update the list of algorithms in the continuous benchmarking files: .github/workflows/kem-bench.yml and sig-bench.yml)

xuganyu96 and others added 3 commits May 20, 2026 17:12
The existing test absorbed the 200-byte message in a single call, which
exercises the multi-permutation path but always aligns segment boundaries
with the Keccak rate (72 bytes for SHA3-512). It therefore misses the case
where a segment straddles a rate boundary mid-absorption.

Add a loop that absorbs in chunks of (RATE - 1) bytes, forcing every
permutation trigger to occur in the middle of a segment. Absorbing at
exactly RATE does not expose the bug; the error only surfaces when a
segment boundary falls inside a rate block.

Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>
The partial-fill-and-permute path in the sha3_absorb macro consumed
\`capacity\` bytes via keccak_1600_partial_add (which advances arg2 and
clobbers %r12) but did not decrement the remaining message-length
tracker. The post-permute path at label 3 then treated the original
mlen as still pending, causing the bytes already consumed in the
partial fill to be 're-absorbed' (reading past the end of the input
buffer) into the post-permute state and leaving s[25] off by
\`capacity\`. Symptom: SHA3-{256,384,512} digests are wrong whenever
an incremental absorb call exactly fills, or crosses, the rate
boundary - reproduced via HQC-3 / HQC-5 KAT failures on AVX512
runners (HQC-1 is unaffected because its hash_g input is 65 bytes,
which never crosses the 72-byte SHA3-512 rate).

The matching shake_absorb macro in the same file already has the
correct pattern (subq %r12, arg3 before the partial_add call); this
just mirrors it in sha3_absorb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>
Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>
@xuganyu96
Copy link
Copy Markdown
Contributor Author

@dstebila where did liboqs acquire this AVX512VL implementation? I found this Intel white paper which seems to suggest an one-off collaboration. Is it possible to reach the original collaborators?

I don't know enough x86 assembly to vet the correctness of the proposed fix from Claude. Even if it passes all the unit tests, including the new ones I added, I still feel hesitant about committing unfamiliar AVX512 assembly into the code base. To me a more conservative approach would be to turn off OQS_USE_SHA3_AVX512VL by default and issue a warning in the same way old implementation of HQC from PQClean was turned off.

@coveralls
Copy link
Copy Markdown

coveralls commented May 20, 2026

Coverage Status

Coverage is 82.268%xuganyu96:fix/sha3-512-inc-avx512-rate-boundary into open-quantum-safe:main. No base build found for open-quantum-safe:main.

@dstebila
Copy link
Copy Markdown
Member

@dstebila where did liboqs acquire this AVX512VL implementation? I found this Intel white paper which seems to suggest an one-off collaboration. Is it possible to reach the original collaborators?

I don't know enough x86 assembly to vet the correctness of the proposed fix from Claude. Even if it passes all the unit tests, including the new ones I added, I still feel hesitant about committing unfamiliar AVX512 assembly into the code base. To me a more conservative approach would be to turn off OQS_USE_SHA3_AVX512VL by default and issue a warning in the same way old implementation of HQC from PQClean was turned off.

The main contributors from Intel were @mdcornu @tkanteck and @erdincoz. Maybe one of them could take a look at this?

@xuganyu96 xuganyu96 marked this pull request as ready for review May 22, 2026 13:55
@xuganyu96 xuganyu96 requested review from bhess and dstebila as code owners May 22, 2026 13:55
Copy link
Copy Markdown
Contributor

@mdcornu mdcornu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! The fix looks good but the new test case seems to be passing only because msg1600 repeats the same byte value for the entire message. It might be a good idea to add another vector to validate the incremental calls.

Comment thread tests/test_sha3.c Outdated
See open-quantum-safe#2442 (comment)

Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>
Copy link
Copy Markdown
Contributor

@mdcornu mdcornu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

@tkanteck
Copy link
Copy Markdown
Contributor

Thanks for the fix and for reviewing this area.
We remain fully committed to keeping the AVX512VL implementation secure and fully functional. We’d be happy to help strengthen the existing test coverage, in particular by adding more comprehensive and targeted vectors.

@dstebila
Copy link
Copy Markdown
Member

Nice find! The fix looks good but the new test case seems to be passing only because msg1600 repeats the same byte value for the entire message. It might be a good idea to add another vector to validate the incremental calls.

Yes, I thought about a new test vector, but then apparently the repeated A3 test vector is part of some NIST test vector setup, so I didn't want to go changing it. But if you think it's worth adding another vector, that would be great.

@mdcornu
Copy link
Copy Markdown
Contributor

mdcornu commented May 27, 2026

Nice find! The fix looks good but the new test case seems to be passing only because msg1600 repeats the same byte value for the entire message. It might be a good idea to add another vector to validate the incremental calls.

Yes, I thought about a new test vector, but then apparently the repeated A3 test vector is part of some NIST test vector setup, so I didn't want to go changing it. But if you think it's worth adding another vector, that would be great.

Sure. I can do that once this PR is merged 👍

@dstebila
Copy link
Copy Markdown
Member

@xuganyu96 I think this is okay to merge.

@xuganyu96 xuganyu96 merged commit f986aea into open-quantum-safe:main May 27, 2026
93 checks passed
@xuganyu96 xuganyu96 deleted the fix/sha3-512-inc-avx512-rate-boundary branch May 27, 2026 17:24
pablo-gf pushed a commit to pablo-gf/liboqs that referenced this pull request May 29, 2026
…pen-quantum-safe#2442)

* tests: SHA3-512 inc absorb at rate boundary [skip ci]

The existing test absorbed the 200-byte message in a single call, which
exercises the multi-permutation path but always aligns segment boundaries
with the Keccak rate (72 bytes for SHA3-512). It therefore misses the case
where a segment straddles a rate boundary mid-absorption.

Add a loop that absorbs in chunks of (RATE - 1) bytes, forcing every
permutation trigger to occur in the middle of a segment. Absorbing at
exactly RATE does not expose the bug; the error only surfaces when a
segment boundary falls inside a rate block.

Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>

* Fix off-by-N in AVX512VL sha3_absorb at rate boundary

The partial-fill-and-permute path in the sha3_absorb macro consumed
\`capacity\` bytes via keccak_1600_partial_add (which advances arg2 and
clobbers %r12) but did not decrement the remaining message-length
tracker. The post-permute path at label 3 then treated the original
mlen as still pending, causing the bytes already consumed in the
partial fill to be 're-absorbed' (reading past the end of the input
buffer) into the post-permute state and leaving s[25] off by
\`capacity\`. Symptom: SHA3-{256,384,512} digests are wrong whenever
an incremental absorb call exactly fills, or crosses, the rate
boundary - reproduced via HQC-3 / HQC-5 KAT failures on AVX512
runners (HQC-1 is unaffected because its hash_g input is 65 bytes,
which never crosses the 72-byte SHA3-512 rate).

The matching shake_absorb macro in the same file already has the
correct pattern (subq %r12, arg3 before the partial_add call); this
just mirrors it in sha3_absorb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>

* Code formatting

Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>

* Fix test_sha3 absorb length

See open-quantum-safe#2442 (comment)

Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>

---------

Signed-off-by: Ganyu (Bruce) Xu <g66xu@uwaterloo.ca>
Signed-off-by: Douglas Stebila <dstebila@uwaterloo.ca>
Co-authored-by: Douglas Stebila <dstebila@uwaterloo.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants