RFC: Add incremental encaps API to support ML-KEM Braid #1619
mkannwischer wants to merge 16 commits into main
325ab51 to 285fc8a
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12152 cycles | 11971 cycles | 1.02 |
| ML-KEM-512 encaps | 13644 cycles | 13745 cycles | 0.99 |
| ML-KEM-512 decaps | 17745 cycles | 17771 cycles | 1.00 |
| ML-KEM-768 keypair | 21243 cycles | 21010 cycles | 1.01 |
| ML-KEM-768 encaps | 22040 cycles | 22095 cycles | 1.00 |
| ML-KEM-768 decaps | 28171 cycles | 28300 cycles | 1.00 |
| ML-KEM-1024 keypair | 30069 cycles | 29866 cycles | 1.01 |
| ML-KEM-1024 encaps | 31921 cycles | 31758 cycles | 1.01 |
| ML-KEM-1024 decaps | 41455 cycles | 39591 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
ppc64le (POWER10) benchmarks
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59195 cycles | 60047 cycles | 0.99 |
| ML-KEM-512 encaps | 72146 cycles | 72930 cycles | 0.99 |
| ML-KEM-512 decaps | 92035 cycles | 92987 cycles | 0.99 |
| ML-KEM-768 keypair | 98370 cycles | 98984 cycles | 0.99 |
| ML-KEM-768 encaps | 114890 cycles | 115469 cycles | 0.99 |
| ML-KEM-768 decaps | 140426 cycles | 141322 cycles | 0.99 |
| ML-KEM-1024 keypair | 150732 cycles | 149075 cycles | 1.01 |
| ML-KEM-1024 encaps | 169608 cycles | 167651 cycles | 1.01 |
| ML-KEM-1024 decaps | 200794 cycles | 198842 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14387 cycles | 14291 cycles | 1.01 |
| ML-KEM-512 encaps | 16035 cycles | 16019 cycles | 1.00 |
| ML-KEM-512 decaps | 21525 cycles | 21507 cycles | 1.00 |
| ML-KEM-768 keypair | 24400 cycles | 24715 cycles | 0.99 |
| ML-KEM-768 encaps | 25724 cycles | 25491 cycles | 1.01 |
| ML-KEM-768 decaps | 33593 cycles | 33275 cycles | 1.01 |
| ML-KEM-1024 keypair | 37480 cycles | 37264 cycles | 1.01 |
| ML-KEM-1024 encaps | 36769 cycles | 36892 cycles | 1.00 |
| ML-KEM-1024 decaps | 51840 cycles | 46772 cycles | 1.11 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-1024 decaps | 51840 cycles | 46772 cycles | 1.11 |
This comment was automatically generated by workflow using github-action-benchmark.
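The alert rule quoted above can be sketched in a few lines of C. This is only an illustration under assumptions: it takes the Ratio column to be current cycles divided by previous cycles, and it assumes the check is made on the unrounded ratio with a strict comparison against the threshold. The function names `bench_ratio` and `bench_regressed` are hypothetical and are not taken from github-action-benchmark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ratio of current to previous cycle count; values above 1.0 mean the
 * current commit is slower. Assumed definition, see the text above. */
double bench_ratio(uint64_t current_cycles, uint64_t previous_cycles)
{
  return (double)current_cycles / (double)previous_cycles;
}

/* True if the unrounded ratio strictly exceeds the alert threshold. */
bool bench_regressed(uint64_t current_cycles, uint64_t previous_cycles,
                     double threshold)
{
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

Under these assumptions, `bench_regressed(51840, 46772, 1.03)` is true for the ML-KEM-1024 decaps row. Comparing the unrounded ratio would also explain why a row displayed as 1.03 (13153 vs. 12768 cycles) can still be flagged against a 1.03 threshold.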
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29373 cycles | 28131 cycles | 1.04 |
| ML-KEM-512 encaps | 35678 cycles | 36568 cycles | 0.98 |
| ML-KEM-512 decaps | 45642 cycles | 45119 cycles | 1.01 |
| ML-KEM-768 keypair | 46591 cycles | 46297 cycles | 1.01 |
| ML-KEM-768 encaps | 56594 cycles | 55658 cycles | 1.02 |
| ML-KEM-768 decaps | 69487 cycles | 69941 cycles | 0.99 |
| ML-KEM-1024 keypair | 70808 cycles | 70194 cycles | 1.01 |
| ML-KEM-1024 encaps | 83450 cycles | 82475 cycles | 1.01 |
| ML-KEM-1024 decaps | 99996 cycles | 98796 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13153 cycles | 12768 cycles | 1.03 |
| ML-KEM-512 encaps | 14346 cycles | 14254 cycles | 1.01 |
| ML-KEM-512 decaps | 19181 cycles | 19131 cycles | 1.00 |
| ML-KEM-768 keypair | 22698 cycles | 22414 cycles | 1.01 |
| ML-KEM-768 encaps | 23090 cycles | 23041 cycles | 1.00 |
| ML-KEM-768 decaps | 30119 cycles | 30089 cycles | 1.00 |
| ML-KEM-1024 keypair | 34312 cycles | 33024 cycles | 1.04 |
| ML-KEM-1024 encaps | 33755 cycles | 33006 cycles | 1.02 |
| ML-KEM-1024 decaps | 49559 cycles | 42430 cycles | 1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13153 cycles | 12768 cycles | 1.03 |
| ML-KEM-1024 keypair | 34312 cycles | 33024 cycles | 1.04 |
| ML-KEM-1024 decaps | 49559 cycles | 42430 cycles | 1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17529 cycles | 17485 cycles | 1.00 |
| ML-KEM-512 encaps | 19911 cycles | 19875 cycles | 1.00 |
| ML-KEM-512 decaps | 26540 cycles | 26415 cycles | 1.00 |
| ML-KEM-768 keypair | 30128 cycles | 31874 cycles | 0.95 |
| ML-KEM-768 encaps | 31845 cycles | 31109 cycles | 1.02 |
| ML-KEM-768 decaps | 42575 cycles | 41545 cycles | 1.02 |
| ML-KEM-1024 keypair | 44864 cycles | 46137 cycles | 0.97 |
| ML-KEM-1024 encaps | 44752 cycles | 45137 cycles | 0.99 |
| ML-KEM-1024 decaps | 64002 cycles | 58253 cycles | 1.10 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-1024 decaps | 64002 cycles | 58253 cycles | 1.10 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39951 cycles | 40253 cycles | 0.99 |
| ML-KEM-512 encaps | 47767 cycles | 48411 cycles | 0.99 |
| ML-KEM-512 decaps | 61840 cycles | 62600 cycles | 0.99 |
| ML-KEM-768 keypair | 63674 cycles | 63756 cycles | 1.00 |
| ML-KEM-768 encaps | 74641 cycles | 74947 cycles | 1.00 |
| ML-KEM-768 decaps | 93375 cycles | 93618 cycles | 1.00 |
| ML-KEM-1024 keypair | 95372 cycles | 94982 cycles | 1.00 |
| ML-KEM-1024 encaps | 109781 cycles | 109167 cycles | 1.01 |
| ML-KEM-1024 decaps | 132330 cycles | 131931 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36701 cycles | 36584 cycles | 1.00 |
| ML-KEM-512 encaps | 43152 cycles | 43043 cycles | 1.00 |
| ML-KEM-512 decaps | 55795 cycles | 55694 cycles | 1.00 |
| ML-KEM-768 keypair | 58743 cycles | 58618 cycles | 1.00 |
| ML-KEM-768 encaps | 67609 cycles | 67618 cycles | 1.00 |
| ML-KEM-768 decaps | 84501 cycles | 84427 cycles | 1.00 |
| ML-KEM-1024 keypair | 89062 cycles | 88963 cycles | 1.00 |
| ML-KEM-1024 encaps | 99756 cycles | 99133 cycles | 1.01 |
| ML-KEM-1024 decaps | 121245 cycles | 120720 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28275 cycles | 28233 cycles | 1.00 |
| ML-KEM-512 encaps | 34131 cycles | 34105 cycles | 1.00 |
| ML-KEM-512 decaps | 44304 cycles | 44329 cycles | 1.00 |
| ML-KEM-768 keypair | 47642 cycles | 47627 cycles | 1.00 |
| ML-KEM-768 encaps | 53960 cycles | 53956 cycles | 1.00 |
| ML-KEM-768 decaps | 68365 cycles | 68377 cycles | 1.00 |
| ML-KEM-1024 keypair | 70313 cycles | 70255 cycles | 1.00 |
| ML-KEM-1024 encaps | 78726 cycles | 78806 cycles | 1.00 |
| ML-KEM-1024 decaps | 98488 cycles | 98442 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17680 cycles | 17650 cycles | 1.00 |
| ML-KEM-512 encaps | 20631 cycles | 20601 cycles | 1.00 |
| ML-KEM-512 decaps | 27073 cycles | 27068 cycles | 1.00 |
| ML-KEM-768 keypair | 29910 cycles | 29899 cycles | 1.00 |
| ML-KEM-768 encaps | 32727 cycles | 32776 cycles | 1.00 |
| ML-KEM-768 decaps | 41962 cycles | 41967 cycles | 1.00 |
| ML-KEM-1024 keypair | 43761 cycles | 43750 cycles | 1.00 |
| ML-KEM-1024 encaps | 48733 cycles | 48727 cycles | 1.00 |
| ML-KEM-1024 decaps | 61415 cycles | 61390 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 46111 cycles | 45656 cycles | 1.01 |
| ML-KEM-512 encaps | 55139 cycles | 54451 cycles | 1.01 |
| ML-KEM-512 decaps | 70721 cycles | 69753 cycles | 1.01 |
| ML-KEM-768 keypair | 73470 cycles | 74171 cycles | 0.99 |
| ML-KEM-768 encaps | 85413 cycles | 85948 cycles | 0.99 |
| ML-KEM-768 decaps | 105843 cycles | 106520 cycles | 0.99 |
| ML-KEM-1024 keypair | 111019 cycles | 112098 cycles | 0.99 |
| ML-KEM-1024 encaps | 125122 cycles | 124601 cycles | 1.00 |
| ML-KEM-1024 decaps | 151053 cycles | 150531 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 35447 cycles | 35416 cycles | 1.00 |
| ML-KEM-512 encaps | 40331 cycles | 40122 cycles | 1.01 |
| ML-KEM-512 decaps | 51344 cycles | 51145 cycles | 1.00 |
| ML-KEM-768 keypair | 57094 cycles | 56670 cycles | 1.01 |
| ML-KEM-768 encaps | 64575 cycles | 65151 cycles | 0.99 |
| ML-KEM-768 decaps | 78817 cycles | 79295 cycles | 0.99 |
| ML-KEM-1024 keypair | 88008 cycles | 87866 cycles | 1.00 |
| ML-KEM-1024 encaps | 96932 cycles | 96879 cycles | 1.00 |
| ML-KEM-1024 decaps | 115797 cycles | 115822 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18658 cycles | 18639 cycles | 1.00 |
| ML-KEM-512 encaps | 21870 cycles | 21876 cycles | 1.00 |
| ML-KEM-512 decaps | 28837 cycles | 28861 cycles | 1.00 |
| ML-KEM-768 keypair | 31600 cycles | 31540 cycles | 1.00 |
| ML-KEM-768 encaps | 34735 cycles | 34771 cycles | 1.00 |
| ML-KEM-768 decaps | 44751 cycles | 44775 cycles | 1.00 |
| ML-KEM-1024 keypair | 46031 cycles | 46082 cycles | 1.00 |
| ML-KEM-1024 encaps | 51541 cycles | 51501 cycles | 1.00 |
| ML-KEM-1024 decaps | 65106 cycles | 65036 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28320 cycles | 28256 cycles | 1.00 |
| ML-KEM-512 encaps | 34158 cycles | 34110 cycles | 1.00 |
| ML-KEM-512 decaps | 44412 cycles | 44398 cycles | 1.00 |
| ML-KEM-768 keypair | 47625 cycles | 47665 cycles | 1.00 |
| ML-KEM-768 encaps | 53977 cycles | 53940 cycles | 1.00 |
| ML-KEM-768 decaps | 68358 cycles | 68363 cycles | 1.00 |
| ML-KEM-1024 keypair | 70289 cycles | 70328 cycles | 1.00 |
| ML-KEM-1024 encaps | 78826 cycles | 78757 cycles | 1.00 |
| ML-KEM-1024 decaps | 98493 cycles | 98529 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39109 cycles | 38887 cycles | 1.01 |
| ML-KEM-512 encaps | 44790 cycles | 44596 cycles | 1.00 |
| ML-KEM-512 decaps | 56879 cycles | 56673 cycles | 1.00 |
| ML-KEM-768 keypair | 62450 cycles | 62294 cycles | 1.00 |
| ML-KEM-768 encaps | 71104 cycles | 72330 cycles | 0.98 |
| ML-KEM-768 decaps | 86786 cycles | 87696 cycles | 0.99 |
| ML-KEM-1024 keypair | 96321 cycles | 96160 cycles | 1.00 |
| ML-KEM-1024 encaps | 106404 cycles | 106135 cycles | 1.00 |
| ML-KEM-1024 decaps | 126710 cycles | 126583 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59172 cycles | 59049 cycles | 1.00 |
| ML-KEM-512 encaps | 68760 cycles | 68578 cycles | 1.00 |
| ML-KEM-512 decaps | 87506 cycles | 87314 cycles | 1.00 |
| ML-KEM-768 keypair | 95700 cycles | 95479 cycles | 1.00 |
| ML-KEM-768 encaps | 110650 cycles | 109908 cycles | 1.01 |
| ML-KEM-768 decaps | 134421 cycles | 134361 cycles | 1.00 |
| ML-KEM-1024 keypair | 147695 cycles | 147876 cycles | 1.00 |
| ML-KEM-1024 encaps | 163732 cycles | 163805 cycles | 1.00 |
| ML-KEM-1024 decaps | 195466 cycles | 195456 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
CBMC Results (ML-KEM-512)
Full Results (191 proofs)
CBMC Results (ML-KEM-768)
Full Results (191 proofs)
CBMC Results (ML-KEM-1024)
Full Results (191 proofs)
hanno-becker left a comment
What's the purpose of 0a01cc4? Tests also serve as documentation, and using internal constants rather than public ones sets a wrong example.
If this is needed, can it be done in a preparatory PR? It seems unrelated to this PR.
The main question here is whether we want to add the new API in mlkem_native.h or not. If we don't, we can't test the API in the standard test_mlkem.c, but we could add it in a separate test that includes kem.h, but not mlkem_native.h. I agree with you that we don't want to keep it as is right now.
Seeing that you also observed a slowdown on x86, I wonder if we should treat the incremental API as internal by default and only expose it in the public API if some new option
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Change mlk_kem_enc_derand_u and mlk_kem_enc_v from MLK_INTERNAL_API to MLK_EXTERNAL_API so they are not static in monolithic builds.
Add -Wno-unused-function to the monolithic_build_multilevel_native example (matching mldsa-native) since those examples don't exercise the incremental API.
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
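To illustrate why the choice of linkage macro matters in a monolithic build, here is a minimal sketch. The macro and function names (`DEMO_MONOLITHIC_BUILD`, `DEMO_INTERNAL_API`, `DEMO_EXTERNAL_API`, `demo_helper`, `demo_entry`) are hypothetical stand-ins and are not the project's actual definitions. The sketch only assumes what the commit message states: an internal-API function becomes `static` when everything is compiled as a single translation unit.

```c
/* Hypothetical linkage macros; NOT the project's actual definitions. */
#if defined(DEMO_MONOLITHIC_BUILD)
/* Single translation unit: internal functions get internal linkage. */
#define DEMO_INTERNAL_API static
#else
/* Separate compilation: internal functions must be visible to the linker. */
#define DEMO_INTERNAL_API
#endif
/* External functions keep external linkage in both build modes. */
#define DEMO_EXTERNAL_API

/* Becomes static in a monolithic build, so it cannot be called from
 * outside the translation unit. */
DEMO_INTERNAL_API int demo_helper(int x) { return x + 1; }

/* Stays callable from other translation units in every build mode. */
DEMO_EXTERNAL_API int demo_entry(int x) { return demo_helper(x) * 2; }
```

With GCC or Clang, a `static` function that is defined but never called triggers `-Wunused-function`, which is the warning the commit message disables for the example build.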
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
907613d to 0a0a167
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i) (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29373 cycles | 28131 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 0a0a167 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 encaps | 41585 cycles | 40122 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ebce63b | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-512 encaps | 46759 cycles | 44596 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: ec9a558 | Previous: 712709d | Ratio |
|---|---|---|---|
| ML-KEM-1024 decaps | 41455 cycles | 39591 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Split ML-KEM encapsulation into two phases (mlk_kem_enc_derand_u / mlk_kem_enc_v) to support protocols like Braid that need to interleave encapsulation with other operations between computing the u- and v-components of the ciphertext. The first phase only requires the public seed and H(pk), not the full public key vector.
Internally, K-PKE.Encrypt is refactored into mlk_indcpa_enc_u + mlk_indcpa_enc_v. The non-incremental KEM path calls mlk_indcpa_enc directly to avoid serialization overhead.
The intermediate noise polynomial epp is serialized as 4-bit nibbles (128 bytes), primarily so that no precondition on the allowed values is required.
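As an illustration of the nibble serialization mentioned above, the following sketch packs a 256-coefficient polynomial into 128 bytes, two 4-bit values per byte. The exact encoding used in the PR is not shown here, so the details are assumptions: each coefficient is stored as a 4-bit two's-complement value in [-8, 7], low nibble first, and all names (`DEMO_N`, `demo_pack_nibbles`, `demo_unpack_nibbles`, `demo_sext4`) are hypothetical. With this encoding every possible nibble decodes to some coefficient, so the unpacking side needs no precondition on its input bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_N 256                     /* coefficients per polynomial */
#define DEMO_PACKED_BYTES (DEMO_N / 2) /* two nibbles per byte: 128 bytes */

/* Pack coefficients, assumed to lie in [-8, 7], as 4-bit two's-complement
 * nibbles. The even-indexed coefficient goes into the low nibble. */
void demo_pack_nibbles(uint8_t out[DEMO_PACKED_BYTES],
                       const int16_t in[DEMO_N])
{
  size_t i;
  for (i = 0; i < DEMO_PACKED_BYTES; i++)
  {
    uint8_t lo = (uint8_t)((uint16_t)in[2 * i] & 0xF);
    uint8_t hi = (uint8_t)((uint16_t)in[2 * i + 1] & 0xF);
    out[i] = (uint8_t)(lo | (uint8_t)(hi << 4));
  }
}

/* Sign-extend a 4-bit value: 0..7 stay as is, 8..15 map to -8..-1. */
int16_t demo_sext4(uint8_t nibble)
{
  return (int16_t)((int16_t)((nibble & 0xF) ^ 0x8) - 8);
}

/* Unpack 128 bytes into 256 coefficients. Any byte value is accepted. */
void demo_unpack_nibbles(int16_t out[DEMO_N],
                         const uint8_t in[DEMO_PACKED_BYTES])
{
  size_t i;
  for (i = 0; i < DEMO_PACKED_BYTES; i++)
  {
    out[2 * i] = demo_sext4((uint8_t)(in[i] & 0xF));
    out[2 * i + 1] = demo_sext4((uint8_t)(in[i] >> 4));
  }
}
```

Packing followed by unpacking returns the original coefficients whenever they lie in [-8, 7], which comfortably covers small centered noise values such as those in [-2, 2].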