Sign: Unpack `s1`, `s2`, and `t0` on the fly in `REDUCE_RAM` mode by mkannwischer · Pull Request #1002 · pq-code-package/mldsa-native

mkannwischer · 2026-03-22T03:25:51Z

Introduce mld_s1vec, mld_s2vec, and mld_t0vec, following the same pattern as mld_polymat for reduced RAM usage. In normal mode, they store the full NTT'd polyvec. In REDUCE_RAM mode, they store a pointer to the packed data in the secret key and unpack + NTT individual polynomials on demand.
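To make the two storage modes concrete, here is a minimal sketch of the pattern for s1 with ML-DSA-44 sizes. All `sketch_*` names are hypothetical and are not the PR's actual `mld_s1vec` API. `sketch_transform` is only a placeholder marking where the forward NTT would run, and the bit-unpacking follows the usual eta = 2 encoding (3 bits per coefficient, stored as eta minus the coefficient).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ML-DSA-44 sizes (FIPS 204); all names below are hypothetical. */
#define SKETCH_N 256
#define SKETCH_L 4
#define SKETCH_ETA 2
#define SKETCH_POLYETA_PACKEDBYTES 96 /* 256 coefficients * 3 bits */

typedef struct
{
  int32_t coeffs[SKETCH_N];
} sketch_poly;

/* Unpack one eta = 2 polynomial: 3 bits per coefficient, stored as eta - c. */
static void sketch_unpack_eta(sketch_poly *r, const uint8_t *a)
{
  unsigned j;
  for (j = 0; j < SKETCH_N; j++)
  {
    unsigned bit = 3 * j;
    unsigned w = a[bit / 8];
    if (bit / 8 + 1 < SKETCH_POLYETA_PACKEDBYTES)
    {
      w |= (unsigned)a[bit / 8 + 1] << 8;
    }
    r->coeffs[j] = SKETCH_ETA - (int32_t)((w >> (bit % 8)) & 7u);
  }
}

/* Placeholder for the forward NTT. This is NOT an NTT; it only marks
 * the place where the real transform would be applied. */
static void sketch_transform(sketch_poly *p)
{
  unsigned j;
  for (j = 0; j < SKETCH_N; j++)
  {
    p->coeffs[j] *= 3;
  }
}

/* Normal mode: every polynomial is unpacked and transformed up front. */
typedef struct
{
  sketch_poly vec[SKETCH_L];
} sketch_s1vec_full;

static void sketch_s1vec_full_init(sketch_s1vec_full *v, const uint8_t *packed)
{
  unsigned i;
  for (i = 0; i < SKETCH_L; i++)
  {
    sketch_unpack_eta(&v->vec[i], packed + i * SKETCH_POLYETA_PACKEDBYTES);
    sketch_transform(&v->vec[i]);
  }
}

static const sketch_poly *sketch_s1vec_full_get(const sketch_s1vec_full *v,
                                                unsigned i)
{
  assert(i < SKETCH_L);
  return &v->vec[i];
}

/* Reduced-RAM mode: keep only a pointer to the packed data in the key. */
typedef struct
{
  const uint8_t *packed;
} sketch_s1vec_lazy;

static void sketch_s1vec_lazy_init(sketch_s1vec_lazy *v, const uint8_t *packed)
{
  v->packed = packed;
}

/* Unpack and transform polynomial i into caller-provided scratch space. */
static const sketch_poly *sketch_s1vec_lazy_get(const sketch_s1vec_lazy *v,
                                                unsigned i,
                                                sketch_poly *scratch)
{
  assert(i < SKETCH_L);
  sketch_unpack_eta(scratch, v->packed + i * SKETCH_POLYETA_PACKEDBYTES);
  sketch_transform(scratch);
  return scratch;
}
```

In this sketch the lazy handle is a single pointer plus one polynomial of scratch space at the call site, while the full variant holds `SKETCH_L` polynomials. The trade-off is that each access in the lazy variant repeats the unpack and transform work.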

This reduces signing memory allocation in REDUCE_RAM mode:

ML-DSA-44: 32,448 -> 20,256 (-37.6%)
ML-DSA-65: 44,768 -> 27,456 (-38.7%)
ML-DSA-87: 59,104 -> 35,648 (-39.7%)

TODO:

CBMC proofs for non-reduced-RAM mode
Avoid double unpacking of s1/s2/t0 in pk_from_sk

Introduce mld_s1vec, following the same pattern as mld_polymat for reduced RAM usage. In normal mode, it stores the full NTT'd polyvecl. In REDUCE_RAM mode, it stores a pointer to the packed s1 data in the secret key and unpacks + NTTs individual polynomials on demand.

This reduces signing memory in REDUCE_RAM mode:

- ML-DSA-44: 32,448 -> 28,384 (-4,064 bytes)
- ML-DSA-65: 44,768 -> 39,680 (-5,088 bytes)
- ML-DSA-87: 59,104 -> 51,968 (-7,136 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

oqs-bot

Intel Xeon 4th gen (c7i)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`34362` cycles	`34508` cycles	`1.00`
`ML-DSA-44 sign`	`120129` cycles	`119762` cycles	`1.00`
`ML-DSA-44 verify`	`38068` cycles	`38106` cycles	`1.00`
`ML-DSA-65 keypair`	`61396` cycles	`61327` cycles	`1.00`
`ML-DSA-65 sign`	`201975` cycles	`202109` cycles	`1.00`
`ML-DSA-65 verify`	`62883` cycles	`62771` cycles	`1.00`
`ML-DSA-87 keypair`	`93605` cycles	`94593` cycles	`0.99`
`ML-DSA-87 sign`	`238491` cycles	`240827` cycles	`0.99`
`ML-DSA-87 verify`	`95314` cycles	`96019` cycles	`0.99`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`94456` cycles	`93753` cycles	`1.01`
`ML-DSA-44 sign`	`332649` cycles	`333304` cycles	`1.00`
`ML-DSA-44 verify`	`99711` cycles	`99738` cycles	`1.00`
`ML-DSA-65 keypair`	`160279` cycles	`159678` cycles	`1.00`
`ML-DSA-65 sign`	`543374` cycles	`544024` cycles	`1.00`
`ML-DSA-65 verify`	`161092` cycles	`160787` cycles	`1.00`
`ML-DSA-87 keypair`	`267434` cycles	`267177` cycles	`1.00`
`ML-DSA-87 sign`	`709085` cycles	`705890` cycles	`1.00`
`ML-DSA-87 verify`	`270229` cycles	`270246` cycles	`1.00`

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`113162` cycles	`113125` cycles	`1.00`
`ML-DSA-44 sign`	`357353` cycles	`355404` cycles	`1.01`
`ML-DSA-44 verify`	`117895` cycles	`117806` cycles	`1.00`
`ML-DSA-65 keypair`	`196212` cycles	`196440` cycles	`1.00`
`ML-DSA-65 sign`	`592734` cycles	`588870` cycles	`1.01`
`ML-DSA-65 verify`	`194625` cycles	`194523` cycles	`1.00`
`ML-DSA-87 keypair`	`322467` cycles	`322254` cycles	`1.00`
`ML-DSA-87 sign`	`756570` cycles	`752961` cycles	`1.00`
`ML-DSA-87 verify`	`320130` cycles	`320091` cycles	`1.00`

oqs-bot

AMD EPYC 3rd gen (c6a)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`69194` cycles	`69272` cycles	`1.00`
`ML-DSA-44 sign`	`188252` cycles	`188132` cycles	`1.00`
`ML-DSA-44 verify`	`68906` cycles	`69431` cycles	`0.99`
`ML-DSA-65 keypair`	`119088` cycles	`119537` cycles	`1.00`
`ML-DSA-65 sign`	`302886` cycles	`300738` cycles	`1.01`
`ML-DSA-65 verify`	`115778` cycles	`115521` cycles	`1.00`
`ML-DSA-87 keypair`	`203113` cycles	`204457` cycles	`0.99`
`ML-DSA-87 sign`	`408515` cycles	`395562` cycles	`1.03`
`ML-DSA-87 verify`	`195375` cycles	`196251` cycles	`1.00`

oqs-bot

Graviton4

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`68176` cycles	`68092` cycles	`1.00`
`ML-DSA-44 sign`	`203637` cycles	`202357` cycles	`1.01`
`ML-DSA-44 verify`	`70924` cycles	`70840` cycles	`1.00`
`ML-DSA-65 keypair`	`121292` cycles	`120892` cycles	`1.00`
`ML-DSA-65 sign`	`334250` cycles	`332262` cycles	`1.01`
`ML-DSA-65 verify`	`118051` cycles	`117993` cycles	`1.00`
`ML-DSA-87 keypair`	`198088` cycles	`198285` cycles	`1.00`
`ML-DSA-87 sign`	`431278` cycles	`428165` cycles	`1.01`
`ML-DSA-87 verify`	`194701` cycles	`194638` cycles	`1.00`

oqs-bot

Intel Xeon 3rd gen (c6i)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`56406` cycles	`56810` cycles	`0.99`
`ML-DSA-44 sign`	`181106` cycles	`181256` cycles	`1.00`
`ML-DSA-44 verify`	`60974` cycles	`61127` cycles	`1.00`
`ML-DSA-65 keypair`	`99070` cycles	`98683` cycles	`1.00`
`ML-DSA-65 sign`	`301150` cycles	`298776` cycles	`1.01`
`ML-DSA-65 verify`	`100716` cycles	`100109` cycles	`1.01`
`ML-DSA-87 keypair`	`152856` cycles	`152672` cycles	`1.00`
`ML-DSA-87 sign`	`358498` cycles	`355205` cycles	`1.01`
`ML-DSA-87 verify`	`153513` cycles	`153314` cycles	`1.00`

oqs-bot

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`134975` cycles	`135154` cycles	`1.00`
`ML-DSA-44 sign`	`527071` cycles	`524730` cycles	`1.00`
`ML-DSA-44 verify`	`147865` cycles	`147590` cycles	`1.00`
`ML-DSA-65 keypair`	`228558` cycles	`228675` cycles	`1.00`
`ML-DSA-65 sign`	`865183` cycles	`866364` cycles	`1.00`
`ML-DSA-65 verify`	`236295` cycles	`236755` cycles	`1.00`
`ML-DSA-87 keypair`	`371936` cycles	`372434` cycles	`1.00`
`ML-DSA-87 sign`	`1079730` cycles	`1081953` cycles	`1.00`
`ML-DSA-87 verify`	`383598` cycles	`383807` cycles	`1.00`

oqs-bot

AMD EPYC 4th gen (c7a)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`40542` cycles	`41153` cycles	`0.99`
`ML-DSA-44 sign`	`132746` cycles	`132931` cycles	`1.00`
`ML-DSA-44 verify`	`43385` cycles	`43836` cycles	`0.99`
`ML-DSA-65 keypair`	`71975` cycles	`72244` cycles	`1.00`
`ML-DSA-65 sign`	`214826` cycles	`214745` cycles	`1.00`
`ML-DSA-65 verify`	`72365` cycles	`73096` cycles	`0.99`
`ML-DSA-87 keypair`	`108760` cycles	`108337` cycles	`1.00`
`ML-DSA-87 sign`	`254141` cycles	`253357` cycles	`1.00`
`ML-DSA-87 verify`	`111590` cycles	`110812` cycles	`1.01`

oqs-bot

Graviton4 (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`128366` cycles	`128232` cycles	`1.00`
`ML-DSA-44 sign`	`448227` cycles	`447685` cycles	`1.00`
`ML-DSA-44 verify`	`138227` cycles	`144647` cycles	`0.96`
`ML-DSA-65 keypair`	`220728` cycles	`220666` cycles	`1.00`
`ML-DSA-65 sign`	`728968` cycles	`727390` cycles	`1.00`
`ML-DSA-65 verify`	`222560` cycles	`223179` cycles	`1.00`
`ML-DSA-87 keypair`	`364646` cycles	`365048` cycles	`1.00`
`ML-DSA-87 sign`	`926537` cycles	`925897` cycles	`1.00`
`ML-DSA-87 verify`	`372923` cycles	`372806` cycles	`1.00`

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`212388` cycles	`212677` cycles	`1.00`
`ML-DSA-44 sign`	`761595` cycles	`759475` cycles	`1.00`
`ML-DSA-44 verify`	`228709` cycles	`228953` cycles	`1.00`
`ML-DSA-65 keypair`	`379903` cycles	`380253` cycles	`1.00`
`ML-DSA-65 sign`	`1257270` cycles	`1251269` cycles	`1.00`
`ML-DSA-65 verify`	`371502` cycles	`372050` cycles	`1.00`
`ML-DSA-87 keypair`	`604456` cycles	`605509` cycles	`1.00`
`ML-DSA-87 sign`	`1598919` cycles	`1591320` cycles	`1.00`
`ML-DSA-87 verify`	`618280` cycles	`617579` cycles	`1.00`

oqs-bot

Graviton3

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`72371` cycles	`72262` cycles	`1.00`
`ML-DSA-44 sign`	`213687` cycles	`212358` cycles	`1.01`
`ML-DSA-44 verify`	`75757` cycles	`75722` cycles	`1.00`
`ML-DSA-65 keypair`	`127613` cycles	`127611` cycles	`1.00`
`ML-DSA-65 sign`	`353316` cycles	`350840` cycles	`1.01`
`ML-DSA-65 verify`	`125589` cycles	`125699` cycles	`1.00`
`ML-DSA-87 keypair`	`205884` cycles	`208501` cycles	`0.99`
`ML-DSA-87 sign`	`447609` cycles	`450025` cycles	`0.99`
`ML-DSA-87 verify`	`205683` cycles	`205765` cycles	`1.00`

oqs-bot

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`157318` cycles	`157591` cycles	`1.00`
`ML-DSA-44 sign`	`549881` cycles	`551560` cycles	`1.00`
`ML-DSA-44 verify`	`169319` cycles	`169402` cycles	`1.00`
`ML-DSA-65 keypair`	`268418` cycles	`267815` cycles	`1.00`
`ML-DSA-65 sign`	`906561` cycles	`904542` cycles	`1.00`
`ML-DSA-65 verify`	`274795` cycles	`274303` cycles	`1.00`
`ML-DSA-87 keypair`	`447731` cycles	`448249` cycles	`1.00`
`ML-DSA-87 sign`	`1161252` cycles	`1156908` cycles	`1.00`
`ML-DSA-87 verify`	`457349` cycles	`458389` cycles	`1.00`

oqs-bot

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`120710` cycles	`120615` cycles	`1.00`
`ML-DSA-44 sign`	`448467` cycles	`447589` cycles	`1.00`
`ML-DSA-44 verify`	`130598` cycles	`130296` cycles	`1.00`
`ML-DSA-65 keypair`	`204115` cycles	`204314` cycles	`1.00`
`ML-DSA-65 sign`	`729459` cycles	`728144` cycles	`1.00`
`ML-DSA-65 verify`	`210276` cycles	`210151` cycles	`1.00`
`ML-DSA-87 keypair`	`337081` cycles	`338739` cycles	`1.00`
`ML-DSA-87 sign`	`927755` cycles	`924086` cycles	`1.00`
`ML-DSA-87 verify`	`347169` cycles	`347015` cycles	`1.00`

oqs-bot

Graviton3 (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`138670` cycles	`138488` cycles	`1.00`
`ML-DSA-44 sign`	`484824` cycles	`483902` cycles	`1.00`
`ML-DSA-44 verify`	`148462` cycles	`162298` cycles	`0.91`
`ML-DSA-65 keypair`	`241326` cycles	`241720` cycles	`1.00`
`ML-DSA-65 sign`	`794542` cycles	`792693` cycles	`1.00`
`ML-DSA-65 verify`	`240735` cycles	`241300` cycles	`1.00`
`ML-DSA-87 keypair`	`395465` cycles	`396574` cycles	`1.00`
`ML-DSA-87 sign`	`1016682` cycles	`1012397` cycles	`1.00`
`ML-DSA-87 verify`	`402879` cycles	`402619` cycles	`1.00`

oqs-bot

Graviton2

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`113678` cycles	`113486` cycles	`1.00`
`ML-DSA-44 sign`	`358177` cycles	`355929` cycles	`1.01`
`ML-DSA-44 verify`	`118427` cycles	`118313` cycles	`1.00`
`ML-DSA-65 keypair`	`196712` cycles	`196525` cycles	`1.00`
`ML-DSA-65 sign`	`592715` cycles	`588739` cycles	`1.01`
`ML-DSA-65 verify`	`194904` cycles	`194868` cycles	`1.00`
`ML-DSA-87 keypair`	`322556` cycles	`323107` cycles	`1.00`
`ML-DSA-87 sign`	`757645` cycles	`753767` cycles	`1.01`
`ML-DSA-87 verify`	`320481` cycles	`320405` cycles	`1.00`

Same pattern as mld_s1vec: in normal mode stores the full NTT'd polyveck, in REDUCE_RAM mode stores a pointer and unpacks + NTTs on demand.

REDUCE_RAM signing memory reduction:

- ML-DSA-44: 28,384 -> 24,320 (-4,064 bytes)
- ML-DSA-65: 39,680 -> 33,568 (-6,112 bytes)
- ML-DSA-87: 51,968 -> 43,808 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

oqs-bot

Graviton2 (no-opt)

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 keypair`	`213320` cycles	`212744` cycles	`1.00`
`ML-DSA-44 sign`	`762194` cycles	`760342` cycles	`1.00`
`ML-DSA-44 verify`	`241436` cycles	`234472` cycles	`1.03`
`ML-DSA-65 keypair`	`380938` cycles	`380565` cycles	`1.00`
`ML-DSA-65 sign`	`1259322` cycles	`1254252` cycles	`1.00`
`ML-DSA-65 verify`	`372512` cycles	`372074` cycles	`1.00`
`ML-DSA-87 keypair`	`606252` cycles	`604302` cycles	`1.00`
`ML-DSA-87 sign`	`1597536` cycles	`1594512` cycles	`1.00`
`ML-DSA-87 verify`	`618242` cycles	`618492` cycles	`1.00`

oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `c0ac5e8`	Previous: `bb07ee8`	Ratio
`ML-DSA-44 verify`	`241913` cycles	`234472` cycles	`1.03`

oqs-bot · 2026-03-22T03:43:43Z

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof	Status	Current	Previous	Change
`sign_verify_internal`	⚠️	218s	126s	+73%

Full Results (178 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	2134s	1998s	+6.8%
`polyvecl_pointwise_acc_montgomery_c`	✅	222s	208s	+7%
`sign_verify_internal`	⚠️	218s	126s	+73%
`mld_attempt_signature_generation`	✅	207s	231s	-10%
`poly_pointwise_montgomery_c`	✅	162s	152s	+7%
`rej_uniform_native`	✅	145s	146s	-1%
`mld_invntt_layer`	✅	90s	88s	+2%
`mld_ct_memcmp`	✅	72s	77s	-6%
`mld_ntt_layer`	✅	55s	59s	-7%
`keccak_squeezeblocks_x4`	✅	43s	42s	+2%
`polyvec_matrix_expand`	✅	28s	28s	+0%
`rej_uniform`	✅	23s	22s	+5%
`polymat_permute_bitrev_to_custom`	✅	22s	15s	+47%
`poly_chknorm_c`	✅	21s	19s	+11%
`fqmul`	✅	20s	20s	+0%
`sign_signature_internal`	✅	20s	31s	-35%
`poly_uniform_eta_4x`	✅	19s	17s	+12%
`polyeta_unpack`	✅	18s	16s	+12%
`mld_compute_t0_t1_tr_from_sk_components`	✅	15s	14s	+7%
`rej_uniform_c`	✅	15s	14s	+7%
`mld_ntt_butterfly_block`	✅	14s	12s	+17%
`poly_add`	✅	14s	9s	+56%
`polyt0_unpack`	✅	14s	14s	+0%
`polyz_unpack_c`	✅	14s	11s	+27%
`keccakf1600x4_permute_native`	✅	13s	13s	+0%
`mld_check_pct`	✅	13s	5s	+160%
`poly_uniform_4x`	✅	13s	14s	-7%
`polyveck_power2round`	✅	13s	12s	+8%
`polyvec_matrix_expand_serial`	✅	12s	13s	-8%
`polyvec_matrix_pointwise_montgomery`	✅	12s	13s	-8%
`sign_pk_from_sk`	✅	12s	7s	+71%
`keccak_absorb_once_x4`	✅	11s	10s	+10%
`keccakf1600_permute`	✅	10s	7s	+43%
`mld_h`	✅	9s	6s	+50%
`polyveck_add`	✅	9s	9s	+0%
`keccak_absorb`	✅	8s	9s	-11%
`keccakf1600_permute_native`	✅	8s	7s	+14%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	8s	7s	+14%
`mld_compute_pack_z`	✅	7s	7s	+0%
`polyveck_ntt`	✅	7s	6s	+17%
`polyveck_pointwise_poly_montgomery_t0`	✅	7s	-	new
`polyveck_use_hint`	✅	7s	9s	-22%
`sign_signature`	✅	7s	5s	+40%
`poly_invntt_tomont`	✅	6s	3s	+100%
`poly_invntt_tomont_c`	✅	6s	5s	+20%
`poly_sub`	✅	6s	3s	+100%
`poly_uniform`	✅	6s	3s	+100%
`poly_uniform_gamma1_4x`	✅	6s	3s	+100%
`polyveck_decompose`	✅	6s	5s	+20%
`polyveck_make_hint`	✅	6s	4s	+50%
`polyveck_shiftl`	✅	6s	4s	+50%
`polyvecl_unpack_z`	✅	6s	4s	+50%
`rej_eta_native`	✅	6s	5s	+20%
`sign`	✅	6s	7s	-14%
`sign_open`	✅	6s	5s	+20%
`unpack_sk`	✅	6s	5s	+20%
`intt_native_x86_64`	✅	5s	3s	+67%
`make_hint`	✅	5s	3s	+67%
`mld_sample_s1_s2`	✅	5s	2s	+150%
`poly_chknorm`	✅	5s	4s	+25%
`polyveck_caddq`	✅	5s	3s	+67%
`polyveck_reduce`	✅	5s	6s	-17%
`polyvecl_chknorm`	✅	5s	4s	+25%
`polyvecl_unpack_eta`	✅	5s	3s	+67%
`polyz_unpack`	✅	5s	2s	+150%
`shake128_squeeze`	✅	5s	3s	+67%
`sign_keypair_internal`	✅	5s	9s	-44%
`sign_verify_extmu`	✅	5s	5s	+0%
`sign_verify_pre_hash_internal`	✅	5s	4s	+25%
`sys_check_capability`	✅	5s	2s	+150%
`unpack_hints`	✅	5s	8s	-38%
`decompose`	✅	4s	4s	+0%
`fqscale`	✅	4s	4s	+0%
`keccakf1600_extract_bytes (big endian)`	✅	4s	2s	+100%
`keccakf1600x4_xor_bytes`	✅	4s	4s	+0%
`mld_ct_cmask_nonzero_u32`	✅	4s	4s	+0%
`mld_keccakf1600_extract_bytes`	✅	4s	3s	+33%
`mld_value_barrier_u8`	✅	4s	1s	+300%
`ntt_native_x86_64`	✅	4s	4s	+0%
`pack_pk`	✅	4s	3s	+33%
`poly_caddq_c`	✅	4s	4s	+0%
`poly_caddq_native_aarch64`	✅	4s	6s	-33%
`poly_challenge`	✅	4s	4s	+0%
`poly_decompose`	✅	4s	2s	+100%
`poly_decompose_c`	✅	4s	2s	+100%
`poly_ntt_native`	✅	4s	3s	+33%
`poly_pointwise_montgomery_native`	✅	4s	2s	+100%
`poly_power2round`	✅	4s	8s	-50%
`poly_reduce`	✅	4s	3s	+33%
`poly_use_hint_c`	✅	4s	5s	-20%
`polyeta_pack`	✅	4s	4s	+0%
`polyt1_unpack`	✅	4s	4s	+0%
`polyveck_chknorm`	✅	4s	4s	+0%
`polyveck_pack_w1`	✅	4s	6s	-33%
`polyveck_sub`	✅	4s	5s	-20%
`polyvecl_pack_eta`	✅	4s	2s	+100%
`polyvecl_uniform_gamma1`	✅	4s	3s	+33%
`polyvecl_uniform_gamma1_serial`	✅	4s	2s	+100%
`reduce32`	✅	4s	3s	+33%
`rej_eta_c`	✅	4s	5s	-20%
`shake128x4_squeezeblocks`	✅	4s	2s	+100%
`shake256`	✅	4s	3s	+33%
`shake256_absorb`	✅	4s	6s	-33%
`sign_keypair`	✅	4s	3s	+33%
`sign_verify`	✅	4s	7s	-43%
`unpack_sig`	✅	4s	4s	+0%
`caddq`	✅	3s	3s	+0%
`keccak_init`	✅	3s	2s	+50%
`keccak_squeeze`	✅	3s	3s	+0%
`keccakf1600_xor_bytes (big endian)`	✅	3s	3s	+0%
`keccakf1600x4_extract_bytes`	✅	3s	5s	-40%
`keccakf1600x4_permute`	✅	3s	3s	+0%
`mld_ct_abs_i32`	✅	3s	3s	+0%
`mld_ct_cmask_nonzero_u8`	✅	3s	2s	+50%
`mld_ct_get_optblocker_i64`	✅	3s	2s	+50%
`mld_ct_get_optblocker_u32`	✅	3s	3s	+0%
`mld_sample_s1_s2_serial`	✅	3s	3s	+0%
`mld_value_barrier_i64`	✅	3s	3s	+0%
`mld_value_barrier_u32`	✅	3s	2s	+50%
`montgomery_reduce`	✅	3s	3s	+0%
`pack_sig_c_h`	✅	3s	5s	-40%
`pack_sig_z`	✅	3s	3s	+0%
`poly_caddq`	✅	3s	3s	+0%
`poly_caddq_native`	✅	3s	3s	+0%
`poly_chknorm_native`	✅	3s	2s	+50%
`poly_chknorm_native_aarch64`	✅	3s	5s	-40%
`poly_decompose_native`	✅	3s	3s	+0%
`poly_invntt_tomont_native`	✅	3s	4s	-25%
`poly_ntt`	✅	3s	3s	+0%
`poly_pointwise_montgomery`	✅	3s	4s	-25%
`poly_shiftl`	✅	3s	2s	+50%
`poly_uniform_eta`	✅	3s	3s	+0%
`poly_uniform_gamma1`	✅	3s	2s	+50%
`polyt0_pack`	✅	3s	5s	-40%
`polyveck_pack_eta`	✅	3s	2s	+50%
`polyveck_pointwise_poly_montgomery`	✅	3s	2s	+50%
`polyveck_pointwise_poly_montgomery_s2`	✅	3s	-	new
`polyvecl_ntt`	✅	3s	7s	-57%
`polyvecl_pointwise_acc_montgomery_native`	✅	3s	3s	+0%
`polyw1_pack`	✅	3s	5s	-40%
`polyz_pack`	✅	3s	5s	-40%
`polyz_unpack_native`	✅	3s	3s	+0%
`power2round`	✅	3s	3s	+0%
`rej_eta`	✅	3s	4s	-25%
`shake128_init`	✅	3s	2s	+50%
`shake128x4_absorb_once`	✅	3s	1s	+200%
`shake256_init`	✅	3s	1s	+200%
`shake256_release`	✅	3s	2s	+50%
`sign_signature_pre_hash_shake256`	✅	3s	3s	+0%
`sign_verify_pre_hash_shake256`	✅	3s	5s	-40%
`use_hint`	✅	3s	2s	+50%
`keccak_finalize`	✅	2s	4s	-50%
`keccakf1600_xor_bytes`	✅	2s	1s	+100%
`mld_ct_get_optblocker_u8`	✅	2s	3s	-33%
`mld_prepare_domain_separation_prefix`	✅	2s	7s	-71%
`pack_sk`	✅	2s	3s	-33%
`poly_make_hint`	✅	2s	4s	-50%
`poly_ntt_c`	✅	2s	4s	-50%
`poly_use_hint`	✅	2s	3s	-33%
`poly_use_hint_native`	✅	2s	3s	-33%
`polyt1_pack`	✅	2s	3s	-33%
`polyveck_invntt_tomont`	✅	2s	3s	-33%
`polyveck_pack_t0`	✅	2s	3s	-33%
`polyveck_unpack_eta`	✅	2s	2s	+0%
`polyvecl_permute_bitrev_to_custom`	✅	2s	4s	-50%
`polyvecl_pointwise_acc_montgomery`	✅	2s	2s	+0%
`shake128_absorb`	✅	2s	3s	-33%
`shake128_finalize`	✅	2s	2s	+0%
`shake256x4_absorb_once`	✅	2s	3s	-33%
`sign_signature_extmu`	✅	2s	5s	-60%
`sign_signature_pre_hash_internal`	✅	2s	3s	-33%
`unpack_pk`	✅	2s	3s	-33%
`mld_ct_cmask_neg_i32`	✅	1s	1s	+0%
`mld_ct_sel_int32`	✅	1s	2s	-50%
`polyveck_unpack_t0`	✅	1s	3s	-67%
`shake128_release`	✅	1s	2s	-50%
`shake256_finalize`	✅	1s	3s	-67%
`shake256_squeeze`	✅	1s	2s	-50%
`shake256x4_squeezeblocks`	✅	1s	2s	-50%

Same pattern as mld_s1vec and mld_s2vec: in normal mode stores the full NTT'd polyveck, in REDUCE_RAM mode stores a pointer and unpacks + NTTs on demand.

REDUCE_RAM signing memory reduction:

- ML-DSA-44: 24,320 -> 20,256 (-4,064 bytes)
- ML-DSA-65: 33,568 -> 27,456 (-6,112 bytes)
- ML-DSA-87: 43,808 -> 35,648 (-8,160 bytes)

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

oqs-bot · 2026-03-22T04:09:49Z

CBMC Results (ML-DSA-65)

Full Results (178 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	2482s	2485s	-0.1%
`mld_attempt_signature_generation`	✅	287s	275s	+4%
`sign_verify_internal`	✅	274s	336s	-18%
`polyvecl_pointwise_acc_montgomery_c`	✅	196s	192s	+2%
`poly_pointwise_montgomery_c`	✅	168s	153s	+10%
`rej_uniform_native`	✅	148s	148s	+0%
`polyvec_matrix_expand`	✅	120s	126s	-5%
`mld_invntt_layer`	✅	96s	96s	+0%
`mld_ct_memcmp`	✅	80s	76s	+5%
`polyvec_matrix_expand_serial`	✅	68s	68s	+0%
`mld_ntt_layer`	✅	56s	55s	+2%
`keccak_squeezeblocks_x4`	✅	44s	43s	+2%
`mld_compute_t0_t1_tr_from_sk_components`	✅	24s	27s	-11%
`poly_chknorm_c`	✅	24s	21s	+14%
`sign_signature_internal`	✅	24s	39s	-38%
`rej_uniform`	✅	21s	24s	-12%
`fqmul`	✅	18s	21s	-14%
`polymat_permute_bitrev_to_custom`	✅	17s	30s	-43%
`rej_uniform_c`	✅	17s	13s	+31%
`poly_uniform_eta_4x`	✅	16s	16s	+0%
`keccakf1600x4_permute_native`	✅	15s	13s	+15%
`mld_ntt_butterfly_block`	✅	15s	12s	+25%
`poly_add`	✅	15s	12s	+25%
`poly_uniform_4x`	✅	15s	17s	-12%
`polyt0_unpack`	✅	14s	15s	-7%
`polyvec_matrix_pointwise_montgomery`	✅	14s	11s	+27%
`polyveck_decompose`	✅	14s	11s	+27%
`polyveck_shiftl`	✅	12s	7s	+71%
`keccak_absorb_once_x4`	✅	11s	11s	+0%
`polyveck_sub`	✅	11s	11s	+0%
`sign_pk_from_sk`	✅	11s	8s	+38%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	10s	7s	+43%
`poly_invntt_tomont_c`	✅	10s	6s	+67%
`polyveck_add`	✅	10s	9s	+11%
`polyveck_pointwise_poly_montgomery_t0`	✅	10s	-	new
`polyveck_power2round`	✅	10s	11s	-9%
`polyvecl_chknorm`	✅	10s	11s	-9%
`keccakf1600_permute`	✅	9s	7s	+29%
`polyveck_ntt`	✅	9s	12s	-25%
`polyveck_reduce`	✅	9s	6s	+50%
`polyveck_use_hint`	✅	9s	8s	+12%
`keccakf1600_permute_native`	✅	8s	8s	+0%
`mld_check_pct`	✅	8s	8s	+0%
`poly_decompose_c`	✅	8s	9s	-11%
`unpack_sk`	✅	8s	5s	+60%
`keccak_absorb`	✅	7s	7s	+0%
`mld_compute_pack_z`	✅	7s	6s	+17%
`mld_sample_s1_s2_serial`	✅	7s	5s	+40%
`polyveck_caddq`	✅	7s	8s	-12%
`polyveck_invntt_tomont`	✅	7s	9s	-22%
`polyveck_pointwise_poly_montgomery`	✅	7s	7s	+0%
`sign_signature_pre_hash_shake256`	✅	7s	3s	+133%
`poly_caddq_c`	✅	6s	5s	+20%
`polyeta_unpack`	✅	6s	6s	+0%
`polyveck_unpack_eta`	✅	6s	3s	+100%
`polyvecl_ntt`	✅	6s	7s	-14%
`polyvecl_uniform_gamma1`	✅	6s	4s	+50%
`polyz_pack`	✅	6s	3s	+100%
`sign`	✅	6s	7s	-14%
`keccakf1600x4_xor_bytes`	✅	5s	2s	+150%
`mld_ct_get_optblocker_i64`	✅	5s	5s	+0%
`mld_prepare_domain_separation_prefix`	✅	5s	4s	+25%
`pack_sk`	✅	5s	3s	+67%
`poly_caddq`	✅	5s	3s	+67%
`poly_chknorm`	✅	5s	2s	+150%
`poly_chknorm_native_aarch64`	✅	5s	3s	+67%
`poly_power2round`	✅	5s	5s	+0%
`poly_use_hint_c`	✅	5s	4s	+25%
`polyt0_pack`	✅	5s	3s	+67%
`polyveck_pointwise_poly_montgomery_s2`	✅	5s	-	new
`polyveck_unpack_t0`	✅	5s	6s	-17%
`polyvecl_pointwise_acc_montgomery_native`	✅	5s	4s	+25%
`polyz_unpack`	✅	5s	4s	+25%
`rej_eta`	✅	5s	3s	+67%
`rej_eta_c`	✅	5s	3s	+67%
`shake128_squeeze`	✅	5s	3s	+67%
`sign_keypair`	✅	5s	5s	+0%
`sign_keypair_internal`	✅	5s	5s	+0%
`sign_open`	✅	5s	5s	+0%
`sign_verify`	✅	5s	7s	-29%
`sign_verify_extmu`	✅	5s	4s	+25%
`keccak_init`	✅	4s	3s	+33%
`keccakf1600_xor_bytes`	✅	4s	3s	+33%
`keccakf1600x4_extract_bytes`	✅	4s	2s	+100%
`make_hint`	✅	4s	5s	-20%
`mld_ct_cmask_nonzero_u8`	✅	4s	2s	+100%
`mld_ct_get_optblocker_u32`	✅	4s	2s	+100%
`mld_ct_get_optblocker_u8`	✅	4s	5s	-20%
`mld_h`	✅	4s	3s	+33%
`mld_sample_s1_s2`	✅	4s	8s	-50%
`mld_value_barrier_i64`	✅	4s	3s	+33%
`ntt_native_x86_64`	✅	4s	5s	-20%
`pack_sig_c_h`	✅	4s	4s	+0%
`poly_challenge`	✅	4s	4s	+0%
`poly_chknorm_native`	✅	4s	4s	+0%
`poly_decompose_native`	✅	4s	3s	+33%
`poly_invntt_tomont`	✅	4s	4s	+0%
`poly_ntt_c`	✅	4s	3s	+33%
`poly_ntt_native`	✅	4s	2s	+100%
`poly_pointwise_montgomery`	✅	4s	7s	-43%
`poly_reduce`	✅	4s	4s	+0%
`polyeta_pack`	✅	4s	4s	+0%
`polyt1_unpack`	✅	4s	2s	+100%
`polyveck_chknorm`	✅	4s	3s	+33%
`polyveck_make_hint`	✅	4s	7s	-43%
`polyveck_pack_eta`	✅	4s	2s	+100%
`polyveck_pack_t0`	✅	4s	5s	-20%
`polyveck_pack_w1`	✅	4s	3s	+33%
`polyvecl_uniform_gamma1_serial`	✅	4s	4s	+0%
`polyvecl_unpack_z`	✅	4s	4s	+0%
`polyz_unpack_native`	✅	4s	3s	+33%
`rej_eta_native`	✅	4s	4s	+0%
`shake128_finalize`	✅	4s	3s	+33%
`shake128x4_squeezeblocks`	✅	4s	3s	+33%
`shake256_init`	✅	4s	2s	+100%
`shake256_release`	✅	4s	2s	+100%
`shake256x4_squeezeblocks`	✅	4s	2s	+100%
`sign_signature_extmu`	✅	4s	4s	+0%
`sign_verify_pre_hash_shake256`	✅	4s	2s	+100%
`unpack_hints`	✅	4s	5s	-20%
`use_hint`	✅	4s	3s	+33%
`decompose`	✅	3s	2s	+50%
`intt_native_x86_64`	✅	3s	4s	-25%
`keccak_finalize`	✅	3s	2s	+50%
`keccak_squeeze`	✅	3s	4s	-25%
`keccakf1600_extract_bytes (big endian)`	✅	3s	2s	+50%
`mld_keccakf1600_extract_bytes`	✅	3s	2s	+50%
`montgomery_reduce`	✅	3s	2s	+50%
`pack_pk`	✅	3s	5s	-40%
`pack_sig_z`	✅	3s	4s	-25%
`poly_caddq_native`	✅	3s	2s	+50%
`poly_caddq_native_aarch64`	✅	3s	5s	-40%
`poly_invntt_tomont_native`	✅	3s	3s	+0%
`poly_pointwise_montgomery_native`	✅	3s	4s	-25%
`poly_shiftl`	✅	3s	4s	-25%
`poly_uniform`	✅	3s	4s	-25%
`poly_uniform_eta`	✅	3s	5s	-40%
`poly_uniform_gamma1`	✅	3s	4s	-25%
`poly_uniform_gamma1_4x`	✅	3s	7s	-57%
`poly_use_hint_native`	✅	3s	5s	-40%
`polyt1_pack`	✅	3s	2s	+50%
`polyvecl_permute_bitrev_to_custom`	✅	3s	3s	+0%
`polyvecl_pointwise_acc_montgomery`	✅	3s	3s	+0%
`polyvecl_unpack_eta`	✅	3s	1s	+200%
`polyw1_pack`	✅	3s	3s	+0%
`polyz_unpack_c`	✅	3s	3s	+0%
`shake128_init`	✅	3s	2s	+50%
`shake128_release`	✅	3s	5s	-40%
`shake128x4_absorb_once`	✅	3s	2s	+50%
`shake256_squeeze`	✅	3s	3s	+0%
`shake256x4_absorb_once`	✅	3s	3s	+0%
`sign_signature`	✅	3s	3s	+0%
`sign_signature_pre_hash_internal`	✅	3s	6s	-50%
`sign_verify_pre_hash_internal`	✅	3s	4s	-25%
`sys_check_capability`	✅	3s	2s	+50%
`unpack_pk`	✅	3s	4s	-25%
`caddq`	✅	2s	4s	-50%
`fqscale`	✅	2s	2s	+0%
`keccakf1600_xor_bytes (big endian)`	✅	2s	4s	-50%
`mld_ct_cmask_neg_i32`	✅	2s	1s	+100%
`mld_ct_cmask_nonzero_u32`	✅	2s	3s	-33%
`mld_ct_sel_int32`	✅	2s	3s	-33%
`mld_value_barrier_u32`	✅	2s	2s	+0%
`mld_value_barrier_u8`	✅	2s	2s	+0%
`poly_decompose`	✅	2s	2s	+0%
`poly_make_hint`	✅	2s	5s	-60%
`poly_ntt`	✅	2s	2s	+0%
`poly_sub`	✅	2s	3s	-33%
`poly_use_hint`	✅	2s	3s	-33%
`polyvecl_pack_eta`	✅	2s	4s	-50%
`power2round`	✅	2s	3s	-33%
`shake128_absorb`	✅	2s	1s	+100%
`shake256`	✅	2s	3s	-33%
`shake256_absorb`	✅	2s	1s	+100%
`shake256_finalize`	✅	2s	5s	-60%
`unpack_sig`	✅	2s	4s	-50%
`keccakf1600x4_permute`	✅	1s	3s	-67%
`mld_ct_abs_i32`	✅	1s	2s	-50%
`reduce32`	✅	1s	4s	-75%

oqs-bot · 2026-03-22T04:10:22Z

CBMC Results (ML-DSA-87)

Full Results (178 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	2589s	2658s	-2.6%
`polyvecl_pointwise_acc_montgomery_c`	✅	282s	276s	+2%
`mld_attempt_signature_generation`	✅	261s	237s	+10%
`sign_verify_internal`	✅	251s	330s	-24%
`polyvec_matrix_expand`	✅	161s	177s	-9%
`poly_pointwise_montgomery_c`	✅	157s	153s	+3%
`rej_uniform_native`	✅	141s	140s	+1%
`mld_invntt_layer`	✅	98s	94s	+4%
`polyvec_matrix_expand_serial`	✅	81s	79s	+3%
`mld_ct_memcmp`	✅	77s	73s	+5%
`polyveck_decompose`	✅	57s	56s	+2%
`mld_ntt_layer`	✅	54s	52s	+4%
`sign_signature_internal`	✅	45s	56s	-20%
`keccak_squeezeblocks_x4`	✅	43s	42s	+2%
`polymat_permute_bitrev_to_custom`	✅	27s	45s	-40%
`mld_compute_t0_t1_tr_from_sk_components`	✅	26s	25s	+4%
`poly_chknorm_c`	✅	20s	18s	+11%
`rej_uniform`	✅	20s	21s	-5%
`fqmul`	✅	18s	18s	+0%
`poly_uniform_4x`	✅	18s	14s	+29%
`poly_uniform_eta_4x`	✅	16s	16s	+0%
`polyeta_unpack`	✅	16s	18s	-11%
`keccakf1600x4_permute_native`	✅	15s	15s	+0%
`polyt0_unpack`	✅	14s	14s	+0%
`rej_uniform_c`	✅	14s	12s	+17%
`sign_pk_from_sk`	✅	14s	6s	+133%
`mld_ntt_butterfly_block`	✅	13s	12s	+8%
`polyvec_matrix_pointwise_montgomery`	✅	13s	12s	+8%
`mld_polyvecl_permute_bitrev_to_custom_native`	✅	12s	13s	-8%
`mld_check_pct`	✅	11s	9s	+22%
`poly_add`	✅	11s	13s	-15%
`polyveck_add`	✅	11s	9s	+22%
`polyveck_use_hint`	✅	11s	13s	-15%
`polyvecl_ntt`	✅	11s	9s	+22%
`polyveck_reduce`	✅	10s	10s	+0%
`keccakf1600_permute_native`	✅	9s	9s	+0%
`polyveck_invntt_tomont`	✅	9s	7s	+29%
`polyveck_shiftl`	✅	9s	7s	+29%
`unpack_sk`	✅	9s	5s	+80%
`keccak_absorb_once_x4`	✅	8s	10s	-20%
`keccakf1600_permute`	✅	8s	7s	+14%
`poly_decompose_c`	✅	8s	7s	+14%
`poly_uniform_gamma1_4x`	✅	8s	4s	+100%
`polyveck_caddq`	✅	8s	9s	-11%
`polyveck_power2round`	✅	8s	8s	+0%
`polyz_unpack_c`	✅	8s	7s	+14%
`keccak_absorb`	✅	7s	7s	+0%
`mld_compute_pack_z`	✅	7s	8s	-12%
`mld_sample_s1_s2`	✅	7s	7s	+0%
`poly_caddq_c`	✅	7s	7s	+0%
`polyveck_ntt`	✅	7s	7s	+0%
`polyveck_pointwise_poly_montgomery_t0`	✅	7s	-	new
`polyveck_sub`	✅	7s	6s	+17%
`sign_keypair_internal`	✅	7s	7s	+0%
`mld_sample_s1_s2_serial`	✅	6s	9s	-33%
`poly_chknorm_native_aarch64`	✅	6s	2s	+200%
`poly_invntt_tomont_c`	✅	6s	8s	-25%
`poly_uniform_eta`	✅	6s	5s	+20%
`polyveck_pointwise_poly_montgomery`	✅	6s	7s	-14%
`polyveck_pointwise_poly_montgomery_s2`	✅	6s	-	new
`polyw1_pack`	✅	6s	4s	+50%
`intt_native_x86_64`	✅	5s	4s	+25%
`pack_sig_z`	✅	5s	6s	-17%
`pack_sk`	✅	5s	3s	+67%
`poly_caddq_native_aarch64`	✅	5s	2s	+150%
`poly_challenge`	✅	5s	4s	+25%
`poly_invntt_tomont`	✅	5s	2s	+150%
`poly_invntt_tomont_native`	✅	5s	4s	+25%
`poly_power2round`	✅	5s	5s	+0%
`polyveck_make_hint`	✅	5s	7s	-29%
`polyveck_unpack_eta`	✅	5s	5s	+0%
`polyz_unpack_native`	✅	5s	6s	-17%
`reduce32`	✅	5s	3s	+67%
`sign`	✅	5s	7s	-29%
`sign_signature_pre_hash_shake256`	✅	5s	6s	-17%
`sign_verify_pre_hash_internal`	✅	5s	5s	+0%
`caddq`	✅	4s	4s	+0%
`decompose`	✅	4s	2s	+100%
`keccak_finalize`	✅	4s	2s	+100%
`mld_ct_cmask_nonzero_u8`	✅	4s	5s	-20%
`mld_h`	✅	4s	4s	+0%
`mld_prepare_domain_separation_prefix`	✅	4s	5s	-20%
`mld_value_barrier_u8`	✅	4s	1s	+300%
`poly_chknorm_native`	✅	4s	2s	+100%
`poly_ntt_c`	✅	4s	4s	+0%
`poly_ntt_native`	✅	4s	2s	+100%
`poly_pointwise_montgomery_native`	✅	4s	2s	+100%
`poly_shiftl`	✅	4s	3s	+33%
`poly_uniform_gamma1`	✅	4s	2s	+100%
`polyt0_pack`	✅	4s	4s	+0%
`polyveck_chknorm`	✅	4s	6s	-33%
`polyveck_pack_t0`	✅	4s	3s	+33%
`polyveck_unpack_t0`	✅	4s	6s	-33%
`polyvecl_pack_eta`	✅	4s	4s	+0%
`polyvecl_uniform_gamma1`	✅	4s	4s	+0%
`rej_eta_native`	✅	4s	4s	+0%
`shake128_absorb`	✅	4s	2s	+100%
`shake128_finalize`	✅	4s	3s	+33%
`shake256_init`	✅	4s	1s	+300%
`shake256x4_squeezeblocks`	✅	4s	7s	-43%
`sign_signature`	✅	4s	6s	-33%
`sign_verify`	✅	4s	3s	+33%
`sign_verify_extmu`	✅	4s	5s	-20%
`sys_check_capability`	✅	4s	4s	+0%
`unpack_hints`	✅	4s	5s	-20%
`unpack_pk`	✅	4s	3s	+33%
`keccak_init`	✅	3s	4s	-25%
`keccakf1600_xor_bytes (big endian)`	✅	3s	2s	+50%
`keccakf1600x4_extract_bytes`	✅	3s	2s	+50%
`keccakf1600x4_permute`	✅	3s	3s	+0%
`keccakf1600x4_xor_bytes`	✅	3s	3s	+0%
`make_hint`	✅	3s	3s	+0%
`mld_ct_abs_i32`	✅	3s	2s	+50%
`mld_ct_get_optblocker_i64`	✅	3s	3s	+0%
`mld_ct_get_optblocker_u32`	✅	3s	4s	-25%
`montgomery_reduce`	✅	3s	4s	-25%
`ntt_native_x86_64`	✅	3s	3s	+0%
`pack_pk`	✅	3s	2s	+50%
`poly_caddq`	✅	3s	2s	+50%
`poly_caddq_native`	✅	3s	2s	+50%
`poly_chknorm`	✅	3s	3s	+0%
`poly_pointwise_montgomery`	✅	3s	4s	-25%
`poly_sub`	✅	3s	5s	-40%
`poly_uniform`	✅	3s	4s	-25%
`poly_use_hint`	✅	3s	2s	+50%
`poly_use_hint_c`	✅	3s	3s	+0%
`poly_use_hint_native`	✅	3s	4s	-25%
`polyt1_unpack`	✅	3s	3s	+0%
`polyvecl_permute_bitrev_to_custom`	✅	3s	2s	+50%
`polyvecl_pointwise_acc_montgomery`	✅	3s	4s	-25%
`polyvecl_uniform_gamma1_serial`	✅	3s	4s	-25%
`polyvecl_unpack_eta`	✅	3s	4s	-25%
`polyvecl_unpack_z`	✅	3s	6s	-50%
`polyz_pack`	✅	3s	2s	+50%
`power2round`	✅	3s	2s	+50%
`rej_eta`	✅	3s	6s	-50%
`shake256_absorb`	✅	3s	2s	+50%
`shake256_release`	✅	3s	1s	+200%
`shake256x4_absorb_once`	✅	3s	2s	+50%
`sign_keypair`	✅	3s	3s	+0%
`sign_open`	✅	3s	5s	-40%
`sign_signature_extmu`	✅	3s	3s	+0%
`sign_signature_pre_hash_internal`	✅	3s	3s	+0%
`sign_verify_pre_hash_shake256`	✅	3s	6s	-50%
`use_hint`	✅	3s	3s	+0%
`fqscale`	✅	2s	5s	-60%
`keccak_squeeze`	✅	2s	2s	+0%
`keccakf1600_extract_bytes (big endian)`	✅	2s	2s	+0%
`mld_ct_cmask_neg_i32`	✅	2s	2s	+0%
`mld_ct_cmask_nonzero_u32`	✅	2s	5s	-60%
`mld_ct_get_optblocker_u8`	✅	2s	1s	+100%
`mld_ct_sel_int32`	✅	2s	1s	+100%
`mld_value_barrier_i64`	✅	2s	4s	-50%
`pack_sig_c_h`	✅	2s	3s	-33%
`poly_decompose`	✅	2s	3s	-33%
`poly_decompose_native`	✅	2s	2s	+0%
`poly_make_hint`	✅	2s	4s	-50%
`poly_ntt`	✅	2s	3s	-33%
`poly_reduce`	✅	2s	3s	-33%
`polyeta_pack`	✅	2s	3s	-33%
`polyt1_pack`	✅	2s	3s	-33%
`polyveck_pack_eta`	✅	2s	4s	-50%
`polyvecl_chknorm`	✅	2s	6s	-67%
`polyvecl_pointwise_acc_montgomery_native`	✅	2s	4s	-50%
`polyz_unpack`	✅	2s	4s	-50%
`rej_eta_c`	✅	2s	3s	-33%
`shake256`	✅	2s	2s	+0%
`shake256_finalize`	✅	2s	3s	-33%
`unpack_sig`	✅	2s	5s	-60%
`keccakf1600_xor_bytes`	✅	1s	4s	-75%
`mld_keccakf1600_extract_bytes`	✅	1s	2s	-50%
`mld_value_barrier_u32`	✅	1s	4s	-75%
`polyveck_pack_w1`	✅	1s	2s	-50%
`shake128_init`	✅	1s	2s	-50%
`shake128_release`	✅	1s	3s	-67%
`shake128_squeeze`	✅	1s	2s	-50%
`shake128x4_absorb_once`	✅	1s	4s	-75%
`shake128x4_squeezeblocks`	✅	1s	3s	-67%
`shake256_squeeze`	✅	1s	2s	-50%

mldsa/src/polyvec.h

mkannwischer · 2026-03-22T04:19:51Z

mldsa/src/sign.c

+  /* Unpack s1 again in raw form for norm check and recomputation.
+   * TODO: avoid this double unpacking */
+  mld_polyvecl_unpack_eta(s1_raw, sk + 2 * MLDSA_SEEDBYTES + MLDSA_TRBYTES);
+
+  /* Unpack s2 again in raw form for norm check and recomputation.
+   * TODO: avoid this double unpacking */
+  mld_polyveck_unpack_eta(s2_raw, sk + 2 * MLDSA_SEEDBYTES + MLDSA_TRBYTES +
+                                      MLDSA_L * MLDSA_POLYETA_PACKEDBYTES);
+
+  /* Unpack t0 again in raw form for validation.
+   * TODO: avoid this double unpacking */
+  mld_polyveck_unpack_t0(t0_raw, sk + 2 * MLDSA_SEEDBYTES + MLDSA_TRBYTES +
+                                     MLDSA_L * MLDSA_POLYETA_PACKEDBYTES +
+                                     MLDSA_K * MLDSA_POLYETA_PACKEDBYTES);
+
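
The three offsets in the hunk above follow the secret-key layout `rho || key || tr || s1 || s2 || t0`. A minimal sketch of that arithmetic is shown here, assuming the FIPS 204 sizes (32-byte seeds, 64-byte `tr`, 416-byte packed `t0` polynomials). The `sk_params` and `sk_offsets` types and the `sk_layout` helper are hypothetical names used for illustration only; they are not part of mldsa-native.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes assumed from FIPS 204; the names mirror the macros in the diff. */
enum { SEEDBYTES = 32, TRBYTES = 64, POLYT0_PACKEDBYTES = 416 };

/* Hypothetical parameter description (not an mldsa-native type). */
typedef struct
{
  size_t k;              /* rows: length of s2 and t0 */
  size_t l;              /* columns: length of s1 */
  size_t polyeta_packed; /* 96 bytes for eta = 2, 128 bytes for eta = 4 */
} sk_params;

/* Byte offsets of the packed vectors inside the secret key. */
typedef struct
{
  size_t s1, s2, t0, total;
} sk_offsets;

static sk_offsets sk_layout(sk_params p)
{
  sk_offsets o;
  assert(p.k > 0 && p.l > 0);
  o.s1 = 2 * SEEDBYTES + TRBYTES;            /* skip rho, key, tr */
  o.s2 = o.s1 + p.l * p.polyeta_packed;      /* skip the l polys of s1 */
  o.t0 = o.s2 + p.k * p.polyeta_packed;      /* skip the k polys of s2 */
  o.total = o.t0 + p.k * POLYT0_PACKEDBYTES; /* end of the secret key */
  return o;
}
```

With the ML-DSA-44 parameters (`k = 4`, `l = 4`, 96-byte packed eta polynomials) the total comes out to 2560 bytes, which matches the ML-DSA-44 secret-key size.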


pk_from_sk gets quite a bit uglier here, because we need to do the bounds check between unpacking and the NTT.
I can't think of a good way to avoid this right now. Ideas?
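
One possible shape for this ordering constraint is sketched below: unpack a single polynomial, check its coefficients while they are still in raw form, and only then transform it. This is an illustration under stated assumptions, not the code in this PR. `load_poly`, `unpack_eta2`, and `ntt_stub` are hypothetical names, the NTT is a do-nothing placeholder, only the eta = 2 encoding is handled, and no claim of constant-time behaviour is made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 256
#define ETA 2
#define POLYETA_PACKEDBYTES 96 /* 256 coefficients * 3 bits / 8 */

/* Placeholder only: a real forward NTT would transform c in place. */
static void ntt_stub(int32_t c[N]) { (void)c; }

/* Decode 3-bit fields t and store ETA - t (the eta = 2 encoding). */
static void unpack_eta2(int32_t c[N], const uint8_t *in)
{
  size_t j;
  for (j = 0; j < N; j++)
  {
    size_t bit = 3 * j, byte = bit / 8, shift = bit % 8;
    uint32_t w = in[byte];
    /* A field may straddle two bytes; never read past the last byte. */
    if (byte + 1 < POLYETA_PACKEDBYTES)
    {
      w |= (uint32_t)in[byte + 1] << 8;
    }
    c[j] = ETA - (int32_t)((w >> shift) & 7);
  }
}

/* Hypothetical on-demand loader: unpack polynomial i of a packed vector,
 * reject out-of-range coefficients, then transform. Returns 0 on success. */
static int load_poly(int32_t c[N], const uint8_t *packed_vec, size_t i)
{
  size_t j;
  int bad = 0;
  assert(packed_vec != NULL);
  unpack_eta2(c, packed_vec + i * POLYETA_PACKEDBYTES);
  for (j = 0; j < N; j++)
  {
    /* 3-bit fields decode to [-5, 2]; only [-ETA, ETA] is valid. */
    bad |= (c[j] < -ETA) | (c[j] > ETA);
  }
  if (bad)
  {
    return -1;
  }
  ntt_stub(c); /* the check above must run before this step */
  return 0;
}
```

The check has to sit inside the loader because the raw coefficients no longer exist once the NTT has run. Whether folding it in this way fits the existing `pk_from_sk` flow is an open question.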

oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `881c2e4`	Previous: `bb07ee8`	Ratio
`ML-DSA-87 sign`	`408515` cycles	`395562` cycles	`1.03`

This comment was automatically generated by workflow using github-action-benchmark.
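
The alert above comes down to a ratio test: 408515 / 395562 is roughly 1.033, which is just over the 1.03 threshold. A minimal sketch of that comparison follows. `is_regression` is a hypothetical helper, and the strict `>` comparison is an assumption based on the word "exceeding" in the alert text.

```c
#include <assert.h>

/* Returns 1 when current/previous is strictly above the threshold. */
static int is_regression(unsigned long current, unsigned long previous,
                         double threshold)
{
  assert(previous > 0);
  return (double)current / (double)previous > threshold;
}
```

Applied to the Intel Xeon 4th gen (c7i) numbers for the same benchmark (238491 vs 240827 cycles), the same test does not trigger.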

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer · 2026-03-22T09:55:31Z

@gilles-peskine-arm @waleed-elmelegy-arm, this PR may be of interest to you.

mkannwischer force-pushed the sign-recompute-s1s2t0 branch from 0e1dfbf to c0ac5e8 Compare March 22, 2026 03:29

mkannwischer added the benchmark label Mar 22, 2026

oqs-bot reviewed Mar 22, 2026

github-actions bot reviewed Mar 22, 2026

oqs-bot reviewed Mar 22, 2026

github-actions bot reviewed Mar 22, 2026

oqs-bot reviewed Mar 22, 2026

mkannwischer commented Mar 22, 2026

mldsa/src/polyvec.h (outdated)

mkannwischer commented Mar 22, 2026

mkannwischer added benchmark and removed benchmark labels Mar 22, 2026

oqs-bot reviewed Mar 22, 2026

mkannwischer added 4 commits March 22, 2026 16:33

CI: Add OpenTitan integration patch for updated alloc sizes

4ec6ebd

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

fixup: Update REDUCE_RAM KEYPAIR_PCT allocation limits

cff836a

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

CBMC: Update proofs for s1vec/s2vec/t0vec changes

0508823

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

CBMC: Extract per-poly pointwise functions for s2/t0

b3ee120

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer force-pushed the sign-recompute-s1s2t0 branch from 269c1ee to b3ee120 Compare March 22, 2026 09:08

fixup: Fix comments for s1vec/s2vec/t0vec

6893af6

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>

mkannwischer changed the title ~~Sign Memory: Unpack s1, s2, and t0 on the fly in REDUCE_RAM mode~~ Sign: Unpack s1, s2, and t0 on the fly in REDUCE_RAM mode Mar 22, 2026

mkannwischer marked this pull request as ready for review March 22, 2026 10:15

mkannwischer requested a review from a team as a code owner March 22, 2026 10:15
