Port Baremetal: Add Cortex-M33 MPS3-AN524 platform support #1579

Merged
mkannwischer merged 2 commits into main from port-m33-an524
Apr 25, 2026

willieyz commented Feb 25, 2026

Add bare-metal platform support for ARM Cortex-M33 on MPS3-AN524 FPGA. Works on both QEMU (qemu-system-arm -M mps3-an524) and real hardware.

willieyz force-pushed the port-m33-an524 branch 3 times, most recently from a61e82c to bae13fa on February 25, 2026 13:21

oqs-bot commented Feb 25, 2026

CBMC Results (ML-KEM-512)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1321s 1429s -7.6%
mlk_indcpa_keypair_derand 251s 274s -8%
mlk_indcpa_enc 170s 182s -7%
mlk_rej_uniform_c 126s 147s -14%
mlk_polyvec_basemul_acc_montgomery_cached_c 48s 65s -26%
mlk_ntt_layer 31s 40s -22%
poly_ntt_native 30s 29s +3%
mlk_poly_rej_uniform 29s 36s -19%
mlk_keccak_squeezeblocks_x4 26s 29s -10%
mlk_poly_reduce_native 21s 22s -5%
keccakf1600x4_permute_native_x4 17s 18s -6%
mlk_fqmul 17s 17s +0%
mlk_indcpa_dec 17s 16s +6%
mlk_poly_decompress_d4_native 14s 19s -26%
mlk_poly_decompress_d10_native 12s 18s -33%
mlk_polyvec_add 12s 12s +0%
mlk_poly_frombytes_native 10s 10s +0%
mlk_keccak_squeeze_once 9s 8s +12%
mlk_keccak_absorb_once_x4 8s 7s +14%
mlk_ntt_butterfly_block 8s 9s -11%
mlk_poly_frommsg 8s 10s -20%
polyvec_basemul_acc_montgomery_cached_native 8s 8s +0%
mlk_keccak_squeezeblocks 7s 12s -42%
mlk_poly_cbd_eta2 7s 6s +17%
intt_native_aarch64 6s 3s +100%
mlk_keccakf1600_permute_c 6s 4s +50%
mlk_poly_compress_d4_c 6s 3s +100%
mlk_poly_rej_uniform_x4 6s 6s +0%
kem_dec 5s 4s +25%
mlk_invntt_layer 5s 7s -29%
mlk_keypair_getnoise_eta1 5s 3s +67%
mlk_poly_getnoise_eta1122_4x 5s 2s +150%
mlk_poly_ntt 5s 8s -38%
nttunpack_native_x86_64 5s 4s +25%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 5s 3s +67%
rej_uniform_native_x86_64 5s 5s +0%
kem_enc_derand 4s 1s +300%
kem_keypair 4s 2s +100%
mlk_ct_memcmp 4s 2s +100%
mlk_keccakf1600_xor_bytes 4s 2s +100%
mlk_keccakf1600x4_extract_bytes 4s 2s +100%
mlk_keccakf1600x4_extract_bytes_c 4s 1s +300%
mlk_poly_decompress_d5_c 4s 2s +100%
mlk_poly_getnoise_eta2 4s 2s +100%
mlk_poly_tomont_c 4s 3s +33%
mlk_polyvec_frombytes 4s 3s +33%
mlk_polyvec_permute_bitrev_to_custom 4s 3s +33%
mlk_polyvec_permute_bitrev_to_custom_native 4s 2s +100%
mlk_polyvec_tomont 4s 3s +33%
mlk_shake128x4_absorb_once 4s 2s +100%
mlk_shake256x4 4s 4s +0%
mlk_value_barrier_u8 4s 4s +0%
ntt_native_aarch64 4s 2s +100%
poly_decompress_d10_native_x86_64 4s 5s -20%
poly_decompress_d4_native_x86_64 4s 5s -20%
poly_frombytes_native_x86_64 4s 5s -20%
poly_getnoise_eta1122_4x_native 4s 3s +33%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 4s 2s +100%
rej_uniform_native_aarch64 4s 3s +33%
keccak_f1600_x1_native_aarch64 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 2s +50%
kem_check_sk 3s 1s +200%
mlk_ct_cmask_neg_i16 3s 4s -25%
mlk_ct_cmask_nonzero_u8 3s 2s +50%
mlk_ct_get_optblocker_i32 3s 2s +50%
mlk_ct_get_optblocker_u32 3s 3s +0%
mlk_enc_getnoise_eta1_eta2 3s 2s +50%
mlk_keccak_absorb_once 3s 6s -50%
mlk_keccakf1600x4_permute 3s 3s +0%
mlk_montgomery_reduce 3s 2s +50%
mlk_poly_cbd_eta1 3s 4s -25%
mlk_poly_compress_d10_c 3s 5s -40%
mlk_poly_compress_du 3s 2s +50%
mlk_poly_compress_dv 3s 2s +50%
mlk_poly_decompress_d10 3s 1s +200%
mlk_poly_decompress_d11 3s 1s +200%
mlk_poly_decompress_d11_c 3s 3s +0%
mlk_poly_decompress_d4 3s 2s +50%
mlk_poly_decompress_d5_native 3s 1s +200%
mlk_poly_frombytes 3s 3s +0%
mlk_poly_invntt_tomont_c 3s 3s +0%
mlk_poly_mulcache_compute_native 3s 2s +50%
mlk_poly_ntt_c 3s 3s +0%
mlk_poly_tomsg 3s 2s +50%
mlk_polyvec_basemul_acc_montgomery_cached 3s 4s -25%
mlk_polyvec_compress_du 3s 3s +0%
mlk_scalar_compress_d1 3s 2s +50%
mlk_scalar_compress_d11 3s 2s +50%
mlk_scalar_decompress_d11 3s 2s +50%
mlk_shake128_absorb_once 3s 3s +0%
poly_compress_d11_native_x86_64 3s 2s +50%
poly_compress_d5_native_x86_64 3s 2s +50%
poly_invntt_tomont_native 3s 1s +200%
poly_mulcache_compute_native_aarch64 3s 3s +0%
poly_mulcache_compute_native_x86_64 3s 3s +0%
poly_tomont_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 3s 3s +0%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 2s +0%
kem_check_pk 2s 4s -50%
kem_enc 2s 3s -33%
kem_keypair_derand 2s 2s +0%
mlk_check_pct 2s 3s -33%
mlk_ct_cmov_zero 2s 2s +0%
mlk_ct_sel_int16 2s 4s -50%
mlk_ct_sel_uint8 2s 4s -50%
mlk_gen_matrix 2s 3s -33%
mlk_gen_matrix_serial 2s 2s +0%
mlk_keccakf1600_permute 2s 2s +0%
mlk_keccakf1600x4_xor_bytes 2s 3s -33%
mlk_matvec_mul 2s 3s -33%
mlk_poly_compress_d10 2s 3s -33%
mlk_poly_compress_d10_native 2s 2s +0%
mlk_poly_compress_d11_native 2s 3s -33%
mlk_poly_compress_d4 2s 3s -33%
mlk_poly_compress_d4_native 2s 2s +0%
mlk_poly_compress_d5 2s 1s +100%
mlk_poly_compress_d5_c 2s 5s -60%
mlk_poly_compress_d5_native 2s 1s +100%
mlk_poly_decompress_d11_native 2s 1s +100%
mlk_poly_decompress_d4_c 2s 1s +100%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_du 2s 2s +0%
mlk_poly_decompress_dv 2s 1s +100%
mlk_poly_frombytes_c 2s 2s +0%
mlk_poly_getnoise_eta1_4x_native 2s 3s -33%
mlk_poly_invntt_tomont 2s 4s -50%
mlk_poly_mulcache_compute 2s 1s +100%
mlk_poly_mulcache_compute_c 2s 4s -50%
mlk_poly_reduce 2s 2s +0%
mlk_poly_tobytes 2s 1s +100%
mlk_poly_tobytes_c 2s 2s +0%
mlk_poly_tomont 2s 1s +100%
mlk_poly_tomont_native 2s 4s -50%
mlk_polymat_permute_bitrev_to_custom 2s 3s -33%
mlk_polyvec_decompress_du 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_ntt 2s 3s -33%
mlk_polyvec_reduce 2s 3s -33%
mlk_polyvec_tobytes 2s 3s -33%
mlk_scalar_compress_d10 2s 3s -33%
mlk_scalar_compress_d4 2s 1s +100%
mlk_scalar_compress_d5 2s 3s -33%
mlk_scalar_decompress_d4 2s 3s -33%
mlk_scalar_decompress_d5 2s 1s +100%
mlk_sha3_256 2s 3s -33%
mlk_sha3_512 2s 3s -33%
mlk_shake128_squeezeblocks 2s 1s +100%
mlk_shake256 2s 1s +100%
mlk_value_barrier_i32 2s 3s -33%
ntt_native_x86_64 2s 3s -33%
poly_compress_d4_native_x86_64 2s 3s -33%
poly_decompress_d5_native_x86_64 2s 1s +100%
poly_reduce_native_aarch64 2s 1s +100%
poly_reduce_native_x86_64 2s 4s -50%
poly_tobytes_native_aarch64 2s 3s -33%
poly_tobytes_native_x86_64 2s 2s +0%
poly_tomont_native_aarch64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccakf1600_permute_native 1s 2s -50%
mlk_barrett_reduce 1s 3s -67%
mlk_ct_cmask_nonzero_u16 1s 1s +0%
mlk_ct_get_optblocker_u8 1s 1s +0%
mlk_keccakf1600_extract_bytes 1s 2s -50%
mlk_keccakf1600_extract_bytes (big endian) 1s 3s -67%
mlk_keccakf1600_xor_bytes (big endian) 1s 1s +0%
mlk_keccakf1600x4_xor_bytes_c 1s 2s -50%
mlk_poly_add 1s 2s -50%
mlk_poly_compress_d11 1s 2s -50%
mlk_poly_compress_d11_c 1s 2s -50%
mlk_poly_decompress_d10_c 1s 2s -50%
mlk_poly_getnoise_eta1_4x 1s 1s +0%
mlk_poly_reduce_c 1s 1s +0%
mlk_poly_sub 1s 1s +0%
mlk_poly_tobytes_native 1s 3s -67%
mlk_polyvec_invntt_tomont 1s 4s -75%
mlk_rej_uniform 1s 1s +0%
mlk_scalar_decompress_d10 1s 1s +0%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_shake128x4_squeezeblocks 1s 3s -67%
mlk_value_barrier_u32 1s 3s -67%
poly_compress_d10_native_x86_64 1s 2s -50%
poly_decompress_d11_native_x86_64 1s 3s -67%
rej_uniform_native 1s 3s -67%
sys_check_capability 1s 4s -75%
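
Reading note: the Change column appears to be the relative difference between the Current and Previous runtimes; for example, the TOTAL row (1321s vs. 1429s) works out to about -7.6%. A minimal sketch of that arithmetic follows. The helper name is hypothetical and is not part of the CI tooling that produced this table.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the CI tooling): relative change of a
 * proof runtime in percent. Multiplying before dividing keeps simple cases
 * such as 9s vs. 8s exact in floating point. */
static double runtime_change_percent(uint32_t current_s, uint32_t previous_s)
{
    if (previous_s == 0) {
        /* Assumption: report "no change" when there is no baseline. */
        return 0.0;
    }
    return 100.0 * ((double)current_s - (double)previous_s) / (double)previous_s;
}
```

With runtimes of only a few seconds, a one-second difference already shows up as a swing of 33% to 300%, so the small rows are mostly timing noise.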


oqs-bot commented Feb 25, 2026

CBMC Results (ML-KEM-768)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1268s 1193s +6.3%
mlk_indcpa_keypair_derand 191s 187s +2%
mlk_indcpa_enc 178s 168s +6%
mlk_rej_uniform_c 130s 106s +23%
mlk_polyvec_basemul_acc_montgomery_cached_c 46s 41s +12%
mlk_ntt_layer 35s 27s +30%
mlk_poly_rej_uniform 29s 30s -3%
mlk_keccak_squeezeblocks_x4 28s 24s +17%
poly_ntt_native 23s 21s +10%
mlk_poly_reduce_native 20s 20s +0%
mlk_fqmul 19s 16s +19%
keccakf1600x4_permute_native_x4 18s 20s -10%
polyvec_basemul_acc_montgomery_cached_native 18s 18s +0%
mlk_poly_decompress_d4_native 16s 14s +14%
mlk_poly_decompress_d10_native 13s 12s +8%
mlk_indcpa_dec 12s 10s +20%
mlk_poly_frommsg 10s 9s +11%
mlk_polyvec_add 10s 8s +25%
mlk_keccak_squeezeblocks 8s 7s +14%
mlk_poly_frombytes_native 8s 7s +14%
mlk_poly_rej_uniform_x4 8s 5s +60%
mlk_keccak_squeeze_once 7s 8s -12%
mlk_ntt_butterfly_block 7s 9s -22%
mlk_poly_ntt 7s 7s +0%
kem_dec 6s 5s +20%
mlk_keccak_absorb_once_x4 6s 4s +50%
poly_decompress_d10_native_x86_64 6s 3s +100%
rej_uniform_native_x86_64 6s 5s +20%
mlk_invntt_layer 5s 3s +67%
mlk_keccakf1600_xor_bytes 5s 2s +150%
mlk_poly_compress_d10_c 5s 2s +150%
mlk_polymat_permute_bitrev_to_custom 5s 6s -17%
intt_native_x86_64 4s 2s +100%
mlk_check_pct 4s 3s +33%
mlk_enc_getnoise_eta1_eta2 4s 3s +33%
mlk_gen_matrix 4s 3s +33%
mlk_keccak_absorb_once 4s 4s +0%
mlk_keccakf1600_permute_c 4s 5s -20%
mlk_keccakf1600x4_extract_bytes 4s 2s +100%
mlk_poly_compress_d11_native 4s 3s +33%
mlk_poly_compress_d5 4s 4s +0%
mlk_poly_compress_d5_native 4s 1s +300%
mlk_poly_decompress_d11_native 4s 4s +0%
mlk_poly_decompress_d5_native 4s 5s -20%
mlk_poly_getnoise_eta1_4x 4s 1s +300%
mlk_poly_invntt_tomont_c 4s 2s +100%
mlk_scalar_decompress_d11 4s 5s -20%
mlk_scalar_decompress_d4 4s 2s +100%
poly_decompress_d4_native_x86_64 4s 5s -20%
poly_decompress_d5_native_x86_64 4s 1s +300%
intt_native_aarch64 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 1s +200%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 1s +200%
keccakf1600x4_extract_bytes_native 3s 4s -25%
keccakf1600x4_xor_bytes_native 3s 2s +50%
mlk_ct_cmask_nonzero_u16 3s 3s +0%
mlk_ct_get_optblocker_u8 3s 2s +50%
mlk_ct_memcmp 3s 3s +0%
mlk_keccakf1600_xor_bytes (big endian) 3s 1s +200%
mlk_keccakf1600x4_extract_bytes_c 3s 4s -25%
mlk_keccakf1600x4_xor_bytes_c 3s 2s +50%
mlk_keypair_getnoise_eta1 3s 4s -25%
mlk_poly_compress_d10_native 3s 2s +50%
mlk_poly_compress_d11_c 3s 1s +200%
mlk_poly_decompress_dv 3s 4s -25%
mlk_poly_frombytes 3s 4s -25%
mlk_poly_getnoise_eta1_4x_native 3s 2s +50%
mlk_poly_mulcache_compute_c 3s 3s +0%
mlk_poly_reduce_c 3s 4s -25%
mlk_poly_tobytes_native 3s 4s -25%
mlk_poly_tomont 3s 2s +50%
mlk_poly_tomont_c 3s 2s +50%
mlk_polyvec_basemul_acc_montgomery_cached 3s 2s +50%
mlk_polyvec_compress_du 3s 4s -25%
mlk_polyvec_decompress_du 3s 1s +200%
mlk_polyvec_frombytes 3s 6s -50%
mlk_polyvec_permute_bitrev_to_custom 3s 4s -25%
mlk_polyvec_permute_bitrev_to_custom_native 3s 3s +0%
mlk_polyvec_tomont 3s 2s +50%
mlk_scalar_compress_d4 3s 2s +50%
mlk_scalar_decompress_d5 3s 3s +0%
mlk_scalar_signed_to_unsigned_q 3s 2s +50%
mlk_shake128x4_absorb_once 3s 1s +200%
mlk_value_barrier_u8 3s 1s +200%
ntt_native_aarch64 3s 3s +0%
nttunpack_native_x86_64 3s 3s +0%
poly_compress_d10_native_x86_64 3s 1s +200%
poly_frombytes_native_x86_64 3s 4s -25%
poly_getnoise_eta1122_4x_native 3s 2s +50%
poly_invntt_tomont_native 3s 2s +50%
poly_mulcache_compute_native_aarch64 3s 1s +200%
poly_mulcache_compute_native_x86_64 3s 3s +0%
poly_reduce_native_aarch64 3s 1s +200%
poly_reduce_native_x86_64 3s 3s +0%
poly_tobytes_native_x86_64 3s 3s +0%
poly_tomont_native_aarch64 3s 1s +200%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccakf1600_permute_native 2s 5s -60%
kem_check_sk 2s 3s -33%
kem_enc 2s 3s -33%
kem_enc_derand 2s 1s +100%
kem_keypair 2s 1s +100%
kem_keypair_derand 2s 2s +0%
mlk_ct_cmask_nonzero_u8 2s 2s +0%
mlk_ct_cmov_zero 2s 2s +0%
mlk_ct_get_optblocker_i32 2s 1s +100%
mlk_ct_sel_int16 2s 4s -50%
mlk_ct_sel_uint8 2s 3s -33%
mlk_gen_matrix_serial 2s 6s -67%
mlk_keccakf1600_extract_bytes 2s 3s -33%
mlk_keccakf1600_extract_bytes (big endian) 2s 2s +0%
mlk_matvec_mul 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_add 2s 1s +100%
mlk_poly_cbd_eta1 2s 2s +0%
mlk_poly_cbd_eta2 2s 5s -60%
mlk_poly_compress_d11 2s 2s +0%
mlk_poly_compress_d4 2s 3s -33%
mlk_poly_compress_d4_c 2s 1s +100%
mlk_poly_compress_d4_native 2s 2s +0%
mlk_poly_compress_d5_c 2s 4s -50%
mlk_poly_compress_du 2s 2s +0%
mlk_poly_compress_dv 2s 2s +0%
mlk_poly_decompress_d10 2s 1s +100%
mlk_poly_decompress_d10_c 2s 3s -33%
mlk_poly_decompress_d11_c 2s 2s +0%
mlk_poly_decompress_d4_c 2s 2s +0%
mlk_poly_frombytes_c 2s 1s +100%
mlk_poly_getnoise_eta1122_4x 2s 3s -33%
mlk_poly_getnoise_eta2 2s 2s +0%
mlk_poly_invntt_tomont 2s 3s -33%
mlk_poly_mulcache_compute 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_ntt_c 2s 3s -33%
mlk_poly_reduce 2s 1s +100%
mlk_poly_sub 2s 3s -33%
mlk_poly_tobytes 2s 2s +0%
mlk_poly_tobytes_c 2s 2s +0%
mlk_poly_tomont_native 2s 2s +0%
mlk_poly_tomsg 2s 3s -33%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_ntt 2s 2s +0%
mlk_polyvec_tobytes 2s 2s +0%
mlk_rej_uniform 2s 1s +100%
mlk_scalar_compress_d10 2s 2s +0%
mlk_scalar_decompress_d10 2s 1s +100%
mlk_sha3_256 2s 2s +0%
mlk_sha3_512 2s 3s -33%
mlk_shake128_squeezeblocks 2s 1s +100%
mlk_shake128x4_squeezeblocks 2s 1s +100%
mlk_shake256x4 2s 3s -33%
mlk_value_barrier_i32 2s 5s -60%
mlk_value_barrier_u32 2s 2s +0%
ntt_native_x86_64 2s 3s -33%
poly_compress_d11_native_x86_64 2s 1s +100%
poly_compress_d4_native_x86_64 2s 1s +100%
poly_compress_d5_native_x86_64 2s 4s -50%
poly_decompress_d11_native_x86_64 2s 4s -50%
poly_tobytes_native_aarch64 2s 3s -33%
poly_tomont_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 2s 2s +0%
rej_uniform_native 2s 5s -60%
rej_uniform_native_aarch64 2s 2s +0%
kem_check_pk 1s 2s -50%
mlk_barrett_reduce 1s 2s -50%
mlk_ct_cmask_neg_i16 1s 3s -67%
mlk_ct_get_optblocker_u32 1s 4s -75%
mlk_keccakf1600_permute 1s 2s -50%
mlk_keccakf1600x4_permute 1s 2s -50%
mlk_keccakf1600x4_xor_bytes 1s 1s +0%
mlk_poly_compress_d10 1s 1s +0%
mlk_poly_decompress_d11 1s 2s -50%
mlk_poly_decompress_d4 1s 2s -50%
mlk_poly_decompress_d5 1s 5s -80%
mlk_poly_decompress_d5_c 1s 1s +0%
mlk_poly_decompress_du 1s 1s +0%
mlk_polyvec_invntt_tomont 1s 2s -50%
mlk_polyvec_reduce 1s 1s +0%
mlk_scalar_compress_d1 1s 3s -67%
mlk_scalar_compress_d11 1s 2s -50%
mlk_scalar_compress_d5 1s 1s +0%
mlk_shake128_absorb_once 1s 2s -50%
mlk_shake256 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 1s 1s +0%
sys_check_capability 1s 2s -50%

willieyz force-pushed the port-m33-an524 branch 3 times, most recently from aca0c50 to 23ae351 on February 25, 2026 15:04

oqs-bot commented Feb 25, 2026

CBMC Results (ML-KEM-1024)

Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1135s 1318s -13.9%
mlk_indcpa_enc 134s 159s -16%
mlk_indcpa_keypair_derand 113s 138s -18%
mlk_rej_uniform_c 112s 155s -28%
mlk_polyvec_basemul_acc_montgomery_cached_c 69s 89s -22%
polyvec_basemul_acc_montgomery_cached_native 32s 36s -11%
mlk_poly_rej_uniform 30s 35s -14%
mlk_ntt_layer 28s 39s -28%
mlk_keccak_squeezeblocks_x4 26s 27s -4%
poly_ntt_native 25s 27s -7%
mlk_poly_reduce_native 19s 24s -21%
keccakf1600x4_permute_native_x4 16s 19s -16%
mlk_fqmul 15s 16s -6%
mlk_poly_decompress_d5_native 13s 15s -13%
mlk_poly_decompress_d11_native 12s 15s -20%
mlk_polyvec_add 12s 14s -14%
mlk_keccak_squeeze_once 10s 7s +43%
mlk_poly_frommsg 9s 9s +0%
mlk_indcpa_dec 8s 9s -11%
mlk_ntt_butterfly_block 8s 7s +14%
mlk_poly_frombytes_native 8s 10s -20%
mlk_keccak_absorb_once_x4 7s 6s +17%
mlk_keccak_squeezeblocks 7s 9s -22%
mlk_poly_compress_d11_c 7s 5s +40%
mlk_gen_matrix 6s 5s +20%
mlk_poly_ntt 6s 9s -33%
mlk_poly_tomont_native 6s 2s +200%
mlk_polymat_permute_bitrev_to_custom 6s 7s -14%
rej_uniform_native_x86_64 6s 5s +20%
kem_check_pk 5s 4s +25%
kem_dec 5s 5s +0%
mlk_gen_matrix_serial 5s 5s +0%
mlk_poly_reduce_c 5s 1s +400%
mlk_poly_tobytes_native 5s 4s +25%
mlk_scalar_compress_d1 5s 3s +67%
mlk_shake256x4 5s 4s +25%
poly_frombytes_native_x86_64 5s 5s +0%
intt_native_aarch64 4s 1s +300%
mlk_ct_get_optblocker_i32 4s 2s +100%
mlk_invntt_layer 4s 7s -43%
mlk_keccak_absorb_once 4s 2s +100%
mlk_keccakf1600_permute_c 4s 6s -33%
mlk_keccakf1600_xor_bytes (big endian) 4s 3s +33%
mlk_matvec_mul 4s 3s +33%
mlk_poly_compress_d11_native 4s 2s +100%
mlk_poly_rej_uniform_x4 4s 7s -43%
mlk_poly_tomsg 4s 1s +300%
mlk_polyvec_permute_bitrev_to_custom_native 4s 2s +100%
mlk_polyvec_tobytes 4s 2s +100%
nttunpack_native_x86_64 4s 4s +0%
poly_decompress_d11_native_x86_64 4s 4s +0%
poly_decompress_d5_native_x86_64 4s 5s -20%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 4s 3s +33%
rej_uniform_native 4s 2s +100%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 3s 4s -25%
keccakf1600x4_extract_bytes_native 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 1s +200%
kem_enc 3s 2s +50%
kem_keypair_derand 3s 1s +200%
mlk_barrett_reduce 3s 3s +0%
mlk_ct_cmask_neg_i16 3s 1s +200%
mlk_ct_cmov_zero 3s 3s +0%
mlk_ct_memcmp 3s 2s +50%
mlk_enc_getnoise_eta1_eta2 3s 3s +0%
mlk_keccakf1600_extract_bytes 3s 3s +0%
mlk_keccakf1600_permute 3s 3s +0%
mlk_keccakf1600x4_extract_bytes 3s 3s +0%
mlk_keccakf1600x4_extract_bytes_c 3s 4s -25%
mlk_keccakf1600x4_xor_bytes 3s 1s +200%
mlk_keccakf1600x4_xor_bytes_c 3s 2s +50%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_compress_d10_native 3s 5s -40%
mlk_poly_compress_d11 3s 4s -25%
mlk_poly_compress_d4_native 3s 2s +50%
mlk_poly_decompress_d10_native 3s 1s +200%
mlk_poly_decompress_d4 3s 3s +0%
mlk_poly_getnoise_eta1_4x 3s 3s +0%
mlk_poly_getnoise_eta2 3s 1s +200%
mlk_poly_invntt_tomont 3s 4s -25%
mlk_poly_mulcache_compute_c 3s 1s +200%
mlk_polyvec_basemul_acc_montgomery_cached 3s 3s +0%
mlk_polyvec_compress_du 3s 2s +50%
mlk_polyvec_reduce 3s 3s +0%
mlk_polyvec_tomont 3s 3s +0%
mlk_scalar_compress_d10 3s 1s +200%
mlk_scalar_decompress_d10 3s 1s +200%
mlk_value_barrier_u32 3s 2s +50%
mlk_value_barrier_u8 3s 2s +50%
poly_compress_d4_native_x86_64 3s 3s +0%
poly_mulcache_compute_native_aarch64 3s 3s +0%
poly_mulcache_compute_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 3s +0%
rej_uniform_native_aarch64 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccakf1600_permute_native 2s 2s +0%
kem_check_sk 2s 3s -33%
kem_keypair 2s 2s +0%
mlk_ct_cmask_nonzero_u8 2s 3s -33%
mlk_ct_get_optblocker_u32 2s 1s +100%
mlk_ct_sel_uint8 2s 2s +0%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600x4_permute 2s 3s -33%
mlk_keypair_getnoise_eta1 2s 4s -50%
mlk_montgomery_reduce 2s 3s -33%
mlk_poly_cbd_eta1 2s 3s -33%
mlk_poly_compress_d10 2s 3s -33%
mlk_poly_compress_d10_c 2s 3s -33%
mlk_poly_compress_d4 2s 2s +0%
mlk_poly_compress_d4_c 2s 1s +100%
mlk_poly_compress_d5_c 2s 3s -33%
mlk_poly_compress_dv 2s 4s -50%
mlk_poly_decompress_d11 2s 3s -33%
mlk_poly_decompress_d11_c 2s 2s +0%
mlk_poly_decompress_d4_c 2s 1s +100%
mlk_poly_decompress_d5 2s 1s +100%
mlk_poly_decompress_d5_c 2s 2s +0%
mlk_poly_decompress_du 2s 3s -33%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_frombytes_c 2s 2s +0%
mlk_poly_getnoise_eta1122_4x 2s 2s +0%
mlk_poly_getnoise_eta1_4x_native 2s 3s -33%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_tobytes_c 2s 4s -50%
mlk_poly_tomont_c 2s 2s +0%
mlk_polyvec_decompress_du 2s 3s -33%
mlk_polyvec_frombytes 2s 4s -50%
mlk_polyvec_invntt_tomont 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 1s +100%
mlk_polyvec_ntt 2s 3s -33%
mlk_polyvec_permute_bitrev_to_custom 2s 2s +0%
mlk_scalar_decompress_d11 2s 1s +100%
mlk_scalar_decompress_d5 2s 2s +0%
mlk_scalar_signed_to_unsigned_q 2s 3s -33%
mlk_sha3_256 2s 2s +0%
mlk_sha3_512 2s 1s +100%
mlk_shake128x4_squeezeblocks 2s 2s +0%
mlk_shake256 2s 1s +100%
ntt_native_aarch64 2s 3s -33%
poly_compress_d10_native_x86_64 2s 2s +0%
poly_compress_d11_native_x86_64 2s 1s +100%
poly_compress_d5_native_x86_64 2s 3s -33%
poly_decompress_d10_native_x86_64 2s 3s -33%
poly_reduce_native_x86_64 2s 3s -33%
poly_tomont_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 3s -67%
keccak_f1600_x4_native_avx2 1s 1s +0%
kem_enc_derand 1s 3s -67%
mlk_check_pct 1s 1s +0%
mlk_ct_cmask_nonzero_u16 1s 1s +0%
mlk_ct_get_optblocker_u8 1s 3s -67%
mlk_ct_sel_int16 1s 2s -50%
mlk_keccakf1600_extract_bytes (big endian) 1s 3s -67%
mlk_poly_add 1s 2s -50%
mlk_poly_compress_d5 1s 1s +0%
mlk_poly_compress_d5_native 1s 1s +0%
mlk_poly_compress_du 1s 2s -50%
mlk_poly_decompress_d10 1s 2s -50%
mlk_poly_decompress_d10_c 1s 4s -75%
mlk_poly_decompress_d4_native 1s 2s -50%
mlk_poly_frombytes 1s 1s +0%
mlk_poly_invntt_tomont_c 1s 4s -75%
mlk_poly_ntt_c 1s 6s -83%
mlk_poly_reduce 1s 4s -75%
mlk_poly_sub 1s 2s -50%
mlk_poly_tobytes 1s 2s -50%
mlk_poly_tomont 1s 2s -50%
mlk_rej_uniform 1s 2s -50%
mlk_scalar_compress_d11 1s 2s -50%
mlk_scalar_compress_d4 1s 3s -67%
mlk_scalar_compress_d5 1s 2s -50%
mlk_scalar_decompress_d4 1s 3s -67%
mlk_shake128_absorb_once 1s 3s -67%
mlk_shake128_squeezeblocks 1s 2s -50%
mlk_shake128x4_absorb_once 1s 2s -50%
mlk_value_barrier_i32 1s 2s -50%
ntt_native_x86_64 1s 3s -67%
poly_decompress_d4_native_x86_64 1s 1s +0%
poly_getnoise_eta1122_4x_native 1s 2s -50%
poly_invntt_tomont_native 1s 2s -50%
poly_reduce_native_aarch64 1s 3s -67%
poly_tobytes_native_aarch64 1s 2s -50%
poly_tobytes_native_x86_64 1s 2s -50%
poly_tomont_native_aarch64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 1s 6s -83%
sys_check_capability 1s 2s -50%

willieyz force-pushed the port-m33-an524 branch 8 times, most recently from 9510d04 to 6a14b70 on March 3, 2026 09:06
willieyz force-pushed the port-m33-an524 branch 2 times, most recently from bcf3828 to 1a2628f on March 19, 2026 07:22
mkannwischer force-pushed the port-m33-an524 branch 3 times, most recently from dea3f57 to 21d79d3 on April 24, 2026 08:20
willieyz and others added 2 commits April 24, 2026 16:20
Add bare-metal platform support for Cortex-M33 on MPS3-AN524,
tested via QEMU (qemu-system-arm -M mps3-an524).
Note that the *_CONFIG_REDUCE_RAM configuration option is not implemented
in mlkem-native, so it is skipped in this port.

- Add platform makefile and qemu exec wrapper for M33-AN524
- Platform files are provided by pqmx, see slothy-optimizer/pqmx#116
- Add Cortex-M33 DWT cycle counter support in HAL
  (distinct from Cortex-M55 PMU-based counting)
- Generalize Nix package from m55-an547 to pqmx to serve
  both M55-AN547 and M33-AN524 platforms
- Allow MLD_BUMP_ALLOC_SIZE to be overridden at compile time
  (we only have a 96 KiB stack)
- Add M33 baremetal test to CI matrix
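
For readers unfamiliar with the pattern, the compile-time override mentioned in the list above can be sketched roughly as follows. Only the macro name MLD_BUMP_ALLOC_SIZE comes from this commit message; the default value, the alignment, and the allocator functions are hypothetical and are not taken from the repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 KiB is an illustrative default only. A platform makefile
 * could override it, e.g. by passing -DMLD_BUMP_ALLOC_SIZE=8192. */
#ifndef MLD_BUMP_ALLOC_SIZE
#define MLD_BUMP_ALLOC_SIZE (16u * 1024u)
#endif

#define BUMP_ALIGN 8u /* hypothetical alignment applied to every allocation */

/* The union forces 8-byte alignment of the backing buffer. */
static union {
    uint64_t align;
    uint8_t bytes[MLD_BUMP_ALLOC_SIZE];
} bump_buf;
static size_t bump_used;

/* Hand out the next aligned chunk, or NULL once the buffer is exhausted. */
static void *bump_alloc(size_t n)
{
    size_t rounded;
    void *p;

    if (n > MLD_BUMP_ALLOC_SIZE) {
        return NULL; /* also rules out overflow in the rounding below */
    }
    rounded = (n + (BUMP_ALIGN - 1u)) & ~(size_t)(BUMP_ALIGN - 1u);
    if (rounded > MLD_BUMP_ALLOC_SIZE - bump_used) {
        return NULL;
    }
    p = &bump_buf.bytes[bump_used];
    bump_used += rounded;
    return p;
}

/* Release everything at once; a bump allocator has no per-object free. */
static void bump_reset(void)
{
    bump_used = 0;
}
```

Keeping the size behind an `#ifndef` guard lets a memory-constrained platform shrink the buffer from its build flags without touching the source.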

Signed-off-by: willieyz <willie.zhao@chelpis.com>
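
As background for the DWT item in the commit above: on Armv7-M and Armv8-M Mainline cores, the DWT cycle counter is typically enabled by setting TRCENA in DEMCR and CYCCNTENA in DWT_CTRL, then reading DWT_CYCCNT. The sketch below illustrates that sequence; it is not the HAL code from this PR, and the struct and function names are hypothetical. Registers are passed in as pointers so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Architecturally defined register addresses (Armv7-M / Armv8-M Mainline). */
#define DEMCR_ADDR      0xE000EDFCu
#define DWT_CTRL_ADDR   0xE0001000u
#define DWT_CYCCNT_ADDR 0xE0001004u

#define DEMCR_TRCENA       (1u << 24) /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* enables the cycle counter */

/* Hypothetical wrapper: lets tests substitute plain variables for registers. */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} dwt_regs;

/* Enable tracing, clear the counter, then start counting cycles. */
static void dwt_cycles_enable(const dwt_regs *r)
{
    *r->demcr |= DEMCR_TRCENA;
    *r->cyccnt = 0;
    *r->ctrl |= DWT_CTRL_CYCCNTENA;
}

static uint32_t dwt_cycles_read(const dwt_regs *r)
{
    return *r->cyccnt;
}

/* Unsigned subtraction stays correct across a single 32-bit wrap-around. */
static uint32_t dwt_cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On a target, `dwt_regs` would be initialised with the three addresses above cast to `volatile uint32_t *`. The counter is 32 bits wide, so measurements longer than 2^32 cycles would need additional overflow handling.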
Without +nodsp, GCC's -mcpu=cortex-m33 enables DSP by default and
auto-vectorizes poly_ntt into smlabb, which faults under QEMU's
mps3-an524 cortex-m33.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
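
For context on the commit above: smlabb is a DSP-extension instruction that multiplies the signed bottom halfwords of two registers and adds a 32-bit accumulator. The sketch below is a portable C model of that operation together with a compile-time check of the ACLE macro `__ARM_FEATURE_DSP`. It is illustrative only: the function names are hypothetical, and the expectation that `+nodsp` leaves the macro undefined is an assumption rather than something verified in this PR.

```c
#include <stdint.h>

/* ACLE defines __ARM_FEATURE_DSP when DSP instructions are available.
 * Assumption: building with -mcpu=cortex-m33+nodsp leaves it undefined. */
#if defined(__ARM_FEATURE_DSP)
#define BUILD_HAS_DSP 1
#else
#define BUILD_HAS_DSP 0
#endif

/* Interpret the low 16 bits of x as a signed two's-complement value. */
static int32_t sign_extend16(uint32_t x)
{
    int32_t lo = (int32_t)(x & 0xFFFFu);
    return (lo >= 0x8000) ? lo - 0x10000 : lo;
}

/* Model of SMLABB: bottom halfword of rn times bottom halfword of rm, plus
 * the accumulator ra. The Q flag is not modelled; callers must keep the
 * sum within the int32_t range. */
static int32_t smlabb_model(uint32_t rn, uint32_t rm, int32_t ra)
{
    return sign_extend16(rn) * sign_extend16(rm) + ra;
}
```

A 16-bit multiply-accumulate of this shape is the kind of expression a compiler may map onto smlabb when the DSP extension is enabled, which is consistent with the instruction appearing in poly_ntt.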
mkannwischer marked this pull request as ready for review on April 25, 2026 02:49
mkannwischer requested a review from a team as a code owner on April 25, 2026 02:49

mkannwischer left a comment

Thanks @willieyz. I found the issue - see the last commit.

mkannwischer merged commit 1f1cf48 into main on Apr 25, 2026
789 of 815 checks passed
mkannwischer deleted the port-m33-an524 branch on April 25, 2026 02:51


Development

Successfully merging this pull request may close these issues.

Port: Baremetal: Add Cortex-M33 MPS3-AN524 platform support

4 participants