Skip to content

Option 14: unholy alloy of gold and silver #4

Description

@corsix

With reference to https://www.corsix.org/content/fast-crc32c-4k, what I call crc32_4k is your option 12 ("8-byte Hardware-accelerated"), and what I call crc32_4k_three_way is your option 13 ("Golden"). The theoretical upper bound on option 13 is 64 bits/cycle, which your implementation gets close to, at 62 bits/cycle. What I realised is that:

  1. There's an inferior option, that I call crc32_4k_pclmulqdq, but you might call "Silver".
  2. Gold and silver use separate execution ports, and thus can be alloyed together, for a theoretical upper bound of 120.89 bits/cycle (this is 64+72 bytes every 9 cycles). I'm measuring 93 bits/cycle for this alloy, and I imagine that a well tuned implementation could get closer to 120.89.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions