Revert "Use LLVM intrinsics for madd intrinsics"#2014
Revert "Use LLVM intrinsics for madd intrinsics"#2014sayantn merged 2 commits intorust-lang:mainfrom
madd intrinsics"#2014Conversation
|
See also https://godbolt.org/z/8z53f6WPE r? sayantn |
This reverts commit 3214671.
3ae174a to
ed3475f
Compare
crates/core_arch/src/x86/avx2.rs
Outdated
| /// This is a trick used in the adler32 algorithm to get a widening addition. The | ||
| /// multiplication by 1 is trivial, but must not be optimized out because then the vpmaddwd | ||
| /// instruction is no longer selected. The assert_instr verifies that this is the case. | ||
| #[target_feature(enable = "avx2")] | ||
| #[cfg_attr(test, assert_instr(vpmaddwd))] | ||
| unsafe fn test_mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i { | ||
| let one_v = _mm256_set1_epi16(1); | ||
| _mm256_madd_epi16(mad, one_v) | ||
| } |
There was a problem hiding this comment.
Please just append this to the test_mm256_madd_epi16 function. Or if you really want to keep it as a separate function, just move it to the tests module
There was a problem hiding this comment.
The test checks that the instruction is emitted, so it can't just be added to that existing test. Locally it didn't run when I up it with the other tests, but on CI it does, so I've moved it now.
ed3475f to
8a883ad
Compare
|
This seems like it would break the usage of base64. https://godbolt.org/z/q6e1ne568 12 I have some concerns about this kind of change. It might introduce reliance on hard-to-predict optimizer behavior and lead to performance issues that are difficult to detect. Footnotes |
|
Interesting, I think it uses a shift instead of a multiply and that fools the pattern match. Yeah that needs a fix. Do you have any other cases that don't optimize? I'll revert this, and then add separate tests for the adler32 and base64 patterns. |
Can't find more on GitHub. https://github.com/search?q=_mm_madd_epi16+NOT+is%3Afork+language%3ARust&type=code If not limited to real-world cases, there are also cases like |
This reverts commit 3214671.
The commit is from #1985, which itself reverted this original change.
Use
intrinsics::simdformadd(again), now that llvm/llvm-project#174149 should make this optimize properly.