Redesign intrinsic-test to use simple comparison#2063
Conversation
feb1dcd to
6ef8b8f
Compare
This seems weird ( unsafe extern "C" {
fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}In fact most Edit: To work around this issue I have modified the tool to communicate with C via pointers (e.g. the C wrapper for @Amanieu is this intended behavior or a bug? |
e2346ff to
db1b2ca
Compare
|
Btw the time gains are significant, it reduces the Arm and aarch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev |
|
Great work. Quick sanity check on |
|
@folkertdev ooh, that makes sense. I don't particularly care about windows, but we are using LLVM20 in the CI. I can change it to use the build from kernel.org |
|
I'm seeing |
|
yeah, but I can use the LLVM github builds or the kernel.org builds |
ce53e81 to
76dd339
Compare
|
Can f16 tests just be gated with |
|
@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing in all 3 arm archs, e.g. edit: sorry, my mistake, they are still failing in ARMv7. I will gate them against the flag |
|
With LLVM 22 |
|
FTZ/DAZ-related perhaps? |
I don't really think so, the outputs seem completely distinct. I noticed that use core::arch::aarch64::*;
#[unsafe(no_mangle)]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn foo(dst: *mut uint8x16x2_t, a: *const uint8x16_t, b: *const uint8x16_t) {
unsafe {
*dst = vzipq_u8(*a, *b);
}
}produces foo:
ld1 { v0.16b }, [x1]
ld1 { v1.16b }, [x2]
add x8, x0, #16
zip1 v2.16b, v0.16b, v1.16b
zip2 v0.16b, v0.16b, v1.16b
st1 { v2.16b }, [x0]
st1 { v0.16b }, [x8]
retBut the C code seemingly has different behavior on GCC and clang https://godbolt.org/z/T3YnrejjG @adamgemmell can you help in this? |
|
I'm not sure it will fix your issue but the difference in instructions comes from the fact that in arm_neon.h, they reverse every vector before and after the operation on big endian. It's not always actually necessary so we only do it if it's broken without it - however, the intrinsic test tool doesn't detect the difference in behaviour because both arguments it picks are identical. e.g.: |
|
You can try adding Also I don't actually see vzipq_u8 on the latest CI run, why is that? |
I have no idea, I can confirm that locally the test is generated and run.
I will check. Thanks Edit: @adamgemmell adding Edit2: sorry, |
|
None of the unsigned variants of vzipq seem to be seen there, weird. I'd quite like to know why this patch detects the difference - when I looked locally the codegen of the tests seemed very similar |
|
Yeah I fixed the test not being included, I used |
This comment has been minimized.
This comment has been minimized.
a057d30 to
2dfa840
Compare
This comment has been minimized.
This comment has been minimized.
0316c79 to
76c3d5a
Compare
This comment has been minimized.
This comment has been minimized.
bf75e04 to
20ac12e
Compare
This comment has been minimized.
This comment has been minimized.
19cfd95 to
421ca05
Compare
|
r? @Amanieu rustbot has assigned @Amanieu. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
@Amanieu @folkertdev sorry for the inconvenience, I will split up just one more PR (last one I promise). Figured out how to split the @rustbot author |
|
Error: The feature Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip. |
Currently
intrinsic-testprints the outputs and then compares the outputs manually. This PR uses a different approach -- generate C wrappers for the intrinsics, link to them from Rust, and then just use simple rust tests to compare outputsIt is much easier to review commit-by-commit