
Add intrinsic for launch-sized workgroup memory on GPUs#146181

Open
Flakebi wants to merge 1 commit into rust-lang:main from Flakebi:dynamic-shared-memory

Conversation

@Flakebi (Contributor) commented Sep 3, 2025

Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns a pointer to the memory that is
allocated at launch time.

Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns a pointer to workgroup memory that is guaranteed to be
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T`, but can also be larger,
smaller, or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.
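To make the contract above concrete (every call returns the same address, aligned for `T`), here is a hedged host-side sketch that merely simulates the semantics with an ordinary leaked buffer; `launch_sized_workgroup_mem` is a hypothetical stand-in, not the real intrinsic, which only exists on amdgpu/nvptx:

```rust
// Host-side simulation of the intrinsic's contract; the real
// `gpu_launch_sized_workgroup_mem` only exists on amdgpu/nvptx targets.
use std::sync::OnceLock;

// Base address of the simulated per-launch workgroup memory.
static BASE: OnceLock<usize> = OnceLock::new();

/// Hypothetical stand-in: every call returns the same base address,
/// rounded up so that it is aligned for `T` (alignments in Rust are
/// always powers of two, so the mask trick below is valid).
fn launch_sized_workgroup_mem<T>() -> *mut T {
    let base = *BASE.get_or_init(|| {
        // Leak a fixed buffer to stand in for the launch-time allocation.
        Box::leak(vec![0u8; 1024].into_boxed_slice()).as_mut_ptr() as usize
    });
    let align = std::mem::align_of::<T>();
    ((base + align - 1) & !(align - 1)) as *mut T
}

fn main() {
    let a = launch_sized_workgroup_mem::<u64>();
    let b = launch_sized_workgroup_mem::<u64>();
    // All calls return a pointer to the same address...
    assert_eq!(a, b);
    // ...and the pointer is aligned for `T`.
    assert_eq!(a as usize % std::mem::align_of::<u64>(), 0);
}
```

Note that nothing here models the dereferenceable size: whether writing through the pointer is in bounds depends entirely on the size given at kernel launch.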

Alternative Interfaces

It was also considered to expose launch-sized workgroup memory as extern
static variables in Rust, as they are represented in LLVM IR.
However, because the pointer is not guaranteed to be dereferenceable
(that depends on the size allocated at runtime), such a global must be
declared zero-sized, which makes global variables a bad fit.

Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no corresponding way to query the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so users have to pass the size
out-of-band or rely on target-specific mechanisms for now.
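Since the size must travel out-of-band (e.g. as a kernel argument), a natural consumption pattern is to combine the base pointer with that size into a slice. The following is a hedged host-side sketch; `as_f32_slice` is a hypothetical helper, and on a real GPU `base` would come from the intrinsic and `size_bytes` from a launch parameter:

```rust
// Sketch of consuming launch-sized workgroup memory whose size arrives
// out-of-band. `as_f32_slice` is a hypothetical helper, not part of this PR.
fn as_f32_slice<'a>(base: *mut f32, size_bytes: usize) -> &'a mut [f32] {
    let len = size_bytes / std::mem::size_of::<f32>();
    // Safety (on a real GPU): the launch guarantees `size_bytes` bytes
    // starting at `base` are dereferenceable and aligned for f32, and the
    // caller synchronizes accesses between threads of the workgroup.
    unsafe { std::slice::from_raw_parts_mut(base, len) }
}

fn main() {
    // Host-side stand-in for the launch-time allocation.
    let mut backing = vec![0.0f32; 16];
    let s = as_f32_slice(backing.as_mut_ptr(), 16 * 4);
    s[0] = 1.5;
    assert_eq!(s[0], 1.5);
    assert_eq!(s.len(), 16);
}
```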

Tracking issue: #135516

@rustbot (Collaborator) commented Sep 3, 2025

r? @petrochenkov

rustbot has assigned @petrochenkov.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot added the following labels on Sep 3, 2025: A-compiletest, A-LLVM, A-testsuite, S-waiting-on-review, T-bootstrap, T-compiler, T-libs
@rustbot (Collaborator) commented Sep 3, 2025

Some changes occurred in src/tools/compiletest

cc @jieyouxu

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rust-log-analyzer

This comment has been minimized.

@Flakebi force-pushed the dynamic-shared-memory branch from 0aa0e58 to 3ebaccb on September 3, 2025 22:43
@rust-log-analyzer

This comment has been minimized.

@Flakebi force-pushed the dynamic-shared-memory branch from 3ebaccb to 2378959 on September 3, 2025 22:50
#[rustc_nounwind]
#[unstable(feature = "dynamic_shared_memory", issue = "135513")]
#[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))]
pub fn dynamic_shared_memory<T: ?Sized>() -> *mut T;
@RalfJung (Member) Sep 4, 2025

Note that outside the GPU world, "shared memory" typically refers to memory shared between processes. So I would suggest using a name that's less likely to be confused, like something that explicitly involves "GPU" or so.

This sounds like a form of "global" memory (similar to a static item), but then apparently OpenCL calls it "local" which is very confusing...

@Flakebi (Contributor, Author) Sep 4, 2025

Does it make sense to add a mod gpu?
I think there are more intrinsics for GPUs that can be added (although more in the traditional intrinsic sense, relating to an instruction; edit: re-exposing intrinsics from core::arch::nvptx and the amdgpu equivalent).

@Flakebi (Contributor, Author)

Or should it be in core::arch::gpu?
(From #135516 (comment), cc @workingjubilee)

@RalfJung (Member) Sep 4, 2025

Rust intrinsic names are not namespaced. They are exposed in a module, but inside the compiler they are identified entirely by their name. So moving them into a different module doesn't alleviate the need for a clear name that will be understandable to non-GPU people working in the compiler (which is the vast majority of compiler devs).

If there's more GPU intrinsics to come, moving them into a gpu.rs file here still might make sense.

I don't have a strong opinion on how the eventually stable public API is organized, I am commenting entirely as someone who has an interest in keeping the set of intrinsics the Rust compiler offers understandable and well-defined (the ones in this folder, not the ones in core::arch which you call "more traditional" but that's very dependent on your background ;). These intrinsics are just an implementation detail, but every intrinsic we add here is a new language primitive -- it's like adding a new keyword, just without the syntax discussions and perma-unstable. In the past we used to have intrinsics that entirely break the internal consistency of the language, and we used to have intrinsics whose safety requirements were very poorly documented.

@RalfJung (Member) commented Sep 4, 2025

Sorry for drowning you in questions here, but extending the core language with new operations (as in, adding a new intrinsic doing things that couldn't be done before) is a big deal, and we had a bad experience in the past when this was done without wider discussion in the team to ensure that the intrinsics actually make sense in the context of Rust. Not everything that exists in the hardware can be 1:1 exposed in Rust, sometimes this requires a lot of work and sometimes it's just basically impossible. It can be a lot of work to clean these things up later, and as someone who did a bunch of that work, I'd rather not have to do it again. :)

@Flakebi (Contributor, Author) commented Sep 4, 2025

I agree that it makes a lot of sense to have the discussion now. Thanks for taking a look and helping to design something useful!

Speaking of safety requirements... how does one use this pointer?

Heh, yes, that’s something that should be mentioned in the doc comment as well. (Especially comments on how to safely use it.)

I get that it is aligned, but does it point to enough memory to store a T?

Depends on the size specified on the CPU side when launching the gpu-kernel. It may or it may not.

If it's always the same address, doesn't everyone overwrite each other's data all the time? This API looks very odd for a non-GPU person, and it's not clear to me whether that is resolved by having more magic behavior (which should be documented or at least referenced here), or whether there's higher-level APIs built on top that deal with this (but this intrinsic provides so few guarantees, I can't see how that should be possible).

There are “higher-level APIs” like “do a fast matrix-matrix multiplication”, but not much in-between. I’d assume that people usually use this in its raw form.
On GPUs, accessing memory is orders of magnitude slower than on CPUs. But GPUs

  1. have a lot more registers (e.g. up to 256 32-bit registers on amdgpu)
  2. and shared memory, which is essentially a software-defined cache.

Two general use cases are: 1) All threads in a group load a part from global memory (the RAM/VRAM) and store it in shared memory. Then all threads read from the collaboratively loaded data. 2) All threads in a group do some work and collaborate on shared memory (with atomics or so) to aggregate results. Then one of the threads stores the final result to global memory.

So, shared memory is meant to be accessed collaboratively and the developer must ensure proper synchronization. It is hard to provide a safe abstraction for this and tbh, I don’t want to try 😅 (though I can see 3rd party crates doing this – at least to some extent).
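Use case (2) above can be sketched on the host: std threads stand in for the threads of a workgroup, an `Arc<AtomicU32>` stands in for a slot of workgroup memory, and `join` stands in for the workgroup barrier. All names here are illustrative, not the real GPU API:

```rust
// Host-side sketch of use case (2): threads aggregate into shared memory
// with an atomic, then the result is read out after the "barrier".
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Each "thread" computes a partial value and adds it into the shared
/// slot; after all threads have joined (the barrier stand-in), the final
/// result is read out once.
fn workgroup_reduce(num_threads: u32) -> u32 {
    // Stand-in for a slot of workgroup memory shared by the group.
    let shared = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..num_threads)
        .map(|tid| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let local = tid * tid; // each thread's partial work
                shared.fetch_add(local, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // stands in for the workgroup barrier
    }
    shared.load(Ordering::Relaxed)
}

fn main() {
    // 0^2 + 1^2 + ... + 7^2 = 140
    assert_eq!(workgroup_reduce(8), 140);
}
```

On a real GPU the atomic and the barrier would both target workgroup memory obtained from the intrinsic; the point of the sketch is only the collaborate-then-synchronize shape of the pattern.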

From Rust’s perspective, guarantees should be the same as with memory that’s shared between processes.

Typically, intrinsic documentation should be detailed enough that I can read and write code using the intrinsic and know exactly whether the code is correct and what it will do in all circumstances. I don't know if there's any hope of achieving that with GPU intrinsics, but if not then we need to have a bit of a wider discussion -- we have had bad experience with just importing "externally defined" semantics into Rust without considering all the interactions (in general, it is not logically coherent to have semantics externally defined).

I agree, it would be nice to have good documentation for the intrinsics in Rust!

@RalfJung (Member) commented Sep 4, 2025

Depends on the size specified on the CPU side when launching the gpu-kernel. It may or it may not.

Wait, there's a single static size set when launching the kernel? Why is it called "dynamic" memory? "dynamic" memory usually means malloc/free, i.e. you can get any amount of fresh memory during runtime (until RAM is full obviously).

Are you saying dynamic shared memory is neither dynamic in the normal sense nor shared in the normal sense? ;)

@petrochenkov (Contributor)

r? @RalfJung

@rustbot rustbot assigned RalfJung and unassigned petrochenkov Sep 4, 2025
@RalfJung (Member) commented Sep 4, 2025

I won't be able to do the final approval here, I can just help with ensuring that the intrinsics are documented well enough that they can be understood without GPU expertise, and that the LLVM codegen looks vaguely reasonable.

I don't know if we have anyone who actually knows what the generated LLVM IR should look like and can ensure it makes sense. r? @nikic maybe?

@Flakebi (Contributor, Author) commented Dec 19, 2025

I pushed the rename to gpu_launch_sized_workgroup_mem and removed all “shared memory” wording, it’s just “workgroup memory” now.

Sorry for breaking nvptx names again. I tried to add a test for this, but this only fails in ptxas, so I didn’t manage to.
@kjetilkjeka, your LLVM change looks good to me (feel free to ping me if you don’t get a review).
This PR now gives a name on nvptx again, explicitly as a workaround, so it can be easily removed once the LLVM fix is propagated to Rust.

@kjetilkjeka (Contributor)

I made the llvm PR: llvm/llvm-project#173018

I also tested your branch with the fix and that works great! The ptx is arguably nicer as the name of the .extern .shared is readable. I even attempted to make the name collide with a function with the same name and it was automatically changed to gpu_launch_sized_workgroup_mem1. I'm very happy with this workaround and hope that @RalfJung concludes that we can live with it.

I also think the name in the current revision is a nice improvement, but in case the reviewers don't like it, the other alternatives are fine as well.

All in all this PR looks pretty great from my point of view. I wouldn't expect @RDambrosio016 to take a look so there's at least no reason to block it for nvptx related things.

@Flakebi (Contributor, Author) commented Dec 19, 2025

Thanks for testing and for opening the LLVM PR. (You probably need a lit test for people to approve it.)

I also think the name in the current revision is a nice improvement

Just for context on why I removed the name and why I think it should eventually be removed for nvptx as well: With the named global, the maximum alignment is enforced for all kernels in the IR module, which is unnecessarily conservative. With the unnamed global, different kernels in the same module can end up with different minimum alignments, depending on the calls they make. IMO the latter behavior is what we want.

@bors (Collaborator) commented Dec 28, 2025

☔ The latest upstream changes (presumably #150448) made this pull request unmergeable. Please resolve the merge conflicts.

@Flakebi force-pushed the dynamic-shared-memory branch from c9c04f0 to d45fd86 on December 28, 2025 03:22
@rustbot

This comment has been minimized.

@Flakebi (Contributor, Author) commented Dec 28, 2025

Rebased to fix conflicts, no other changes

@Flakebi (Contributor, Author) commented Jan 2, 2026

This should be good to merge.

@RalfJung, are you ok with giving the final sign-off based on these approvals?

@RalfJung (Member) commented Jan 3, 2026

Given how disconnected I am from the implementation work, I am not comfortable doing the final review here. @nikic is assigned, @workingjubilee might also be willing to take over based on their statements in other PRs.

@rust-bors

This comment has been minimized.

@Flakebi force-pushed the dynamic-shared-memory branch from d45fd86 to cc300ab on January 16, 2026 09:40
@rustbot

This comment has been minimized.

@Flakebi (Contributor, Author) commented Jan 16, 2026

Rebased to fix conflicts, no other changes

@rust-bors

This comment has been minimized.

@Flakebi force-pushed the dynamic-shared-memory branch from cc300ab to 3541dd4 on January 21, 2026 19:03
@rustbot (Collaborator) commented Jan 21, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@Flakebi (Contributor, Author) commented Jan 21, 2026

Rebased to fix conflicts, no other changes.
If there are no more concerns, it would be great to get this merged some time :)

Maybe r? workingjubilee ?

@rustbot rustbot assigned workingjubilee and unassigned nikic Jan 21, 2026
@rustbot (Collaborator) commented Jan 21, 2026

workingjubilee is currently at their maximum review capacity.
They may take a while to respond.

@workingjubilee (Member)

I am indeed happy to take this review as long as you are happy to be patient. ^^;

@workingjubilee (Member) left a comment

"Per-launch-sized workgroup memory". What a mouthful.

...Yeah. The intrinsic is indeed named correctly but... what a name. But! I am very much not going to suggest alternatives. If all the maintainers are happy enough with this name, it is good by me.

I see that we have further discussed the possibility of the alternative interface I suggested, using the close variant of extern { static }, and rejected it. The reasons for rejecting it make sense to me, and I am happy to see that we inspected it a bit further since I feel like the best case for some variation on the alternative I was thinking of was indeed made aptly enough by @Flakebi. Maybe the language should get some way to express this concept more aptly, but not today.

This looks good overall, but there's some stuff you should do to make sure this passes CI, and I would like to see a small amount of bonus annotation for maintainability.


#[rustc_nounwind]
#[unstable(feature = "gpu_launch_sized_workgroup_mem", issue = "135513")]
#[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))]
pub fn gpu_launch_sized_workgroup_mem<T>() -> *mut T;
@workingjubilee (Member)

We did indeed decide to have a mod gpu, can this be moved there?

Comment on lines +1 to +10
// Checks that the GPU intrinsic to get launch-sized workgroup memory works.

//@ revisions: amdgpu nvptx
//@ compile-flags: --crate-type=rlib
//
//@ [amdgpu] compile-flags: --target amdgcn-amd-amdhsa -Ctarget-cpu=gfx900
//@ [amdgpu] needs-llvm-components: amdgpu
//@ [nvptx] compile-flags: --target nvptx64-nvidia-cuda
//@ [nvptx] needs-llvm-components: nvptx
//@ add-minicore
@workingjubilee (Member)

As this is a codegen-llvm test, you must specify an opt level, or add revisions for multiple opt levels, if you want this to pass CI.

@@ -0,0 +1,31 @@
// Checks that the GPU intrinsic to get launch-sized workgroup memory works.
@workingjubilee (Member)

Suggested change
// Checks that the GPU intrinsic to get launch-sized workgroup memory works.
// Checks that the GPU intrinsic to get launch-sized workgroup memory works
// and correctly aligns the `external addrspace(...) global`s over multiple calls.

@workingjubilee (Member)

At first I kind of wanted there to be another test that does the cross-crate version of this. Then I remembered what was discussed elsewhere: that the targets in question are pure LLVM bitcode that gets mashed together anyways, so I am not sure it would actually benefit us, and it would probably involve a ton of tedium with run-make, having considered it in more detail. So, meh.

Basically only leaving this note here to remind myself that if this turns out to go awry in the future, I can update in the direction of following this kind of instinct more often. :^)

size_t NameLen,
LLVMTypeRef Ty) {
Module *Mod = unwrap(M);
unsigned AddressSpace = Mod->getDataLayout().getDefaultGlobalsAddressSpace();
@workingjubilee (Member)

I keep forgetting this "helpful" C "abbreviation", so let's expand it:

Suggested change
unsigned AddressSpace = Mod->getDataLayout().getDefaultGlobalsAddressSpace();
unsigned int AddressSpace = Mod->getDataLayout().getDefaultGlobalsAddressSpace();

extern "C" LLVMValueRef
LLVMRustGetOrInsertGlobalInAddrspace(LLVMModuleRef M, const char *Name,
size_t NameLen, LLVMTypeRef Ty,
unsigned AddressSpace) {
@workingjubilee (Member)

Suggested change
unsigned AddressSpace) {
unsigned int AddressSpace) {

Comment on lines +301 to +302
extern "C" LLVMValueRef
LLVMRustGetOrInsertGlobalInAddrspace(LLVMModuleRef M, const char *Name,
@workingjubilee (Member)

If this function will always create a specifically-extern global, can we document that here? The name is whatever, to me.

Comment on lines +557 to +562
// The name of the global variable is not relevant, the important properties are.
// 1. The global is in the address space for workgroup memory
// 2. It is an extern global
// All instances of extern addrspace(gpu_workgroup) globals are merged in the LLVM backend.
// Generate an unnamed global per intrinsic call, so that different kernels can have
// different minimum alignments.
@workingjubilee (Member)

Lead with what is important, instead of leading with "don't look behind this curtain". Also, the alignment is an important property.

Suggested change
// The name of the global variable is not relevant, the important properties are.
// 1. The global is in the address space for workgroup memory
// 2. It is an extern global
// All instances of extern addrspace(gpu_workgroup) globals are merged in the LLVM backend.
// Generate an unnamed global per intrinsic call, so that different kernels can have
// different minimum alignments.
// Generate an anonymous global per call, with these properties:
// 1. The global is in the address space for workgroup memory
// 2. It is an `external` global
// 3. It is correctly aligned for the pointee `T`
// All instances of extern addrspace(gpu_workgroup) globals are merged in the LLVM backend.
// The name is irrelevant.

/// # Safety
///
/// The pointer is safe to dereference from the start (the returned pointer) up to the
/// size of workgroup memory that was specified when launching the current gpu-kernel.
@workingjubilee (Member)

One small add maybe: It seems beneficial to explicitly deny this relationship? At your discretion.

Suggested change
/// size of workgroup memory that was specified when launching the current gpu-kernel.
/// size of workgroup memory that was specified when launching the current gpu-kernel.
/// This allocated size is not related in any way to `T`.

Otherwise this documentation is fine now. Thank you!

@workingjubilee (Member)

@bors try jobs=x86_64-gnu-nopt,x86_64-gnu-debug

rust-bors bot pushed a commit that referenced this pull request Feb 5, 2026
Add intrinsic for launch-sized workgroup memory on GPUs


try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-debug
@rust-bors

This comment has been minimized.

@rust-log-analyzer (Collaborator)

The job x86_64-gnu-nopt failed! Check out the build log: (web) (plain enhanced) (plain)

---
test [ui] tests/ui/asm/aarch64/type-f16.rs ... ignored, only executed when the architecture is aarch64
test [ui] tests/ui/array-slice-vec/vector-no-ann-2.rs ... ok
test [ui] tests/ui/array-slice-vec/vector-slice-matching-8498.rs ... ok
test [ui] tests/ui/asm/aarch64/arm64ec-sve.rs ... ok
test [ui] tests/ui/asm/aarch64v8r.rs#hf ... ok
test [ui] tests/ui/asm/aarch64v8r.rs#r82 ... ok
test [ui] tests/ui/asm/aarch64v8r.rs#sf ... ok
test [ui] tests/ui/asm/arm-low-dreg.rs ... ok
test [ui] tests/ui/asm/asm-with-nested-closure.rs ... ok
test [ui] tests/ui/asm/binary_asm_labels_allowed.rs ... ignored, only executed when the architecture is aarch64
test [ui] tests/ui/asm/binary_asm_labels.rs ... ok
test [ui] tests/ui/asm/bad-template.rs#aarch64 ... ok
---
test [ui] tests/ui/extern/issue-64655-extern-rust-must-allow-unwind.rs#fat2 ... ok
test [ui] tests/ui/extern/issue-64655-extern-rust-must-allow-unwind.rs#fat3 ... ok
test [ui] tests/ui/extern/issue-80074.rs ... ok
test [ui] tests/ui/extern/issue-95829.rs ... ok
test [ui] tests/ui/extern/lgamma-linkage.rs ... ok
test [ui] tests/ui/extern/no-mangle-associated-fn.rs ... ok
test [ui] tests/ui/extern/not-in-block.rs ... ok
test [ui] tests/ui/extern/unsized-extern-derefmove.rs ... ok
test [ui] tests/ui/extern/windows-tcb-trash-13259.rs ... ok
test [ui] tests/ui/feature-gates/allow-features-empty.rs ... ok
---
test [ui] tests/ui/imports/ambiguous-9.rs ... ok
test [ui] tests/ui/imports/ambiguous-import-visibility-module.rs ... ok
test [ui] tests/ui/imports/ambiguous-glob-vs-expanded-extern.rs ... ok
test [ui] tests/ui/imports/ambiguous-8.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-glob-vs-multiouter.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-globvsglob.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-no-implicit-prelude.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-non-prelude-core-glob.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-non-prelude-std-glob.rs ... ok
test [ui] tests/ui/imports/ambiguous-panic-pick-core.rs ... ok
test [ui] tests/ui/imports/ambiguous-import-visibility-macro.rs ... ok
---
test [codegen] tests/codegen-llvm/zst-offset.rs ... ok

failures:

---- [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#amdgpu stdout ----
------FileCheck stdout------------------------------

------FileCheck stderr------------------------------
/checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs:22:12: error: amdgpu: expected string not found in input
// amdgpu: ret { ptr, ptr } { ptr addrspacecast (ptr addrspace(3) @[[SMALL]] to ptr), ptr addrspacecast (ptr addrspace(3) @[[BIG]] to ptr) }
           ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.amdgpu/gpu-launch-sized-workgroup-memory.ll:7:52: note: scanning from here
@1 = external addrspace(3) global [0 x i8], align 8
                                                   ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.amdgpu/gpu-launch-sized-workgroup-memory.ll:7:52: note: with "SMALL" equal to "0"
@1 = external addrspace(3) global [0 x i8], align 8
                                                   ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.amdgpu/gpu-launch-sized-workgroup-memory.ll:7:52: note: with "BIG" equal to "1"
@1 = external addrspace(3) global [0 x i8], align 8
                                                   ^

Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.amdgpu/gpu-launch-sized-workgroup-memory.ll
Check file: /checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = 'gpu_launch_sized_workgroup_memory.cff6d99b5f4fe7c2-cgu.0' 
            2: source_filename = "gpu_launch_sized_workgroup_memory.cff6d99b5f4fe7c2-cgu.0" 
            3: target datalayout = "e-m:e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9" 
            4: target triple = "amdgcn-amd-amdhsa" 
            5:  
            6: @0 = external addrspace(3) global [0 x i8], align 4 
            7: @1 = external addrspace(3) global [0 x i8], align 8 
check:22'0                                                        X error: no match found
check:22'1                                                          with "SMALL" equal to "0"
check:22'2                                                          with "BIG" equal to "1"
            8:  
check:22'0     ~
            9: ; Function Attrs: nounwind uwtable 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10: define { ptr, ptr } @fun() unnamed_addr #0 { 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11: start: 
check:22'0     ~~~~~~~
           12:  %0 = alloca [8 x i8], align 8, addrspace(5) 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  %1 = addrspacecast ptr addrspace(5) %0 to ptr 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  %2 = alloca [8 x i8], align 8, addrspace(5) 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           15:  %3 = addrspacecast ptr addrspace(5) %2 to ptr 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:  store ptr addrspacecast (ptr addrspace(3) @0 to ptr), ptr %3, align 8 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           17:  %small = load ptr, ptr %3, align 8 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           18:  store ptr addrspacecast (ptr addrspace(3) @1 to ptr), ptr %1, align 8 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           19:  %big = load ptr, ptr %1, align 8 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           20:  %4 = insertvalue { ptr, ptr } poison, ptr %small, 0 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           21:  %5 = insertvalue { ptr, ptr } %4, ptr %big, 1 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           22:  ret { ptr, ptr } %5 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~
           23: } 
check:22'0     ~~
           24:  
check:22'0     ~
           25: attributes #0 = { nounwind uwtable "target-cpu"="gfx900" } 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           26:  
check:22'0     ~
           27: !llvm.module.flags = !{!0} 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           28: !llvm.ident = !{!1} 
check:22'0     ~~~~~~~~~~~~~~~~~~~~
           29:  
check:22'0     ~
           30: !0 = !{i32 8, !"PIC Level", i32 2} 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           31: !1 = !{!"rustc version 1.95.0-nightly (42a3a3fcb 2026-02-05)"} 
check:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

------------------------------------------

error in revision `amdgpu`: verification with 'FileCheck' failed
status: exit status: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.amdgpu/gpu-launch-sized-workgroup-memory.ll" "/checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs" "--check-prefix=CHECK" "--check-prefix" "amdgpu" "--allow-unused-prefixes" "--dump-input-context" "100"
stdout: none

---- [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#amdgpu stdout end ----
---- [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#nvptx stdout ----
------FileCheck stdout------------------------------

------FileCheck stderr------------------------------
/checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs:25:11: error: nvptx: expected string not found in input
// nvptx: ret { ptr, ptr } { ptr addrspacecast (ptr addrspace(3) @[[BIG]] to ptr), ptr addrspacecast (ptr addrspace(3) @[[BIG]] to ptr) }
          ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.nvptx/gpu-launch-sized-workgroup-memory.ll:6:81: note: scanning from here
@gpu_launch_sized_workgroup_mem = external addrspace(3) global [0 x i8], align 8
                                                                                ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.nvptx/gpu-launch-sized-workgroup-memory.ll:6:81: note: with "BIG" equal to "gpu_launch_sized_workgroup_mem"
@gpu_launch_sized_workgroup_mem = external addrspace(3) global [0 x i8], align 8
                                                                                ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.nvptx/gpu-launch-sized-workgroup-memory.ll:6:81: note: with "BIG" equal to "gpu_launch_sized_workgroup_mem"
@gpu_launch_sized_workgroup_mem = external addrspace(3) global [0 x i8], align 8
                                                                                ^

Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.nvptx/gpu-launch-sized-workgroup-memory.ll
Check file: /checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = 'gpu_launch_sized_workgroup_memory.cff6d99b5f4fe7c2-cgu.0' 
            2: source_filename = "gpu_launch_sized_workgroup_memory.cff6d99b5f4fe7c2-cgu.0" 
            3: target datalayout = "e-p6:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64" 
            4: target triple = "nvptx64-nvidia-cuda" 
            5:  
            6: @gpu_launch_sized_workgroup_mem = external addrspace(3) global [0 x i8], align 8 
check:25'0                                                                                     X error: no match found
check:25'1                                                                                       with "BIG" equal to "gpu_launch_sized_workgroup_mem"
check:25'2                                                                                       with "BIG" equal to "gpu_launch_sized_workgroup_mem"
            7:  
check:25'0     ~
            8: ; Function Attrs: nounwind uwtable 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9: define { ptr, ptr } @fun() unnamed_addr #0 { 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           10: start: 
check:25'0     ~~~~~~~
           11:  %0 = alloca [8 x i8], align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           12:  %1 = alloca [8 x i8], align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  store ptr addrspacecast (ptr addrspace(3) @gpu_launch_sized_workgroup_mem to ptr), ptr %1, align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  %small = load ptr, ptr %1, align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           15:  store ptr addrspacecast (ptr addrspace(3) @gpu_launch_sized_workgroup_mem to ptr), ptr %0, align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:  %big = load ptr, ptr %0, align 8 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           17:  %2 = insertvalue { ptr, ptr } poison, ptr %small, 0 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           18:  %3 = insertvalue { ptr, ptr } %2, ptr %big, 1 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           19:  ret { ptr, ptr } %3 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~
           20: } 
check:25'0     ~~
           21:  
check:25'0     ~
           22: attributes #0 = { nounwind uwtable "target-cpu"="sm_30" } 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           23:  
check:25'0     ~
           24: !llvm.module.flags = !{!0} 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           25: !llvm.ident = !{!1} 
check:25'0     ~~~~~~~~~~~~~~~~~~~~
           26:  
check:25'0     ~
           27: !0 = !{i32 8, !"PIC Level", i32 2} 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           28: !1 = !{!"rustc version 1.95.0-nightly (42a3a3fcb 2026-02-05)"} 
check:25'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

------------------------------------------

error in revision `nvptx`: verification with 'FileCheck' failed
status: exit status: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen-llvm/gpu-launch-sized-workgroup-memory.nvptx/gpu-launch-sized-workgroup-memory.ll" "/checkout/tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs" "--check-prefix=CHECK" "--check-prefix" "nvptx" "--allow-unused-prefixes" "--dump-input-context" "100"
stdout: none

---- [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#nvptx stdout end ----

failures:
    [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#amdgpu
    [codegen] tests/codegen-llvm/gpu-launch-sized-workgroup-memory.rs#nvptx

@rust-bors rust-bors bot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 5, 2026
@rust-bors
Contributor

rust-bors bot commented Feb 5, 2026

💔 Test for 42a3a3f failed: CI. Failed job:


Labels

A-compiletest Area: The compiletest test runner
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-testsuite Area: The testsuite used to check the correctness of rustc
A-tidy Area: The tidy tool
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
T-libs Relevant to the library team, which will review and decide on the PR/issue.
