[DRAFT] Implement DPP Reduction in wavefront by stefankoncarevic · Pull Request #1796 · ROCm/rocMLIR

stefankoncarevic · 2025-03-31T12:39:18Z

This pull request implements wavefront reduction using DPP (Data Parallel Primitives) instructions. Currently, the implementation covers the case where there are not enough threads for a parallel reduction, so each individual thread performs its own reduction.

This is the link to the ticket: https://github.com/ROCm/rocMLIR-internal/issues/1524

Current Architecture Support
The code currently supports the MI200 and MI300 architectures. For the RDNA and MI100 architectures, the conditions still need to be adjusted per architecture so that the lowering selects intrinsic instructions those architectures support.

Considered Approaches
Kernel Pipeline Approach:
I began by implementing a chipset option in the -rock-blockwise-gemm-to-threadwise pass, so that the pass can decide when to apply the desired functionality. This involves adding the option to the kernel pipeline and modifying our rocmlir-driver -c tool to pass the architecture information, so that the MLIR passes generate the correct instructions. Feedback is needed on whether it is beneficial to add this option to the kernel pipeline pass.

ROCDL Backend Approach:
Since the chipset is already available within the ROCDL dialect backend pipeline, and instructions are lowered in this pass, another approach would be to add custom intrinsics within AMDGPU.td. These can then be lowered conditionally in the ROCDL backend, which selects the required instructions based on the chipset in use.

Generic Solution with Readlane:
A third potential solution is a more general approach that uses readlane and restructures the logic around instructions that are common to as many architectures as possible. This would help in achieving a correct and optimal solution.

- Implemented 'row_share' as a new DPP instruction.
- Added verification logic for 'row_share' with permissible range [0-15].
- Updated test cases to include 'row_share' examples and checks.
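As a rough illustration of what the `row_share` permutation and its [0-15] range check mean, the sketch below models a wavefront on the host. It assumes a 64-lane wave split into rows of 16 lanes; `rowShare`, `isValidRowShare`, and `Wave` are hypothetical names for this example and are not part of the AMDGPU dialect.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

constexpr std::size_t kWaveSize = 64; // assumption: wave64
constexpr std::size_t kRowSize = 16;  // a DPP row spans 16 lanes

using Wave = std::array<float, kWaveSize>;

// Mirrors the verifier rule from the commit: the shared lane must be in [0, 15].
bool isValidRowShare(int lane) {
  return lane >= 0 && lane < static_cast<int>(kRowSize);
}

// Host-side model (hypothetical helper): every lane reads the value held by
// lane `lane` of its own 16-lane row. Returns nullopt for an invalid lane.
std::optional<Wave> rowShare(const Wave &src, int lane) {
  if (!isValidRowShare(lane))
    return std::nullopt;
  Wave out{};
  for (std::size_t i = 0; i < kWaveSize; ++i) {
    std::size_t rowBase = (i / kRowSize) * kRowSize;
    std::size_t from = rowBase + static_cast<std::size_t>(lane);
    assert(from < kWaveSize);
    out[i] = src[from];
  }
  return out;
}
```

With lane values equal to their lane index, `rowShare(w, 3)` leaves 3 in every lane of the first row, 19 in every lane of the second row, and so on.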

- Introduced ROCDL_SetInactiveOp, which maps to the `llvm.amdgcn.set.inactive` intrinsic and sets the value held by inactive lanes. This is crucial when fewer lanes are active than the wave size, so that inactive lanes contribute a well-defined value.
- Added ROCDL_StrictWWMOp to implement strict whole wavefront mode using the `llvm.amdgcn.strict.wwm` intrinsic. This operation ensures calculations consider the entire wavefront, which is integral to DPP reduction scenarios.

These additions support the wavefront operations needed during DPP (Data Parallel Primitives) reduction, giving control over lane execution and keeping the computation correct when only some lanes are active.
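The interaction between the two operations can be sketched on the host as follows. This is only a model under stated assumptions: a 64-lane wave, an execution mask with one bit per lane, and hypothetical helpers `setInactive`, `wholeWaveSum`, and `lowLanesMask` that do not correspond to real ROCDL APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 64; // assumption: wave64
using Wave = std::array<float, kWaveSize>;

// Builds an execution mask with the `numActive` lowest lanes active.
std::uint64_t lowLanesMask(int numActive) {
  assert(numActive >= 0 && numActive <= kWaveSize);
  return numActive == kWaveSize ? ~0ull : ((1ull << numActive) - 1ull);
}

// Model of set.inactive: active lanes keep their source value, inactive
// lanes are overwritten with `inactiveValue`.
Wave setInactive(const Wave &src, float inactiveValue, std::uint64_t execMask) {
  Wave out{};
  for (int lane = 0; lane < kWaveSize; ++lane) {
    bool active = ((execMask >> lane) & 1ull) != 0;
    out[lane] = active ? src[lane] : inactiveValue;
  }
  return out;
}

// Model of a computation in whole wavefront mode: it reads every lane,
// including those that were inactive in the original execution mask.
float wholeWaveSum(const Wave &v) {
  float acc = 0.0f;
  for (float x : v)
    acc += x;
  return acc;
}
```

If every lane holds 2.0 but only 8 lanes are active, summing the raw register over all 64 lanes gives 128, while summing after `setInactive(w, 0.0f, lowLanesMask(8))` gives 16, the sum over the active lanes only.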

- Implemented ROCDL_PermLaneX16Op to utilize the `llvm.amdgcn.permlanex16` intrinsic, addressing limitations in DPP broadcasting on the RDNA architecture.
- This enhancement facilitates value propagation across lanes within a wave, providing flexibility where certain DPP instructions are unsupported for broadcasting on RDNA systems.

- Integrated DPP reduction functionality designed to improve performance on supported architectures by minimizing active lane usage through set inactive operations.
- Added operations to manage scalar extraction from vectors with `vector::ExtractElementOp` for precise per-element manipulation and enhanced control over lane values.
- Introduced `ROCDL::SetInactiveOp` to replace values in inactive lanes, promoting efficiency when the active lane count is reduced in DPP scenarios.
- Enabled thorough reduction steps through sequential DPP operations including `row_shr`, broadcasting (`row_bcast_15`, `row_bcast_31`), and wave rotations (`wave_ror`) for wavefront manipulation.
- Utilized `ROCDL::StrictWWMOp` to ensure computations occur in Whole Wavefront Mode, particularly crucial in scenarios where the number of active lanes is fewer than the wave size. This guarantees consistent and correct results across all lanes within the wavefront.
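The shift-and-combine idea behind the `row_shr` steps can be modeled on the host for a single 16-lane row. The sketch below uses the textbook doubling scheme (shifts of 1, 2, 4, 8), which is an assumption chosen for clarity and not necessarily the shift sequence or masking this PR emits; `rowShr` and `rowInclusiveSum` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>

constexpr int kRowSize = 16; // a DPP row spans 16 lanes
using Row = std::array<float, kRowSize>;

// Model of a row shift right by n: lane i reads lane i - n of the same row.
// Lanes whose source would fall outside the row receive `fill` instead.
Row rowShr(const Row &src, int n, float fill) {
  assert(n >= 1 && n < kRowSize);
  Row out{};
  for (int lane = 0; lane < kRowSize; ++lane)
    out[lane] = lane >= n ? src[lane - n] : fill;
  return out;
}

// Four shift-and-add steps (log2 of 16) turn the row into an inclusive
// prefix sum, so the last lane ends up holding the sum of the whole row.
Row rowInclusiveSum(Row v) {
  for (int shift : {1, 2, 4, 8}) {
    Row shifted = rowShr(v, shift, /*fill=*/0.0f);
    for (int lane = 0; lane < kRowSize; ++lane)
      v[lane] += shifted[lane];
  }
  return v;
}
```

On a 64-lane wave, the per-row results would still have to be combined across rows, which is the role the commit message assigns to the broadcast steps.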

Replaced the previous DPP-based broadcasting logic with ROCDL::ReadlaneOp, allowing thread 0 to extract the final reduction result and share it with all threads efficiently. This simplifies the implementation and ensures correctness across different architectures.

- Removed multiple amdgpu::DPPOp operations (row_share, row_bcast_15, row_bcast_31) used for broadcasting the reduced value.
- Directly used ROCDL::ReadlaneOp to read the result from thread 0 and distribute it to all threads.
- Ensures correct behavior across different AMD GPU architectures.
- Potentially improves performance by using a single lane-read operation instead of multiple DPP-based broadcasts.

krzysz00 · 2025-04-01T16:17:40Z

Happened to notice this PR and wanted to let y'all know that llvm/llvm-project#133204 is a thing that's under development and might be useful.

dhernandez0 · 2025-04-02T12:35:34Z

-                  loc, reduced, workspaceLDSBuffer,
-                  reductionLoop.getLowerCoords(/*domain=*/2));
+
+            Value BrodcastAll;


I think we don't need to have rocdl and amdgpu dialects here. Can we have some new op rock::wavereduction or something like that? Then, we can lower it to rocdl later.

also, that way it's easier to keep the current implementation if dpp is not supported.

dhernandez0 · 2025-04-02T13:10:37Z

+                    rewriter.create<arith::ConstantIndexOp>(loc, i));
+                Value scalarInactiveValue = rewriter.create<arith::ConstantOp>(
+                    loc, vecType.getElementType(),
+                    rewriter.getFloatAttr(vecType.getElementType(), 0.0));


for max reduction it should be -inf?
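To make the concern concrete: the value written into inactive lanes has to be the identity of the reduction, otherwise it can win the reduction. A minimal host-side sketch, with a hypothetical `ReduceKind` enum and helper names that are not taken from the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

enum class ReduceKind { Sum, Max, Min }; // hypothetical enum for this sketch

// Identity element per reduction kind: combining it with x yields x.
float identityFor(ReduceKind kind) {
  switch (kind) {
  case ReduceKind::Sum:
    return 0.0f;
  case ReduceKind::Max:
    return -std::numeric_limits<float>::infinity();
  case ReduceKind::Min:
    return std::numeric_limits<float>::infinity();
  }
  assert(false && "unknown reduction kind");
  return 0.0f;
}

// Max over the active values plus one inactive lane holding `inactiveFill`.
float maxWithInactiveFill(const std::vector<float> &active, float inactiveFill) {
  float acc = inactiveFill;
  for (float x : active)
    acc = std::max(acc, x);
  return acc;
}
```

For active values {-3, -1, -2}, a fill of 0.0 makes the max come out as 0, whereas a fill of -inf gives the expected -1.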

dhernandez0 · 2025-04-02T13:23:47Z

+                    loc, vecType.getElementType(), scalarVal,
+                    scalarInactiveValue);
+
+                Value dppResult1 = rewriter.create<amdgpu::DPPOp>(


please add comments to understand what the constants are doing here: 0xF, ...
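For context, and assuming the 0xF constants are the `row_mask` and `bank_mask` operands of the DPP op: on a 64-lane wave, each `row_mask` bit enables writes to one row of 16 lanes, and each `bank_mask` bit enables writes to one bank of 4 lanes within every row, so 0xF enables everything. A host-side sketch with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Returns whether a DPP result would be written to `lane` for the given
// masks. Assumes wave64: 4 rows of 16 lanes, each row split into 4 banks
// of 4 lanes. Bit r of rowMask enables row r; bit b of bankMask enables
// bank b in every row.
bool dppWriteEnabled(int lane, std::uint8_t rowMask, std::uint8_t bankMask) {
  assert(lane >= 0 && lane < 64);
  int row = lane / 16;
  int bank = (lane % 16) / 4;
  bool rowOn = ((rowMask >> row) & 1) != 0;
  bool bankOn = ((bankMask >> bank) & 1) != 0;
  return rowOn && bankOn;
}
```

For example, `bank_mask = 0xE` leaves lanes 0 to 3 of each row untouched, and `row_mask = 0xA` restricts writes to the second and fourth rows.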

dhernandez0 · 2025-04-02T13:23:52Z

+                Value dppResult = createReducingOp(op, setInactiveScalar,
+                                                   dppResult1, rewriter);
+
+                Value dppResult2 = rewriter.create<amdgpu::DPPOp>(


please add comments to understand what the constants are doing here: 0xF, ...

- Introduced `rock-wave-reduce-lowering` pass to lower `rock.wave_reduce` ops.
- Implemented `RockWaveReduceRewritePattern` using DPP row_shr shifts and broadcasts for intra-wavefront reductions, with fallback to `permlanex16` for gfx10+.
- Extended `rocmlir-driver` to pass the detected arch to kernel pipeline so the lowering can apply chipset-specific logic.
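The chipset-specific choice described above could look roughly like the sketch below. It is only an illustration: `WaveReduceStrategy`, `gfxMajor`, and `pickStrategy` are hypothetical names, the parsing assumes chip names of the form `gfx` + major digits + one minor character + one stepping character (for example `gfx90a` or `gfx1030`), and finer distinctions within a major version are deliberately ignored.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical strategy enum for this sketch.
enum class WaveReduceStrategy { DppRowBroadcast, PermLaneX16, Unsupported };

// Extracts the major version from names such as "gfx90a" (9) or
// "gfx1030" (10). Returns nullopt if the name does not fit that shape.
std::optional<int> gfxMajor(const std::string &chip) {
  if (chip.size() < 6 || chip.size() > 7 || chip.compare(0, 3, "gfx") != 0)
    return std::nullopt;
  // Everything between "gfx" and the last two characters is the major part.
  std::string major = chip.substr(3, chip.size() - 5);
  for (char c : major)
    if (c < '0' || c > '9')
      return std::nullopt;
  return std::stoi(major);
}

// gfx10 and newer take the permlanex16 fallback; gfx9 uses DPP broadcasts.
WaveReduceStrategy pickStrategy(const std::string &chip) {
  std::optional<int> major = gfxMajor(chip);
  if (!major)
    return WaveReduceStrategy::Unsupported;
  if (*major >= 10)
    return WaveReduceStrategy::PermLaneX16;
  if (*major == 9)
    return WaveReduceStrategy::DppRowBroadcast;
  return WaveReduceStrategy::Unsupported;
}
```

Returning an explicit `Unsupported` value leaves room for keeping the existing non-DPP implementation as the fallback, as suggested earlier in the review.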

Copilot

Pull Request Overview

This PR introduces a wavefront-level reduction operation (rock.wave_reduce) backed by AMDGPU DPP intrinsics and integrates it into the MLIR GPU pipeline.

Add a --chip driver option and thread pipeline flag to control target GPU architecture.
Define the rock.wave_reduce op, implement RockWaveReduceLoweringPass, and register it in the kernel pipeline.
Update blockwise reduction logic to use the new wave-level reduction and extend DPP/ROCDL dialects for new intrinsics.

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

File	Description
mlir/tools/rocmlir-driver/rocmlir-driver.cpp	Pass the new `--chip` option into the kernel pipeline
mlir/test/rocmlir-driver/pipelines.mlir	Add `rock-wave-reduce-lowering` pass invocation with `chipset` parameter
mlir/test/Dialect/Rock/lowering_blockwise_broadcast_reduce.mlir	Update CHECKs to expect `rock.wave_reduce`
mlir/lib/Dialect/Rock/Transforms/RockWaveReduce.cpp	Implement the lowering pass for `rock.wave_reduce` and DPP-based reduction
mlir/lib/Dialect/Rock/Transforms/CMakeLists.txt	Register the new `RockWaveReduce.cpp` in the build
mlir/lib/Dialect/Rock/Transforms/BlockwiseGemmToThreadwise.cpp	Use `rock.wave_reduce` in blockwise reductions
mlir/lib/Dialect/Rock/Pipelines/Pipelines.cpp	Insert the `RockWaveReduceLoweringPass` into the kernel pipeline
mlir/include/mlir/Dialect/Rock/Pipelines/Pipelines.h	Add `chip` option to `KernelOptions`
mlir/include/mlir/Dialect/Rock/Passes.td	Declare the `rock-wave-reduce-lowering` pass and its `chipset` option
mlir/include/mlir/Dialect/Rock/Passes.h	Enable `GEN_PASS_DECL_ROCKWAVEREDUCELOWERINGPASS`
mlir/include/mlir/Dialect/Rock/IR/RockOps.td	Define the new `Rock_WaveReductionOp`
external/llvm-project/mlir/test/Conversion/AMDGPUToROCDL/dpp.mlir	Add a test for the new `row_share` DPP permutation
external/llvm-project/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp	Extend `DPPOp::verify()` for the `row_share` case
external/llvm-project/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp	Lower `row_share` DPPPerm in the AMDGPU-to-ROCDL conversion
external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td	Add new ROCDL intrinsics (`set.inactive`, `strict.wwm`, `permlanex16`)
external/llvm-project/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td	Add `row_share` to the `DPPPerm` enum

Comments suppressed due to low confidence (2)

mlir/lib/Dialect/Rock/Transforms/RockWaveReduce.cpp:106

The defaultVal passed to SetInactiveOp is a vector but scalarVal is an elementType; inactive_value should match src type. Extract the corresponding element from defaultVal (e.g., via vector::ExtractElementOp) to supply a scalar inactive value.

Value setInactiveScalar = rewriter.create<ROCDL::SetInactiveOp>(loc, vecType.getElementType(), scalarVal, defaultVal);

mlir/tools/rocmlir-driver/rocmlir-driver.cpp:176

[nitpick] The driver option is named chip but the pass option uses --chipset. For consistency, consider renaming chip to chipset in KernelOptions.

opts.chip = devName.getChip().str();

Copilot · 2025-06-11T23:05:41Z

+          loc, vecType.getElementType(), scalarVal, defaultVal);
+      std::array<int, 5> row_shifts = {1, 2, 3, 4, 8};
+      Value dppResult = setInactiveScalar;
+      Value BrodcastAll;


[nitpick] Variable name BrodcastAll is misspelled; consider renaming to BroadcastAll for clarity.

Suggested change

-      Value BrodcastAll;
+      Value BroadcastAll;

stefankoncarevic · 2026-03-18T09:47:26Z

This PR is deprecated; a new PR is open: #2301

(#185045) 65f6a346a96a [NFC][analyzer] Eliminate IndirectGotoNodeBuilder (#187343) 3f5d6bdd2ac0 AMDGPU/GlobalISel: RegBankLegalize rules for buffer atomic add/sub (#187405) b20d21aaf164 [C23] Downgrade WG14 N3037 implementation status to partial (#187495) 422dabeb4e82 [RISCV] SFB with Immediates to QC.MVccI (#186555) 55e6683fe4a2 [NFC][AMDGPU] Move SWMMAC features into specific target feature sets (#187394) 9044b0f17162 [DebugInfo][CodeView] Support `S_DEFRANGE_REGISTER_REL_INDIR` (#186410) 4ea9c1a4564a [LICM] Mark load function as willreturn in test (NFC) 1d854bd51bf0 AMDGPU/GlobalISel: RegBankLegalize rules for s_sendmsg (#187361) bdeb18a74e59 [llc] Enable -mattr=help regardless of -mattr order (#187269) 2ec08b31941d [LSV] Added check for mismatched GEP strides in getConstantOffsetComplexAddrs (#186671) 4cbb67a96219 [AMDGPU] Use empty() instead of size() comparisons. NFC. (#187424) d0caa41c51ce [GISel] import pattern `(A-(B-C)) to A+(C-B)` (#181676) 9050794e06cf [SLP]Improve reductions for copyables/split nodes 593683f9a0eb [OpenACC][NFC] Generalize wrapMultiBlockRegionWithSCFExecuteRegion (#187359) 83b378b38196 Unsupported: llvm/test/Transforms/LoopVectorize/Sparc/no-vectorize.ll e2c9dde1a5f3 AMDGPU/GlobalISel: RegBankLegalize rules for s_ttracedata (#187342) c9f6ad8d4299 [libc++][docs][NFC] Update Open XL supported version to 17.1.4 (#176112) a693970f0793 [LICM] Regenerate test checks (NFC) b7776ccebe9d [CIR] Add support for array new with ctor init (#187418) d18a784d4106 [compiler-rt] Define GPU specific handling of profiling functions (#185763) 923cc2d43b41 [AMDGPU] Fix alias handling in module splitting functionality (#187295) d8a83a11231f [NFC][SPIR-V] Disable tests failed after spirv-val update (#187028) d049eef4b5ab [DAG] Use value tracking to detect or_disjoint patterns and add a add_like pattern matcher (#187478) 4199bb1a8149 [AMDGPU] Simplify loop in AMDGPULowerVGPREncoding::handleCoissue. NFC. 
(#187511) c5c0b8348e6c [mlir][memref] Rewrite scalar `memref.copy` through reinterpret_cast into load/store (#186118) c63ce62f7cf6 [NFC][AMDGPU] New test for untested case in SILowerI1Copies (#186127) 2754e35f7347 [mlir][EmitC] Support pointer-based memrefs in load/store lowering (#186828) 201d3547cce1 [AMDGPU] Clean up `LowerFP_TO_INT_SAT` in AMDGPUTargetLowering (#187486) e1aef9e22748 [libc++] Fix missing availability check for visionOS in apple_availability.h (#187015) 70bb9e24526a [CycleInfo] Index using block numbers instead of pointers (#187500) 5ae5f9df42f7 [DA] Check nsw flags for addrecs in the Exact SIV test (#186387) bc2a8ef6f567 [lldb][NativePDB] Remove cantFail uses (1 out of ?) (#187158) 989ea0e2d726 [MLIR][XeGPU] Lowering 2-Dimensional Reductions of N-D Tensors into Chained 1-D Reductions (#186034) 8ca7a336fb10 [SCEV] Generate test checks (NFC) cf92512e0968 [DebugInfo] Add Verifier check for local imports in CU's imports field (#187118) 807377492e7c [MemorySSA] Fix EXPENSIVE_CHECKS build cdaf29f84dd0 Revert "[LV] Simplify and unify resume value handling for epilogue vec." 
(#187504) b55f6dbb35f1 [clang][ssaf] Improve layout of `clang-ssaf-format --list` by adding a separator between name and description 153c230446ca [PDB] Fix and simplify module index lookup (#179869) ef4f87425c51 [analyzer] Fix [[clang::suppress]] for friend function templates with namespace-scope forward-declarations (#187043) da92bc06ff47 [mlir][acc] Support call target handling for bind(name) (#187390) 44e306ecdb02 [Clang] Correctly link and handle PGO options on the GPU (#185761) b227fab5a602 [NFC][LV] Introduce enums for uncountable exit detail and style (#184808) bed9fa2de54a [libc][sys/sem] Add sys v sem headers and syscall wrapper implementation (#185914) 0e7262407ca6 [offload] - Remove standalone build in favor of 'runtimes' (#170693) e8556ff6b664 [NFC] Remove fractional part of costs in maxbandwidth-regpressure.ll (#187498) b91c5a7701e1 [AMDGPU] Test saturated f32 to i8 conversion on vectors (#187487) 068176a50371 [Analysis] Remove LLVM_ABI annotations from llvm/lib/Analysis/BranchProbabilityInfo.cpp which cause build errors (#187388) e3415da3cd12 [Flang][OpenMP] Permit THREADPRIVATE variables in EQUIVALENCE statements (#186696) a32d2695c38f [bazel] Gate GPU parsers behind llvm_targets (#187213) a3e3fed088bd [CodeGen] Declare MachineCycleInfo in headers (#187494) 2e2bcf785519 [AMDGPU] Remove unused forward declaration dddf01cc1457 [RISCV] Relax out of range Zibi conditional branches (#186965) 76f725257137 [FastISel] generate FAKE_USE for llvm.fake.use (#187116) d641186cb61e [clang-cl] test that `-Xlinker` works, update supported options docs (#187395) 18ed1a9414b2 [X86] Add bitrevese/bswap i128/i256/i512 test coverage for #187353 (#187492) 78a8f0097796 Revert "[VPlan] Create header phis once regions have been created (NFC)." 
289c58823150 [X86] Optimize load-trunc-store for v4i16/v2i32/v2i16 vectors (#186676) 1078a1dabd68 Lowering `~x | (x - 1)` to `~blsi(x)` (#186722) 49a5192e5d70 [CycleInfo] Don't store top-level cycle per block (#187488) 7d02ca610b9c [mlir][LLVM] add llvm.fake.use to LLVM dialect (#187026) 796b218edd35 [LegalizeTypes] Expand UDIV/UREM by constant via chunk summation (#146238) 582fa7875374 [SLP]Do not match buildvector node, if current node is part of its combined nodes a0b5025752ea Revert "[SLP] Loop aware cost model/tree building" 191c84b822e0 [VPlan] Permit derived IV in isHeaderMask (#187360) 6aeeae676ac4 [SPARC][Tests] Add lit.local.cfg to SPARC LoopVectorize tests (#187489) 2babd1709e69 merge main into amd-staging b029b9879749 [X86] Add i128 bit manipulation pattern test coverage (#187480) 23af867e6d93 [SPARC] Add TTI implementation for getting register numbers and widths (#180660) c3e7624ac4bd [clang] Add implicit std::align_val_t to std namespace DeclContext for module merging (#187347) f104b7355ce8 [NFC][SPIRV] Run `spirv-val` on tests related to `SPV_ALTERA_arbitrary_precision_integers` (#187464) 76638021257f [LLVM][DAGCombiner] Limit extract_subvec(extract_subvec()) combine to vectors of the same type. 
(#187334) c58d62857e4f [STLForwardCompat] Switch transformOptional from direct call to invoke (#186333) d29c6a34255e [TabelGen] Use ID{n-m} for outer let statements (#187436) eaf04be3417c [SPIR-V] Complete SPV_INTEL_16bit_atomics extension support (#184312) e122a2d53193 [flang][OpenMP] Remove extraneous semicolon, NFC (#187468) 0d05c882ce99 [Support] Use block numbers for LoopInfo BBMap (#103400) 333ac33be6f4 [Analysis][NFC] Include LoopInfoImpl only in source file (#187459) 4262045ba98d [DebugInfo] Fix segfault in constructSubprogramScopeDIE with null subprogram type (#184299) 621d40e1827e Revert "[clang-tidy] [Modules] Skip checking decls in clang-tidy" (#187461) 2bb6b5902877 [libsycl] add USM alloc/free functions (#184111) d518f8ff6740 [MemorySSA] Fix handling of cross-iteration dependencies for calls (#187291) 1f8da277148f libclc: Really implement half trig functions (#187457) 1ba5b6e875d1 libclc: Stop implementing sincos as separate sin and cos (#187456) 7fad49b186c8 [AArch64][SVE] Prefer FMOV for scalar insert into first element of zero. 
(#187236) 6e8ca5edde05 libclc: Fix nextafter with -cl-denorms-are-zero (#187358) 85e9ac589819 libclc: Add canonicalize utility functions (#187357) 9b7c437033b2 libclc: Update f64 trig functions (#187455) 0960f0b8feb3 libclc: Really implement denormal config checks (#187356) c800afd65e12 [clang-tidy] [Modules] Skip checking decls in clang-tidy (#145630) a54c1490611c libclc: Invert subnormal checks (#187355) 07ff1a63dac5 [CompilationDatabase] Treat .cppm file as C++ in InterpolatingCompilationDatabase (#187446) ae66911399b7 [lldb][Platform][NFC] Move SanitizedScriptingModuleName into ScriptInterpreter (#187229) ca5e4bcc5715 [NFC] [clangd] [Modules] Leave more log for failing cases (#187448) bdfd9725afd1 libclc: Move subnormal config file to clc (#187354) 1c3c349cf8a3 [clang-format] Fix stale .lock files in git-clang-format (#187379) 3e90e1a26fd1 [docs][QualGroup] Update Qualification WG sync-up schedule and calendar links (#186011) e3198dbe59ab libclc: Move FLT_MIN gentype macros (#187272) 9e6ce65962c4 libclc: Fix vector float tan (#187387) 79042e701b7b [clang-format] Add LeaveAll to the BreakAfterAttributes option (#187204) f554fcfd0b75 Revert "[flang][openacc][cuda] Add implicit device attribute for use_device unconditionally" (#187438) 55607559203c [flang][OpenACC] Fix crash on invalid clauses in WAIT and ATOMIC constructs (#187263) 4db2ce4d546f [libc][math] Refactor dadd family to header-only (#182142) 2e6740541ba9 [clang][headers][endian.h] include_next in freestanding (#187380) 8c1896d9067b [clang-tidy][NFC] Compare nodes by pointer instead of by ID in `readability-else-after-return` (#187363) e9bf455ae9a5 [SLP] Loop aware cost model/tree building 160ac07cbd4a [SLP]Add external uses estimations into tree throttling 7335ebf5f8ca merge main into amd-staging (#1809) f1c8b9b4aad9 [Clang] Fix assertion when __block is used on global variables in C mode (#183988) 291359be687e [SelectionDAG] Move the call to BuildExactSDIV and BuildExactUDIV to the top of 
BuildSDIV/BuildUDIV. (#187378) 310766bb389f merge main into amd-staging (#1808) 3fdec1c9f946 [clang] Enable exceptions in CWG2486 test (#187195) 015e3d2b2092 [compiler-rt] Unify python shebangs (#187285) d434d82010be [MLGO] Modernize type annotations in mlgo-utils (#187408) 39b6a4d84ad8 [HLSL] Add globals for resources embedded in structs (#184281) 8176bc0e9b53 [HLSL][SPIRV] Use 0 to represent unbounded arrays on shader flags (#187174) 77066a3d33db [SandboxVec][SeedCollection] Aux pass argument for enabling different types (#155079) 80034dd582d3 [libc][annex_k] Add rsize_t (#163238) 53f8f3b01794 Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) (#187199) 42b75ed85fcb [libc][math] Refactor bf16divf128 to Header Only (#186641) 19fbdf9a1ecd merge main into amd-staging fc2c965394b4 [Passes] Remove redundant semicolon from PassRegistry.def fb36a54ef6bc [lldb] Rename formatv verbose log call, misc log cleanups [NFC] (#186951) fb39a5d6afe1 [flang] Better handling of ALLOCATED(pointer) error (#186622) 0d01afffe123 [Utils] Format git-llvm-push 3f36e7030f30 [AsmPrinter] Only warn about unsupported remarks section if requested (#187362) 2c0d210d2c27 merge main into amd-staging f4199fa99626 [Utils] Add --use-gh-cli-token flag 3cf80812f007 [llvm-remarkutil] filter: Add --sort and --dedupe flags (#187338) 3d881804bbd3 [Clang][OpenMP] Move declare simd codegen into OMPIRBuilder (#186030) 319d3c056b9e [mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir (#182223) a67c3b746836 [NewPM] Adds a port for AArch64ExpandPseudo (#187332) ca07ca0314bf [flang] Fix confusing explanatory error message (#187341) 8e062105593b [libc][annex_k] Add errno_t (#163094) abdcde9bbc9b [SLP] Loop aware cost model/tree building 89657f726f44 [libc][docs][NFC] Documentation consolidation and de-duplication (#187385) 04e86fbff043 [llvm-jitlink] [test] Add an XFAIL for a JITLink test on MinGW (#186980) a70f82fab606 [clang][FlowSensitive] Do a quick 
check and bail early for massive CFGs (#186808) 2a89e249a293 [flang] [flang-rt] Subscript overrun could occur in namelists during a READ command. (#176959) d70ebc84acd6 [lldb][bytecode] Compile pick ops using unsigned literal (#187376) 227ced1b1eb4 [flang] Use integer arith.max/min operations for max/min lowering. (#186466) 752ccf718b71 [flang][openacc][cuda] Add implicit device attribute for use_device unconditionally (#186844) e044c4ad81f0 [AMDGPU] Add target features for SWMMAC instructions (#185785) 3de7814b8d2e [MLIR][XeVM] Update HandleVectorExtractPattern (#186247) a9181e8f9d3f [CIR] Fix CFG flattening for loops with cleanup in special regions (#187369) c95af4078379 [MLIR][XeVM] Add truncf and mma_mx op. (#180055) 0b49adc32c8d [AMDGPU] Rename AMDGPUMachineFunction to AMDGPUMachineFunctionInfo. NFC. (#187276) fce100e26e7e [VPlan] Fix masked_cond expansion. d1e625c93f85 [clang-tidy] `bugprone-unchecked-optional-access`: Add support for GTest asserts like `ASSERT_TRUE` and `ASSERT_FALSE` (#186363) 5a5c3176ef24 [MLIR][Python] Add optional emit reset to exportSMTLIB (#187366) 360fab623d3f [RISCV] Fix IDiv/IRem scheduling data for RV32 cores that use the SiFive7 model (#187331) 2be4a9b1b208 [LV] Add predicated early-exit tests showing poison prop issue. 
(NFC) d226f1b16136 [AMDGPU] Regenerate codegen tests to check extra stuff at end of line (#187325) 96299d8d4d3c [flang] Disable trampoline test for PPC (NFC) (#187194) d4b86e561752 [LSR] skip ephemeral IV users when collecting IV chains (#187282) c630b09af7dc [CIR][NFC] Remove NYI checks in ternary with cleanup (#186870) 03f488a00255 [AsmPrinter][MTE] Support memtag-globals for all AArch64 targets (#187065) da47ede4b272 [AArch64] Fix register scavenger crash when merging MTE stack tags (#186934) eabcfcee0803 [HLSL][DXIL][SPIRV] QuadReadAcrossX intrinsic support (#184360) 17158b2ab16d [InstCombine] Fix comment in SimplifyDemandedUseBits (NFC) (#187126) e9799e51ed32 [lldb-dap] Improve support for variables with anonymous fields and types (#186482) 0ea2e5813f96 [VPlan] Account for early-exit dispatch blocks when updating LI. (#185618) b2c2422c2ea9 [CIR] Upstream ThreeWayCmpOp (#169963) 81950f6de421 [mlir][GPU] Bump static bound on cluster IDs (#187106) 480eba33e294 [lldb][PrefixMap] follow up fixes to #187145 (#187337) 872247c702ff [NVPTX] Split Param address space into EntryParam and DeviceParam (NFC) (#186636) befaa35212db [libc++] Fix passing through object to comparisons in __tree (#186341) 3e09538a4268 [libc++] Expand test coverage for converting comparators in associative containers (#187133) a33e9e5047c9 Move the call frame edges log messages to the verbose channel. 
(#187324) a2891ff85c6c Reapply "[LoopUnroll] Remove computeUnrollCount()'s return value" (#187104) f8db5db9586c [flang] Fix fir.call setCalleeFromCallable (#187124) ba231aaaa0d1 [clang-doc] Enclose documented entities in a card (#185121) d54da6897375 [llvm-remarkutil] filter: Add --exclude flag (#187163) 716646895747 [flang][acc] Handle deduplicated use_device (part 2) (#187305) b11a603a45e1 [mlir][Transform] Fix crash in SequenceOp::getEffects when body region is empty (#185063) d8545a486857 [LoopFusion] Use DA by default for dependence analysis (#187309) 253616de7e8f [libc][docs] Generate configure.rst in the build directory (#187266) 81d3f04f2990 [NFC] Fix mve-reg-pressure-spills.ll test (#187316) 6e86ee2c23d7 [clang][modules] Stop uniquing implicit modules via `FileEntry` (#185765) 4e500bd00150 Revert "[SLP] Loop aware cost model/tree building" 2caba086abeb [ASan] Fix overflow and last byte handling in __asan_region_is_poisoned (#183900) b17db271d030 [clang][Headers] add endian.h (#186032) 9dd2e3792a9f [DAGCombiner] Move the XORHandle in rebuildSetCC inside the while loop. (#187189) 9b0c2a135e07 [NFC] Update `LoopVectorize/predicator.ll` test (#187125) 7a9299f7f11a [RISCV] Rename add_like pattern -> riscv_add_like (#187306) 16585af33b4d [mlir][acc] Fix bindNameValue for RoutineOp (#187307) 55cee50e6b16 [AMDGPU] Use native instructions for f16 to u16/i16 saturated conversion (#186769) 13a093b2b250 [VPlan] Compute cost for predicated loads/stores to invariant address. 
(#181572) 1b8db068ed27 [PrefixMap] Teach lldb to auto-load compilation-prefix-map.json (#187145) 9418cdbccab9 [llvm-remarkutil] filter: Support multiple input files (#187162) 63b44decb547 [clang][bytecode] Allocate local variables in `InterpFrame` tail storage (#185835) c9c057564475 [Nightly][Infra] Enable the target gfx950 (#1796) f60934412000 AMDGPU/GlobalISel: RegBankLegalize rules for ds_add/sub_gs_reg_rtn (#185991) 0f5d8a960f7c [Bazel] Fixes dd9dd1d (#187310) 4c745df8bc90 [MLIR][LLVMIR][NFC] Drop uses of BranchInst (#187304) bfedc2aa7668 [Polly][NFC] Drop uses of BranchInst (#187301) dd9dd1d2f39c [mlir][bufferization] Fix crash in promote-buffers-to-stack for nested memrefs (#186426) 96ec23096ce7 [tools][examples][NFC] Drop remaining uses of BranchInst (#187293) 60c102036acf [AArch64][AsmParser] Add MC support for %dtprel() relocation (#186599) b37f8a54a0b1 [flang] Fix extra "./" prefix in source file paths (#186212) a083e19efeb1 [VPlan] Add the cost of spills when considering register pressure (#179646) b5ef9e29c5fe [MLIR][XeGPU] Avoid crashing on `gpu.func` missing `gpu.return` (#186330) dc0ba9b4531f [flang][acc] Add missing dependency for checking CUF attributes (#187292) 9871ad1c8a54 [VPlan] Rename DataLayout -> DL 88f8a2bdb25b [flang][NFC] Converted five tests from old lowering to new lowering (part 34) (#187175) fe09f74e1cf6 merge main into amd-staging (#1797) 080bc2572896 [IR][NFCI] Remove *WithoutDebug (#187240) 9f4a1ebb2054 [flang] Add const-qualified version of parser::Messages::messages() (#187250) 3a3f863e34ae [X86] sse-minmax.ll - add baseline SSE2 test coverage (#187283) 5e3202749ac1 [MLIR] Fix crash in FrozenRewritePatternSet when PDL lowering is skipped by debug counter (#186159) bf46a95f2ca2 [VPlan] Use target's index type for {First,Last}ActiveLane instead of i64 (#186361) fc569dafd767 [libc++] Refactor __is_transparent_v to make it clear what it depends on (#186419) a4ef581c714b [InstCombine] RAUW for proven zero-indexed GEPs rather than 
cloning for a specific user (#185053) 7a2193cd190b [Offload] Add CMake alias for CI (#186099) b85cf95aad71 [mlir][acc] Move acc routine functions into GPU module (#187161) 2ef6a669afe9 [NFC][AArch64] fix triple used in test (#187275) 6261cb4487f1 [SLP] Loop aware cost model/tree building e1c81fa24eeb [gn build] Port e4a2d9cd8a63 c051449e27df [gn build] Port d0d0a665c238 e9b95ce4a728 [gn build] Port 6b3cf50d958c 6ef0b80d9d8e [gn build] Port 681f1a5ee987 0f6cd5c1838c [gn build] Port 55db533b74fe c94403dbd336 [gn build] Port 45fe4bbdde13 d783723a584a [compiler-rt] Update runtime build script to detect RPC XDR header for AIX (#186977) dfc02b7744c5 [gn] port c1f6fd24aa637d6a 77667d7c5bd5 [flang] Fix the CHECK: directive to ensure flagging RWE (NFC) (#187186) c0064f744c01 [libc][math] Fix missing underflow exception in DyadicFloat::generic_as (#186734) fa8d3c810f36 [NewPM] Add port for AArch64DeadRegisterDefinitionsPass (#187180) 073d019c450d [OpenMP] Use the standard fences now that they are supported (#187138) 1db809655578 [libc][math] Improve hypotf performance. (#186627) 2a8168ddaf4d [CIR] Add support for arrays-of-pointer-to-member-data (#186887) 1b904e948afb [CIR][NFC] Unify the 'null data member attr' getters (#186876) 138cae4a08d2 [CIR][NFC] Split the CXXABI 'TypeConverter' into its own type. (#186874) de3f57399c66 [CIR] Fix bug where block after-unreachable wasn't CXXABILowered (#186869) dafadf53a0a7 Fix MSVC "not all control paths return a value" warning. NFC. 
(#187265) b15fa374fff9 libclc: Improve float trig function handling (#187264) d7dbf1bd641f [mlir][gpu] Fix typo in documentation (#156619) 9b8532dd2aaa libclc: Clean up sincos macro usage (#187260) 2ecd001215f5 libclc: Use select function instead of ?: for some fp selects (#187253) 2f44f6998345 [flang][OpenMP] Use OmpDirectiveSpecification for range/depth queries, NFC (#187109) a9605a92bdb8 [clangd] Support suppressions for driver diagnostics (#182912) fd47fbe87e7f [lldb] Do not use mkdir -p in makefile on Windows (#187244) cf89c33e6deb [OpenMP] Map const-qualified target map variables as 'to'. (#185918) c949c9be610d [AArch64][llvm] Make SBZ/SBO insns warn not fail when disassembling (#187068) 596269288899 merge main into amd-staging 2563006f3178 [Clang][NFC] Drop uses of BranchInst (#187242) f1b82dcd99bc [Bazel] Fixes c1f6fd2 (#187146) f52b2616f4d8 [mlir][vector] Use non-native runner in gather.mlir test (#187243) 3f649d0537b3 [AArch64] Use SVE/NEON FMLAL top/bottom instructions (#186798) 60dc4c70fc44 [CycleInfo] Use block numbers for dfs numbering (NFC) (#187062) 2915519efd91 [orc-rt] Move CallViaSession into Session, add comments. (#187238) 003ec3e0a161 [NFC][AArch64] add tests for `is_fpclass` (#187231) 49f9b4b44a66 [LV] Add test for diff checks with ptrtoint subtract. (NFC) c374678d2955 [orc-rt] Rename Session setController/detachFromController. NFC. (#187235) 671ccfea2767 [mlir][reducer] Add eraseAllOpsInRegion function to reduction-tree pass (#185892) 95824ca6b92b [Frontend/OpenMP][NFC] Drop uses of BranchInst (#186393) 6bfb44f32016 [orc-rt] Add ShutdownRequested flag to Service::onDetach. 
(#187230) 7404a5dbe0ca [PowerPC] Preserve load output chain in vcmpequb combine (#187010) 2734c46153cf [DAG] Add back SelectionDAG::dump() without parameter (#187001) f9d2d8beeff0 [clang] Enable '-verify-directives' mode in C++ DR tests (#187219) fef74e1c005d [mlir][spirv] add ExecutionModeIdOp (#186241) 8c4f4e8a05ab [LifetimeSafety] Track origins through array subscript and array-to-pointer decay (#186902) 4f6379069eb2 [mlir][tosa][tosa-to-linalg] Fix rescale with double rounding failing validation (#184787) 23a0c9f55826 [lldb] Skip file cleanup to avoid permission issue in API test (#187227) f7763570e558 [VPlan] Improve code in VPlanRecipes using VPlanPatternMatch (NFC) (#187130) 30c962c9b701 [Instrumentation][nsan] Add maximumnum to NSAN (#186345) 25abe22ed852 [X86] Improve handling of i512 SRA(MSB,Amt) "highbits" mask creation (#187141) 9cb9081049a4 [mlir][vector] Extend vector.gather e2e test (#187071) 570c388685e2 [llvm][utils] Give git-llvm-push u+x permissions (#187211) a8ff7e13c3be [NFCI] [Serialization] Deduplicate DeclID properly (#187212) e762078424b2 [VPlan] Use auto return in VPlanPatternMatch (NFC) (#187210) b83fd4dc5937 [AArch64][GlobalISel] Fix uqadd/sub with scalar operands (#186999) 81ba8b2aa0e3 merge main into amd-staging (#1795) 9b2fe0c885d1 [X86] Remove extranous I in comment. NFC (#187209) ea8fb06f2443 [atomicrmw] fminimumnum/fmaximumnum support (#187030) fdd2437af3cd [lldb] Avoid permission issue in API test with SHARED_BUILD_TESTCASE (#187072) ec1c08a29145 [DA] Regenerate assertions for the tests (NFC) (#187207) b3fdcac90d9d [AArch64] Remove vector REV16, use BSWAP instead (#186414) 77ad2c2a9cfd [DA] Add test that represents an edge case for the Exact SIV test (NFC) (#186389) 0f622c507ecc [orc-rt] Add TaskGroup for tracking completion of a set of tasks. 
(#187205) 76d5704633c7 [NFC][PowerPC] Update check lines to include power 9 label (#187193) fbd24677963a [AMDGPU] DPP implementations for Wave Reduction (#185814) 7899b26e88f5 [lldb-dap] Allow expressions in setVariable request (#185611) 63dd9966d319 [orc-rt] Capture a Session& in SimpleNativeMemoryMap, fix TODOs. (#187200) 351501799ab4 [CodeGen] Improve `getLoadExtAction` and friends (#181104) 9a2f23e1a40b [CodeGen] Use separate MBB number for analyses (#187086) 14b42335c641 [orc-rt] Publish controller interface from SimpleNativeMemoryMap ctor. (#187198) 51fd033521b6 [BOLT] Enable compatibility of instrumentation-file-append-pid with instrumentation-sleep-time (#183919) 950292535e8d [orc-rt] De-duplicate some test helper APIs. (#187187) dc8fd02b6237 [clang] Reshuffle compiler options in C++ DR tests ee0ac7443e4d [mlir][x86] Lower packed type vector.contract to AMX dot-product (#182810) 2890f9883cb9 [OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927) 038c8d3f4f23 [DA] Rewrite formula in the Weak Zero SIV tests (#183738) 5f6cd9b92324 [DA] Fix overflow in symbolic RDIV test (#185805) 709ef15d7431 [NFC][PowerPC] Pre-commit to optimize bswap64 builtin for power8 (#181776) 3a7568311010 [libclang/python] Add type annotations to the TranslationUnit class (#180876) 60b11a479239 merge main into amd-staging 70665c665a08 [clang] Update C++ DR status page 8187875b5802 [clang][Driver][Darwin] Use `xcselect` for `*-apple-darwin*` targets too (#186683) e0f74e65a4df merge main into amd-staging (#1792) ffcb5745ab54 [orc-rt] Add BootstrapInfo: info for controller session bootstrap. 
(#187184) c61d11df4003 [clang][RISCV] Add RequiredFeatures for zvknha and zvknhb (#186993) 3a1d5b5b8ce5 [X86] Support reserving EDI on x86-32 (#186123) 495c518b96cb [FMV][AIX] Implement target_clones (cpu-only) (#177428) 3661bf74cdc5 [Clang][Modules] Add regression test for #179178 (#187173) aa2defc147c3 [X86][APX] Remove patterns for ArithBinOp (#187018) c4137a6c0f63 [orc-rt] Relax addUnique assertion to match error condition. 13665f0d8930 [AMDGPU] Set gfx1250 default to B0 350385e7923e [libclc][NFC] Change include style from <...> to "..." (#186537) ae9b5a4bcad8 [clang] Add `-verify-directives` cc1 flag (#179835) dbdf1accf55d [orc-rt] Rename SimpleSymbolTable::addSymbolsUnique, relax error cond… (#187171) 1fee51c40b47 [WebAssembly] Fold sign-extending shifts into signed loads in FastISel (#185906) 8b265cf27025 [NVPTX][AutoUpgrade] atom.load intrinsics should be autoupgraded to monotonic atomicrmw for NVPTX (#187140) 3eb8b788b7df Revert "[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization." (#187170) 9b5db0468438 Fix build issue starts_with c727cd9a4b21 [orc-rt] Rename ControllerInterface to SimpleSymbolTable. NFCI. (#187164) 52089f895eb5 [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) a0656aba831a merge main into amd-staging 20c0e362b856 [AMDGPU] Change A0/B0 commentary to only affect gfx1250 (#1790) c493feb71106 [Clang][Driver] Deprecate -parallel-jobs= in favor of --offload-jobs= d49701bba36f [CIR] Implement abstract conditional operator handling for aggregates (#186284) f6f57f0da6bf [CIR] Add handling for nounwind attributes (#187096) d9eba8b3550d [lldb][Module] Don't try to locate scripting resources without a ScriptInterpreter (#187091) 3482480087ed [orc-rt] Add an ExecutorProcessInfo field to Session. (#187155) 1337dc9a5e1a [clang][OpenMP] Remove -fopenmp-allow-kernel-io flag e7326007960b [orc-rt] Require non-empty triples and power-of-two page sizes. 
(#187151) f2605193689f [IR2Vec] Remove redundant death test for invalid TypeID (#187143) c78c5df960fb [mlir][spirv] Add comparison and elementwise ternary ops in TOSA Ext Inst Set (#186356) ce805223940b [orc-rt] Add ExecutorProcessInfo APIs. (#187147) 1a359488d2be merge main into amd-staging (#1787) c1f6fd24aa63 Reapply "[clang][ssaf] Add UnsafeBufferUsage summary extractor for functions (#182941)" (#186899) 838b1ccdd90b [libc] Add a smaller b36_char_to_int (#180841) d17ce9a6fbe4 [ROCDL] Align mfma op description examples with the actual op (#186949) 5bda6166684f [AMDGPU][NFC] Remove kernarg_segment_ptr regbankselect test (#186029) d13eb6d16e18 AMDGPU/GlobalISel: RegBankLegalize rules for s_setreg (#186872) 8f891a1bb3d7 [dsymutil] Fall back to compatible triple in BinaryHolder (#186893) 2fd9ee05c6be merge amd-main into amd-staging (#1785) b03b58be38f9 [NVPTX] Fix scoped atomic when given runtime values (#185883) cbedaa83e182 [lldb] Upstream arm64e support in ValueObject (#186906) 07542af9215c [flang][OpenMP][CUDA] Place privatized device allocatable descriptors in managed memory (#187114) 5706070eb809 [AMDGPU][GlobalISel] Switch tests to new reg-bank-select and refresh checks (#186506) e7e45cdcec91 [spirv][mlir] Add myself to CODEOWNERS (#187115) dd4f5c69c539 [CIR] Fix missing RegionBranchTerminatorOpInterface declarations (#187112) 99b93b586c5d [AMDGPU] fold a call to implictarg.ptr to a poison with no-implicitarg-ptr (#186925) abb7288c1e2a AMDGPU/GlobalISel: RegBankLegalize rules for bswap, cvt_ubyte, rcp (#187093) 29f6bdb65b72 AMDGPU/GlobalISel: RegBankLegalize rules for wave_reduce_umax/umin (#186528) 015601bf04ce [clang][Driver][SPIRV] Fix assertion when using -emit-llvm (#186824) 5293760dd110 [mlir][llvmir] Fix crash when a CallSiteLoc has a UnknownLoc callee (#186860) 673002f32576 [libc][math] Fix bazel build for fmaf16 (#187111) 31fd58d8c91a [PSDB][Infra] Temporary test removal for gfx94x due to MI325 machine shortage 6b2e347ae12b [libc]: implement 
'iswpunct' entrypoint (#186968) d4afb1bfed53 [flang][OpenMP] Remove unused function declaration, NFC (#187101) f0e699a35c8b [libc][math] Fix fma bazel build (#187107) 2ef41cca149c [clang-format] Fix Macros configuration not working with try/catch expansions (#184891) af67e30a6303 [SLP][NFC] Refactor BinOpSameOpcodeHelper BIT enum (#187067) bed5e7dc2018 [libc][math] Refactor fmaf16 implementation to header-only in src/__support/math folder. (#163977) e6f0ec83438b [libc][math] Refactor fmaf implementation to header-only in src/__support/math folder. (#163970) f5d83fb40400 [mlir][GPU] Set nsw/nuw when expanding out subgroup ID (#187099) d0d1f0b7af32 [libc][math] Refactor fma implementation to header-only in src/__support/math folder. (#163968) 3d0e7e04c80b [z/OS] Recognize EBCDIC archive magic (#186854) 996b62231ca8 [bazel] NFC: reformat mlir & libc bazel files (#187094) aef7e5768e78 [DirectX] Fix assertion in PointerTypeAnalysis with empty global_ctors (#179034) 385aeb24bca8 Revert "[LoopUnroll] Remove `computeUnrollCount()`'s return value " (#187035) 803828f4aa62 [mlir][GPU] Refactor, improve constant size information handling (#186907) da86e036abca [Bazel] Fixes ebb3309 (#187090) 7477045d3d83 [lldb] Fix build on Linux when SEGV_PKUERR is undefined (#186963) f0dfa36815dc [mlir][spirv] Add spirv.GroupNonUniformBroadcastFirst Op (#185818) 18c8b8d81da2 [Clang] Add __ob_trap support for implicit integer sign change (#185772) ebb3309975c8 [libc] Refactor core Linux syscalls to use syscall_wrappers (#185983) 4b9693a4231b Revert "[libc] Avoid host header collisions in full builds (-nostdinc)" (#187079) aa14eb8421f5 revert fc648683cd75 - [SLP]Add external uses estimations into tree throttling c3219f5222a4 [mlir][spirv] Fix struct.mlir for stricter spirv-val variable-pointer rules (#186974) 037c2095e6bd Add hybrid function ordering support (#186003) 0769dde7a23b Removed Hardcoded SM Number from Mlir Test (#186917) b5614bc21cb5 [green-dragon] fix Python and Swig flags 
(#187052) 74a5efa3318e [CIR] Split BinOpOverflowOp into separate overflow-checked ops (#186653) f28ef689961e [X86] getMaskNode - perform pre-truncation of oversized scalar mask sources (#187063) 2531b1692cd2 [mlir][bytecode] Use getChecked<T>() in bytecode reading to avoid crashes (#186145) 19fd930bf00c [flang][OpenMP][CUDA] Set allocator_idx on privatized allocatable device array descriptors (#186945) b686f5e62e09 [SandboxVec][BottomUpVec] Fix crash caused by Cmps with different operand types (#186550) 9d94bdace1b1 [mlir][Interfaces][NFC] Add early exit to MakeRegionBranchOpSuccessorInputsDead (#186325) 333f6abe30ed Reland Support float8_e3m4 and float8_e4m3 in np_to_memref (#186453) (#186833) 1800651c86c0 [flang] Lower anint with math.round (#186039) 3c391665ec1f [lldb] Fix user-after-free in CommandInterpreter (#187032) 9c7e203be3e5 [flang] Fix ignore_tkr(c) passing descriptor instead of base address for non-descriptor dummies (#186894) 79d1a2c41856 [AMDGPU] Standardize on using AMDGPU::getNullPointerValue. NFC. 
(#187037) adf458cbac43 [lldb] Add additional logging to wait_for_file_on_target (#186915) 8b7c0c42edfe merge main into amd-staging 0af9058e6881 merge amd-main into amd-staging 810ba55de915 [CycleInfo] Support forward declarations (#187029) 832c95948c80 [NewPM] Port for AArch64ConditionOptimizer (#186941) b04b9e58aa4c [gn] port 55b271ddc1fd968 6f68daa42cab [InstCombine] Recognize non-negative subtraction patterns (#182597) 19c04ce0d58f [X86] Fix fcmp+select to min/max lowering (#185594) cb3e9eec5fe0 [clang] DeducedTypes deduction kind fix and improvement (#186727) fec11e3e5f26 [libc++] Add scripts defining two LNT runners for libc++ (#187050) abd5b6964e74 [X86] Fold compress(splat(x),splat(x),mask) -> splat(x) (#187042) bed77a1d9bf4 [libc] Avoid host header collisions in full builds (-nostdinc) (#187025) 60f478a1599f Add Zstandard to Windows release build (#186772) e8a03bb043ea [CodGen] Port UnpackMachineBundles to new pass manager (#184918) 25981794438b [AMDGPU][GlobalISel][NFC] Group RegBankLegalize intrinsic rules (#186912) fcefee017b8a AMDGPU: llvm.amdgcn.ds.add/sub.gs.reg.rtn are sources of divergence (#186883) a93560d13eab [lldb][PlatformDarwin][test] Move Platform test utilities into common header for re-use (#187036) 6d107523b14b [analyzer] Fix [[clang::suppress]] for nested templates (#183727) c91a9b8d1048 [libc] Add Jeff Bailey to Maintainers.rst (#186662) 65bf05a494c9 [Instrumentation][nsan] Intrinsic tests + bugfixes (#186803) d20315f15432 [RISCV] Select (sext_inreg (sra X, C), i8/i16) as slli+srai. 
(#186956) 5d97341c1329 merge main into amd-staging (#1778) 3ee7caa27387 [flang][OpenMP] Use the LoopSequence-based checks (#185300) ca15db1cd509 [lldb] Fix permission issue in API test on lldb-x86_64-win (#187021) db4f8f7af09b [SPIR-V] Add support for arbitrary precision integer constants in instruction printer (#185306) a2e21b67f2f3 device-libs: Replace nextafter implementations (#1727) 3eec4d5b683a device-libs: Remove correctly_rounded_sqrt control libraries (#1724) 52e9e828985f [NFC][AArch64] ConditionOptimizer: refine cmp/cond instruction update code (#186724) 3ae428ff3a52 [libc++][NFC] Rename the template parameter of __make_transparent (#186435) 43ec60eee5f9 Reland "[DomTree] Assert non-null block for pre-dom tree" (#187005) 0f1ec17f29f1 [AMDGPU][GlobalISel] Add RegBankLegalize rules for atomic fmin/fmax (#182824) 6e17b2ef33b8 [CIR][AArch64] Upstream NEON shift left builtins (#186406) 240bc0a7ad74 [AMDGPU] Remove R600TargetTransformInfo dependency on AMDGPUTargetLowering. NFC. (#187014) 05f2b89f4459 [NFC][analyzer] Update some incorrect doc-comments (#186852) a114bbe4cb7d [ValueTracking] fadd never produces subnormal with no underflow (#186985) 2859621ddbb3 [Bazel] Port 429e9717 (#187019) a1a714b8b87e [MLIR][Interfaces] Make `getMutableSuccessorOperands` overridable on `ReturnLike` ops (#186832) 9f4fbe86a592 [lldb] Add pointer arithmetics for addition and subtraction to DIL (#184652) dc5c6d008f48 [sancov] add -fsanitize-coverage=trace-pc-entry-exit (#185972) 0eefb2682bf8 [libc++] Build the library with C++26 (#181021) e31db655fd51 [NFC][analyzer] Improve computeObjectUnderConstruction (#186186) bec0f40ef2ee [SPIR-V] Handle spirv.MemoryModel metadata (#186138) f335bd9685b3 [Flang][OpenMP] Add semantic support for OpenMP Loop Interchange and permutation clause in Flang (#183435) 818efd5c9541 [SPIR-V] Handle undef aggregate initializers for global variables (#186785) 34fa16afff1d [LifetimeSafety] Exclude basic_string::insert from capturing methods (#186989) 
9a42e5ba6fd8 [mlir][tosa] Remove 'Pure' trait from operations that are not speculatable (#185700) 2b5e30262777 [lldb][windows] fix TestReplaceDLL.py reruns (#187002) e1baf3a99bdc [AMDGPU] Remove AMDGPUCallLowering dependency on AMDGPUTargetLowering. NFC. (#187008) 3be7b2fc9da7 [X86] Improve handling of i512 SHL(-1,Amt) + SRL(-1,Amt) "mask shifts" (#186806) 5de7c865dc82 [X86][APX] Enable NDD tunings (#186049) 6c9407a2f064 [Bazel] Port 9e43b35 (#187011) bc54aeff7445 [LAA] Add tests with missed aliasing invariant load/store. (NFC) fdbc015abc9f [lldb][PlatformDarwin][NFC] Move logic to emit warning on invalid/conflicting Python script names into helper function (#185669) bc190619eb41 [Bazel] Port 55b271d (#187007) a78d1d9a8b0c [mlir][vector] Add missing tests (nfc) (#186990) 04cc7523ed6a [mlir][bufferization] Fix crash with copy-before-write + bufferize-function-boundaries (#186446) a26077ee5f4b [NFC][NVPTX] Fix tcgen05.mma PTX instruction encoding (#186602) 22840d33d768 merge main into amd-staging 055322c38af1 [mlir] Fix crash in diagnostic verifier for unmatched @unknown expectations (#186148) b861a289d799 [WebAssembly] combine `bitmask` with `setcc <X>, 0, setlt` (#179065) 67e47fb5317c [mlir][gpu] Add SymbolUserOpInterface to launch_func op (#173277) f1a7c7e772f9 [MIR] Support symbolic INLINEASM extra-info flags (#186818) 962b304f6130 [LLVM] Make -use-constant-fp-for-scalable-splat the default. (#186422) 429e9717e232 [mlir][arith][NFC] Use type parser instead of hard-coding type keywords (#186753) c69187622fdf [orc-rt] Update SPS wrapper names to reflect new namespace. NFCI. (#186994) 612d80348f79 [orc-rt] Move SPS controller interface funcs into their own headers. 
(#186991) e904d559c5ae [mlir][bytecode] Fix crashes when reading bytecode with unsupported types (#186354) 35118457abeb [flang][NFC] Converted five tests from old lowering to new lowering (part 33) (#186943) f15852ce21ef [AArch64] Remove promotion cost for fixed-length bfloat arith with +sve-b16b16 (#186378) 740f1b56c925 [lldb][PlatformDarwin] Reword warning when locating scripting resources from dSYM (#185666) b14eea0b2313 [libclc] Fix check-libclc dependency on llvm-dis (#186978) df03e1a3724a [MIR] Fix printing INLINEASM dialects. (#186797) 9b5084f894cb [clang][win] Define vector deleting dtor body for declared-only dtor if needed (#185653) fdcb1f4ab19f [Clang] Make members with exclude_from_explicit_instantiation never be exported or imported (#185140) 6f966fb5dade [LV] Add select instruction to VPReplicateRecipe::computeCost (#186825) 527496bb10f5 libclc: Improve large float trig reduction (#186984) 107b113b67d2 libclc: Use small trig reduction for nan (#186983) a0d6e97142bd libclc: Use frexp and ldexp in trig reduction instead of bit hacking (#186982) 77ba0d9e244a libclc: Update pow functions (#186890) 6dcd70d10377 [AMDGPU] Use AMDGPULaneMaskUtils in SILowerI1Copies (#186170) fae024aca954 libclc: Move edge case handling of trig functions (#186429) 56d7920c093f [VPlan] Factor collectGroupedReplicateMemOps (NFC) (#186820) 7887ac6e7f2e [libc][docs] Update clang-tidy checks page (#185923) 1588f083694b [OpenMP][OMPT] Add missing `error` entry to device tracing record union (#185683) 19460ff85976 libclc: Use fshr builtin in sincos helpers (#186427) b5e825ec3839 [DA] Add test for the Weak Crossing SIV test misses dependency (NFC) (#186355) 9e43b35befa4 [clang][ssaf] Add --ssaf-list-{extractor,format} flags (#185428) c64d9af7b5f5 [llvm-link] Add more detail to `--internalize` description (#170397) 096371b7e334 libclc: Use struct for ep pair (#186973) 7c2aef4b58fc Reland "[lldb] Initial plugin and test for SymbolLocatorSymStore" (#185658) b091331f0605 [orc-rt] Fix 
stale file comment. NFC. b2442a20a946 [NFC][SPIRV] New test for untested SPIRVInstructionSelector case (#186069) 63ebca6a50ab Add zlib to Windows release build (#186630) 91b928f91936 [VPlan] Create header phis once regions have been created (NFC). 356717656324 merge main into amd-staging (#1774) d9066944d7d3 merge main into amd-staging (#1773) 3dc46e9fffe8 [lldb] Use clang_cl_host to build `vbases.test` (#186857) 3d421d59ad24 [DA] Refactor the signature of the Exact SIV test (NFCI) (#186386) 234aacf8e895 [C++20] [Modules] Diagnose for duplicated definition in the same module (#186959) f1b16eaeddbb [orc-rt] Hold `const void*` rather than `void*` in ControllerInterface. (#186954) 33cfc6ba610a [CI][libclc] Enable libclc in premerge CI with single target (#186104) b7843a241102 [RISCV][NFC] Remove duplicate setTargetDAGCombine registrations (#186928) 6439500cefaa [clang][bytecode] Clean up CondScope after while loop (#186816) 51b3b9b03907 [LV] Optimize x && (x && y) -> x && y (#185806) 69b83274578b [orc-rt] Add ControllerInterface symbol table. (#186947) 376f41439375 AArch64: Look through copies in CCMP converter. 
f572cc0e7edf [RISCV] Fold (WADDAU -C, -1, rs1, 0) -> (WSUBU rs1, C) where C > 0 (#186638) 711e8846714d [revPat] two reverts b005ff76f367 [ValueTracking] frem in computeKnownFPClass can not return +/-Inf (#186748) 193d26743b12 Revert "[Reland][IR] Add initial support for the byte type (#186888)" b988242a2542 merge main into amd-staging d97a7a13997e Revert "[Clang][OpenMP] Move declare simd codegen into OMPIRBuilder (#186030)" 4fdd4404744d Merge commit '4c63b28bb971' into amd/merge/upstream_merge_20260316230220 a8edc5355799 [mlir][Interfaces][NFC] Improve time complexity of RegionBranchOpInterface canonicalization patterns (#186114) 827012111082 [llvm-ir2vec] Refactoring the ir2vec python bindings testing (#180664) 2c8e855a7a97 merge main into amd-staging 88f1ec9a70bf [clang][OpenMP] Parse/Sema for OpenMP 6.0 declare_target 'local' clause (#186281) 38eebe843b41 [AMDGPU] Add s_sethalt to hasUnwantedEffectsWhenEXECEmpty (#186745) 4abb927bacf3 [libclc][CMake] Use clang/llvm-ar on Windows (#186726) fe1f51250234 [Clang][docs][test] Add N3517, N3652, and N3715 according to N3783 (#185566) 47970a4428d6 [flang][PPC] Update vector tests with nuw nsw (NFC) (#186879) 83965ad10c1e [clang] use canonical arguments for checking function tem…

stefankoncarevic added 4 commits March 24, 2025 09:55

Add support for 'row_share' DPP instruction

6384eea

- Implemented 'row_share' as a new DPP instruction.
- Added verification logic for 'row_share' with permissible range [0-15].
- Updated test cases to include 'row_share' examples and checks.
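
The range check and the data movement described in this commit can be pictured with a small host-side model. This is only a sketch under stated assumptions, not the PR's verifier or lowering: it assumes that `row_share:n` makes every lane in a 16-lane row read the value held by lane `n` of the same row, and the names `ROW_SIZE`, `verify_row_share`, `row_share`, and `row_reduce_sum` are hypothetical helpers introduced for illustration.

```python
ROW_SIZE = 16  # assumed number of lanes in one DPP row


def verify_row_share(lane):
    """Reject row_share selectors outside the permissible range [0, 15]."""
    if not isinstance(lane, int) or not 0 <= lane < ROW_SIZE:
        raise ValueError(f"row_share lane must be in [0, 15], got {lane!r}")


def row_share(values, lane):
    """Model row_share: each lane reads the value of `lane` within its own row."""
    verify_row_share(lane)
    if len(values) % ROW_SIZE != 0:
        raise ValueError("lane count must be a multiple of the row size")
    shared = []
    for base in range(0, len(values), ROW_SIZE):
        # Broadcast one source lane to all 16 lanes of this row.
        shared.extend([values[base + lane]] * ROW_SIZE)
    return shared


def row_reduce_sum(values):
    """Naive per-row sum: accumulate the value shared from each of the 16 lanes."""
    totals = [0] * len(values)
    for lane in range(ROW_SIZE):
        totals = [t + s for t, s in zip(totals, row_share(values, lane))]
    return totals


if __name__ == "__main__":
    wave = list(range(32))  # two rows of 16 lanes
    print(row_share(wave, 3)[:2], row_share(wave, 3)[16:18])
    print(row_reduce_sum(wave)[0], row_reduce_sum(wave)[16])
```

The 16-step loop in `row_reduce_sum` is chosen for clarity only; it shows that sharing each lane in turn leaves every lane holding its row's total, and it says nothing about the instruction sequence the PR actually emits.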

stefankoncarevic requested a review from causten as a code owner March 31, 2025 12:39

stefankoncarevic requested review from dhernandez0, djramic and umangyadav March 31, 2025 12:47

stefankoncarevic added 2 commits April 1, 2025 12:22

Merge branch 'develop' into wave-reduction-dpp

df7dd42

dhernandez0 reviewed Apr 2, 2025

stefankoncarevic added 2 commits April 17, 2025 11:37

Updating tests.

88b9639

umangyadav requested a review from Copilot June 11, 2025 23:03

Copilot AI reviewed Jun 11, 2025

stefankoncarevic closed this Mar 18, 2026

[DRAFT] Implement DPP Reduction in wavefront #1796
stefankoncarevic wants to merge 8 commits into ROCm:develop from stefankoncarevic:wave-reduction-dpp

stefankoncarevic commented Mar 31, 2025 (edited)

krzysz00 commented Apr 1, 2025

dhernandez0 Apr 2, 2025

dhernandez0 Apr 2, 2025 (edited)

dhernandez0 Apr 2, 2025

dhernandez0 Apr 2, 2025

dhernandez0 Apr 2, 2025

Copilot AI left a comment

Copilot AI Jun 11, 2025

stefankoncarevic commented Mar 18, 2026

4 participants
