
Improving memory requirements on AMDGPU #885

Merged
luraess merged 9 commits into JuliaGPU:master from neoblizz:neoblizz/memory-improvements
Mar 4, 2026

Conversation

@neoblizz
Contributor

@neoblizz neoblizz commented Feb 23, 2026

@luraess reviews would be appreciated!

  • Sets eager_gc to true by default.
  • The maybe_collect() function proactively triggers GC.gc(false) when GPU memory pressure exceeds ~75%, preventing excessive pool growth.
  • Allocation statistics and pool tracking.
  • Background pool cleanup every 60 seconds.
  • A reclaim() function that trims the pool and returns the number of bytes reclaimed.
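As a rough sketch of the eager-collection heuristic described above: the ~75% threshold and the `GC.gc(false)` call come from this description, but the names `EAGER_GC`, `PRESSURE_THRESHOLD`, and the function body below are illustrative, not AMDGPU.jl's actual implementation.

```julia
# Illustrative sketch of the eager-GC heuristic described in this PR.
# (Names and structure are hypothetical; the 75% threshold is from the
# PR description.)

const EAGER_GC = Ref(true)        # eager_gc defaults to true in this PR
const PRESSURE_THRESHOLD = 0.75   # trigger GC above ~75% memory pressure

# In practice `used` and `total` would come from a HIP memory-info query.
function maybe_collect(used::Integer, total::Integer)
    EAGER_GC[] || return false
    if used / total > PRESSURE_THRESHOLD
        GC.gc(false)              # incremental collection, not a full sweep
        return true
    end
    return false
end

maybe_collect(50, 100)   # 50% pressure: below threshold, no collection
maybe_collect(80, 100)   # 80% pressure: collects
```

Below the threshold the pool is allowed to grow; above it, an incremental collection runs so freed GPU buffers can be returned before the pool balloons.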

Testing Plan & Summary

Ran all tests locally on gfx942 (MI300X):

  • 16,717 passed
  • 13 failed
  • 13 errors
  • 30 broken
  • 16,773 total (5m 42.5s)

| Test | Failure | Reason |
| --- | --- | --- |
| wmma_tests | ArgumentError: invalid base 10 digit ':' in "942:sramecc+:xnack-" | Pre-existing bug parsing the gfx942 architecture string (fixed in 93fe6c6) |
| hip_rocarray/solver | dA \ dB ≈ Af \ Bf -- numerical inaccuracy in linear solvers | Unrelated rocSOLVER numerical precision issue (produces wildly wrong results, e.g. 1e7 vs. 0.05) |
| gpuarrays/broadcasting | Float16 broadcast comparison failure | Should also be fixed by 93fe6c6 |

Disclaimer: Parts of this PR are developed using claude-4.6-opus AI.

@neoblizz
Contributor Author

neoblizz commented Feb 23, 2026

Reran it after the last fix:

  • 1 failed -- the single failure is the broadcast Float16 precision issue at GPUArrays/test/testsuite/broadcasting.jl:131 (pre-existing)
  • 12 failed, 12 errors -- all in Float16 \ matrix division (rocSOLVER numerical precision issue, pre-existing)

@gbaraldi
Member

Can you split the arch thing into a separate PR?

@gbaraldi
Member

I guess the main question is: what is motivating this? Changing the whole memory allocation logic should have some benchmarks/motivation associated with the PR.
I don't want to discourage work on this BTW.

Also, FYI: if this was written with AI help, please do mention it. We have nothing against AI as long as the user is in the loop, but disclosure is needed.

@luraess
Member

luraess commented Feb 23, 2026

Thanks for looking into memory allocation and management challenges.

Given this has possibly critical impacts in terms of reliability and performance of the software, I am thankful that other JuliaGPU devs are looking into it as well.

I'd second @gbaraldi's suggestions and would also like to see some benchmarks in order to assess the situation. Running GC too often may have significant performance impacts, and not all workflows or applications may actually benefit from e.g. having eager_gc enabled.

@neoblizz
Contributor Author

@luraess thanks!
@gbaraldi Thank you for the review! Definitely not discouraged.

I guess the main question is: what is motivating this? Changing the whole memory allocation logic should have some benchmarks/motivation associated with the PR.

I understand this is a big change. I was running into situations where the same code used far more memory on AMD than on CUDA -- it ran out of memory on an AMD GPU with much more memory than a CUDA GPU that handled it fine. After a good bit of investigation I narrowed it down to differences in the GC behavior and in how memory is reclaimed in CUDA.jl vs. AMDGPU.jl. I can provide some motivating data associated with this change soon.

Also, FYI: if this was written with AI help, please do mention it. We have nothing against AI as long as the user is in the loop, but disclosure is needed.

Understood -- will add a disclosure. It's not entirely AI-written; I did use it to help me understand the differences between CUDA.jl's implementation and this one.

@neoblizz
Contributor Author

Can you split the arch thing into a separate PR?

Got it, will create a separate PR.

@neoblizz
Contributor Author

Can you split the arch thing into a separate PR?

#886

@neoblizz
Contributor Author

@gbaraldi @luraess I am running more tests, but I created a PR into my PR that showcases a very solid use case for this change: neoblizz#1

| Eager GC | Peak Used | % of Free | Completed |
| --- | --- | --- | --- |
| enabled | 106.125 GiB | 55% | 267/267 |
| disabled | 190.739 GiB | 99.7% (OOM) | 133/267 |
  • Array size: 489.516 MiB
  • Free GPU: 191.225 GiB
  • Total iterations: 267
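The gap between the two rows can be illustrated with a toy model of pool growth. This is a sketch under stated assumptions: the three-allocations-per-iteration figure, the 75% trigger, and the reset-to-a-small-working-set policy are guesses for illustration; the real test lives in neoblizz#1 and runs on an actual GPU, so the eager-GC peak here will not match the table exactly.

```julia
# Toy model of the benchmark above: 267 iterations, each assumed to
# allocate roughly three ~489.5 MiB arrays (temporaries included)
# against ~191.2 GiB of free device memory. Pure arithmetic, no GPU.
const ARRAY_BYTES = round(Int, 489.516 * 2^20)   # ~489.5 MiB per array
const FREE_BYTES  = round(Int, 191.225 * 2^30)   # free device memory

function simulate(eager::Bool; iters = 267, allocs_per_iter = 3, live = 64)
    pool = 0; peak = 0; done = 0
    for _ in 1:iters
        pool += allocs_per_iter * ARRAY_BYTES
        pool > FREE_BYTES && break          # OOM: allocation fails
        done += 1
        peak = max(peak, pool)
        # Above ~75% pressure, eager GC frees everything except a small
        # live working set, mirroring the maybe_collect() heuristic.
        if eager && pool / FREE_BYTES > 0.75
            pool = live * ARRAY_BYTES
        end
    end
    return (peak_gib = peak / 2^30, completed = done)
end

simulate(false)   # OOMs partway through, peak near 100% of free memory
simulate(true)    # completes all 267 iterations at a lower peak
```

Without eager GC the pool grows monotonically until allocation fails; with it, pressure-triggered collections keep the peak well below the device limit.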

@gbaraldi
Member

That looks great!

@neoblizz
Contributor Author

neoblizz commented Feb 25, 2026

@gbaraldi I found a separate issue where the pool isn't actually being used... 😄 Should I make additional fixes (memory leak and actually using the pool) in a separate PR?

https://github.com/JuliaGPU/AMDGPU.jl/blob/master/src/runtime/memory/hip.jl#L49-L50

(cc @luraess )

@gbaraldi
Member

That seems related enough :)

@neoblizz neoblizz force-pushed the neoblizz/memory-improvements branch from 18127d0 to 9f84735 Compare February 25, 2026 02:14
@neoblizz neoblizz requested a review from gbaraldi February 25, 2026 15:53
@neoblizz
Contributor Author

Requesting a new review because of the changes to HostBuffer (fixing a memory leak: it was unpinning all types of allocations, even those that needed to be properly freed) and the use of the pool. @gbaraldi
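A hypothetical sketch of the distinction behind that fix: on release, memory the buffer owns must be freed, while pre-existing host memory that was merely pinned should only be unpinned. Everything below is illustrative -- the type, field, and function names are not AMDGPU.jl's actual API, and the two stubs stand in for hipHostFree/hipHostUnregister-style calls.

```julia
# Stubs standing in for HIP runtime calls, so the sketch is runnable.
const FREED = Ref(0)
const UNPINNED = Ref(0)
hip_host_free(ptr) = (FREED[] += 1; nothing)        # hipHostFree stand-in
hip_host_unregister(ptr) = (UNPINNED[] += 1; nothing)  # hipHostUnregister stand-in

mutable struct HostBufferSketch
    ptr::Ptr{Cvoid}
    own::Bool      # we allocated this memory and must free it
    pinned::Bool   # we pinned pre-existing host memory
end

function release!(buf::HostBufferSketch)
    if buf.own
        hip_host_free(buf.ptr)         # owned allocations must be freed,
    elseif buf.pinned
        hip_host_unregister(buf.ptr)   # pinned-only memory is just unpinned
    end
    buf.ptr = C_NULL
    return nothing
end

release!(HostBufferSketch(C_NULL, true, false))   # owned buffer
release!(HostBufferSketch(C_NULL, false, true))   # pinned-only buffer
```

Unconditionally taking the unpin path for both kinds of buffer would leave owned allocations never freed, which is the leak described above.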

@neoblizz
Contributor Author

@gbaraldi just checking if I can get a review again, appreciate the time!

@gbaraldi
Member

LGTM. Though all of this makes me think we may want to have a package to share this code with CUDA.jl somewhere, given HIP seems to follow the CUDA APIs pretty closely @maleadt @vchuravy

@neoblizz
Contributor Author

LGTM. Though all of this makes me think we may want to have a package to share this code with CUDA.jl somewhere, given HIP seems to follow the CUDA APIs pretty closely @maleadt @vchuravy

I agree -- in fact, most of the cuda* APIs have hip* equivalents. Something to consider as a major refactor in the future. Also, please merge if it's good, and thank you for the reviews! 😊

@luraess
Member

luraess commented Feb 28, 2026

Is there a need for updating anything in the docs wrt this PR (e.g. in https://amdgpu.juliagpu.org/dev/api/memory)?

Also, are the new additions causing any API changes that would require a major release?

Besides the pool statistics benchmark results report (which looks great), I do not see the test from neoblizz#1 actually merged in this PR branch. Do I overlook something? Also, besides the % of free, do you have any reports on the timing with the new memory management approach versus previous?

@neoblizz
Contributor Author

Is there a need for updating anything in the docs wrt this PR (e.g. in https://amdgpu.juliagpu.org/dev/api/memory)?

Also, are the new additions causing any API changes that would require a major release?

Besides the pool statistics benchmark results report (which looks great), I do not see the test from neoblizz#1 actually merged in this PR branch. Do I overlook something? Also, besides the % of free, do you have any reports on the timing with the new memory management approach versus previous?

I have the test in the other branch; I will create a separate PR for that.

I did some testing on a Quantum Error Correction (QEC) workload -- there's no significant difference in runtime. If there are workloads you'd like me to run, I can run them to make sure this holds true. Let me share raw latency results for this version and the one on master for QEC.

@neoblizz
Contributor Author

Is there a need for updating anything in the docs wrt this PR (e.g. in https://amdgpu.juliagpu.org/dev/api/memory)?

Going through the docs now!

@neoblizz
Contributor Author

neoblizz commented Mar 3, 2026

Measuring the runtime diffs vs. master for some of the tests. Effectively no change on an AMD MI300X GPU.

| Test | memory-improvements (s) | master (s) | Delta (s) | Delta (%) |
| --- | --- | --- | --- | --- |
| gpuarrays/reductions/minimum maximum extrema | 210.04 | 211.45 | -1.41 | -0.7% |
| hip_rocarray/blas | 185.75 | 184.47 | +1.28 | +0.7% |
| gpuarrays/linalg/norm | 183.70 | 182.08 | +1.62 | +0.9% |
| gpuarrays/reductions/sum prod | 171.39 | 173.80 | -2.41 | -1.4% |
| gpuarrays/linalg/kron | 159.97 | 159.21 | +0.76 | +0.5% |
| gpuarrays/linalg/core | 148.97 | 147.94 | +1.03 | +0.7% |
| gpuarrays/reductions/mapreduce | 138.75 | 138.32 | +0.43 | +0.3% |
| gpuarrays/reductions/mapreducedim! | 135.21 | 137.19 | -1.98 | -1.4% |

@neoblizz
Contributor Author

neoblizz commented Mar 3, 2026

GC Time & GC %

| Test | memory-improvements GC (s) | master GC (s) | memory-improvements GC % | master GC % |
| --- | --- | --- | --- | --- |
| gpuarrays/reductions/min max extrema | 4.76 | 4.82 | 2.3 | 2.3 |
| hip_rocarray/blas | 5.23 | 5.25 | 2.8 | 2.8 |
| gpuarrays/linalg/norm | 3.56 | 3.58 | 1.9 | 2.0 |
| gpuarrays/reductions/sum prod | 3.41 | 3.49 | 2.0 | 2.0 |
| gpuarrays/linalg/kron | 27.07 | 27.61 | 16.9 | 17.3 |
| gpuarrays/linalg/core | 3.82 | 3.77 | 2.6 | 2.6 |
| gpuarrays/reductions/mapreduce | 2.99 | 2.74 | 2.2 | 2.0 |
| gpuarrays/reductions/mapreducedim! | 2.82 | 2.91 | 2.1 | 2.1 |
| gpuarrays/reductions/reduce | 3.14 | 2.93 | 2.3 | 2.2 |

GC time is effectively identical between branches.

CPU Allocations (MB)

| Test | memory-improvements alloc (MB) | master alloc (MB) | Delta (MB) | Delta (%) |
| --- | --- | --- | --- | --- |
| gpuarrays/reductions/min max extrema | 16,490 | 16,310 | +180 | +1.1% |
| hip_rocarray/blas | 20,661 | 20,528 | +133 | +0.6% |
| gpuarrays/linalg/norm | 12,457 | 12,262 | +195 | +1.6% |
| gpuarrays/reductions/sum prod | 12,020 | 11,822 | +198 | +1.7% |
| gpuarrays/linalg/kron | 33,856 | 33,727 | +129 | +0.4% |
| gpuarrays/linalg/core | 14,212 | 14,053 | +159 | +1.1% |
| gpuarrays/reductions/mapreduce | 10,367 | 10,187 | +180 | +1.8% |
| gpuarrays/reductions/mapreducedim! | 9,017 | 8,888 | +128 | +1.4% |
| gpuarrays/reductions/reduce | 10,149 | 9,972 | +177 | +1.8% |

Slightly more allocations on the CPU side.

RSS

| Test | memory-improvements RSS (MB) | master RSS (MB) | Delta (MB) |
| --- | --- | --- | --- |
| gpuarrays/reductions/min max extrema | 4,301 | 4,304 | -3 |
| hip_rocarray/blas | 4,305 | 4,309 | -4 |
| gpuarrays/linalg/norm | 4,305 | 4,309 | -4 |
| gpuarrays/reductions/sum prod | 4,305 | 4,309 | -4 |
| gpuarrays/linalg/kron | 4,279 | 4,309 | -30 |
| gpuarrays/linalg/core | 4,279 | 4,309 | -30 |
| gpuarrays/reductions/mapreduce | 4,279 | 4,309 | -30 |
| gpuarrays/reductions/mapreducedim! | 4,279 | 4,309 | -30 |
| gpuarrays/reductions/reduce | 4,279 | 4,309 | -30 |

Roughly the same, slightly better.

@luraess
Member

luraess commented Mar 4, 2026

Thanks for reporting these benchmark results! LGTM from what I can see. Would it make sense to modify/add something wrt the changes in https://amdgpu.juliagpu.org/dev/api/memory? @gbaraldi anything else I would overlook?

@gbaraldi
Member

gbaraldi commented Mar 4, 2026

Nope. Looks good to me

@neoblizz
Contributor Author

neoblizz commented Mar 4, 2026

Thanks for reporting these benchmark results! LGTM from what I can see. Would it make sense to modify/add something wrt the changes in https://amdgpu.juliagpu.org/dev/api/memory? @gbaraldi anything else I would overlook?

HostAlloc and this passage will need to change: https://amdgpu.juliagpu.org/dev/api/memory#:~:text=Passing%20own%3Dtrue%20keyword%20will%20make%20the%20wrapped%20array%20take%20the%20ownership%20of%20the%20memory.%20For%20host%20memory%20it%20will%20unpin%20it%20on%20destruction%20and%20for%20device%20memory%20it%20will%20free%20it.

I can propose docs changes in a separate PR (will do this over the weekend).

@luraess luraess merged commit b7fb0b0 into JuliaGPU:master Mar 4, 2026
3 checks passed
@luraess
Member

luraess commented Mar 4, 2026

Thanks! And it'd be great if the docs could be fixed in the near future as suggested.

@neoblizz
Contributor Author

neoblizz commented Mar 4, 2026

Will do, thank you for letting me contribute! @luraess @gbaraldi

I'll bring up a few more PRs based on what we discussed.

@neoblizz neoblizz deleted the neoblizz/memory-improvements branch March 4, 2026 21:03