Reduce GPU oversubscription #25

Open
yaoliu13 wants to merge 1 commit into amd-integration from ReduceMaxBlocksPerCU

@yaoliu13 yaoliu13 commented May 3, 2026

This patch reduces the maximum number of threadblocks launched per CU from 32 to 8. As a result, fewer threadblocks are launched with no work to do, on average. In testing, I see a 7% improvement on my local system.

By carlobertolli
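
A rough sketch of why a lower cap leaves fewer threadblocks idle. It assumes the launch always sizes the grid to the per-CU cap times the number of CUs, which this thread does not confirm. The function names, the CU count of 104, and the block size of 256 are hypothetical and are not taken from the patch.

```python
def blocks_needed(num_work_items: int, block_size: int) -> int:
    """Threadblocks required to cover the work (ceiling division)."""
    return (num_work_items + block_size - 1) // block_size


def idle_blocks(num_work_items: int, block_size: int,
                num_cus: int, max_blocks_per_cu: int) -> int:
    """Launched threadblocks that receive no work.

    Assumption: the grid is always num_cus * max_blocks_per_cu blocks,
    regardless of how much work there is.
    """
    launched = num_cus * max_blocks_per_cu
    return max(0, launched - blocks_needed(num_work_items, block_size))


if __name__ == "__main__":
    # Hypothetical device and workload, for illustration only.
    work, block, cus = 100_000, 256, 104
    for cap in (32, 8):
        print(f"cap={cap}: idle blocks = {idle_blocks(work, block, cus, cap)}")
```

Under this model a small workload leaves far fewer blocks idle with a cap of 8 than with 32, while a workload large enough to fill either grid leaves none idle in both cases.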

yaoliu13 commented May 3, 2026

/run-ci

yaoliu13 force-pushed the ReduceMaxBlocksPerCU branch from 109d6d0 to 877a565 on May 3, 2026 20:16

yaoliu13 commented May 3, 2026

/run-ci

yaoliu13 commented May 5, 2026

/run-ci

yaoliu13 commented May 5, 2026

/run-ci

yaoliu13 closed this May 5, 2026
yaoliu13 reopened this May 5, 2026

yaoliu13 commented May 6, 2026

/run-ci

yaoliu13 commented May 6, 2026

/run-ci

yaoliu13 commented May 7, 2026

pre-submit is not good.
