
Sparse heaps#392

Open
agolokoz wants to merge 10 commits into sparse-buffers from sparse-heaps

Conversation

@agolokoz
Contributor

@agolokoz agolokoz commented May 7, 2026

No description provided.

@agolokoz agolokoz marked this pull request as ready for review May 8, 2026 10:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dbc3b55b61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/backend-uzu/src/backends/metal/sparse/sparse_heap_pool.rs
Comment thread crates/backend-uzu/src/backends/metal/sparse/sparse_buffer.rs
@agolokoz
Contributor Author

agolokoz commented May 8, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6d5015647


Comment thread crates/backend-uzu/src/backends/metal/sparse/sparse_buffer.rs
Contributor

@CC-Yeh CC-Yeh left a comment


What do you think about having a warm pool of heaps to reduce TTFT? We could trade some memory for latency. Might be useful for short inputs.

let device_capabilities = MetalDeviceCapabilities::from_device(&device);

let page_size = MTLSparsePageSize::KB256;
let heap_capacity = 64 * 4 * page_size.byte_size().as_u64() as usize;
Contributor


Why 64MB? Any trade-offs?

Contributor


Maybe we could make this configurable: smaller for iPhones, bigger for Macs.

Contributor Author


Maybe we could make this configurable: smaller for iPhones, bigger for Macs.

Or we could even have different pools for different cases.

The sizes will be configured together with the first real usage of sparse buffers. For now I've just put in a placeholder value.
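For reference, the constant under discussion works out to 64 pages × 4 × 256 KiB = 64 MiB. A minimal sketch of making it device-dependent, as suggested above; all names here (`DeviceClass`, `heap_capacity`) are illustrative, not the crate's actual API:

```rust
// Illustrative sketch only: derives heap capacity from the sparse page size,
// scaled per device class as proposed in the review.

const KB: usize = 1024;

/// Hypothetical device classes for picking a default heap capacity.
enum DeviceClass {
    Phone,
    Desktop,
}

/// 256 KiB sparse page size, matching the snippet above.
fn page_size_bytes() -> usize {
    256 * KB
}

/// Desktop: 64 * 4 = 256 pages of 256 KiB = 64 MiB (the questioned value).
/// Phone: 64 pages = 16 MiB, trading capacity for a smaller footprint.
fn heap_capacity(class: DeviceClass) -> usize {
    let pages = match class {
        DeviceClass::Phone => 64,
        DeviceClass::Desktop => 64 * 4,
    };
    pages * page_size_bytes()
}

fn main() {
    assert_eq!(heap_capacity(DeviceClass::Desktop), 64 * 1024 * 1024);
    assert_eq!(heap_capacity(DeviceClass::Phone), 16 * 1024 * 1024);
}
```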

})
.collect();

cmd_queue.update_buffer_mappings(buffer, Some(&self.heap), &mtl_operations);
Contributor


Potential race? Do we need a fence/barrier for the intra-/inter-queue cases?

Contributor Author


The caller must be responsible for synchronization, because with many maps, for example, it's not a good idea to synchronize every single map.

Contributor


Thanks. How can we enforce this? Maybe tests plus documentation can help.
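One way to enforce "caller synchronizes once per batch" beyond documentation is to make the requirement visible in the types. A hypothetical sketch (`SyncToken`, `signal_fence`, `update_buffer_mappings` are all illustrative names, not the crate's real API): the batched update consumes a token that only a fence/barrier call can produce, so one token covers the whole batch and per-map synchronization is avoided.

```rust
// Illustrative sketch only: encoding the caller's synchronization duty in
// the type system rather than relying on documentation alone.

/// Proof value that the caller inserted a fence/barrier before the batch.
struct SyncToken(());

fn signal_fence() -> SyncToken {
    // In a real backend this would encode a fence on the command queue.
    SyncToken(())
}

/// Applies a batch of (offset, length) mappings; requires a SyncToken,
/// so the compiler rejects calls that skipped the fence.
fn update_buffer_mappings(_token: &SyncToken, mappings: &[(u64, u64)]) -> usize {
    // One token covers the whole batch: no per-map synchronization.
    mappings.len()
}

fn main() {
    let token = signal_fence();
    let applied = update_buffer_mappings(&token, &[(0, 4096), (4096, 4096)]);
    assert_eq!(applied, 2);
}
```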


#[derive(Clone, PartialEq)]
pub(super) struct MetalSparseHeapBufferMapping {
gpu_address: u64,
Contributor


Could gpu_address be brittle if Drop fails or finished partially?
Maybe store a pointer to buffer? Or we could store the mapping on buffer side?

Contributor Author


Could gpu_address be brittle if Drop fails or finished partially?

How would that be possible?
In any case, after drop() is called the object is no longer usable.

Or maybe I misunderstood what you mean.

Contributor


The pool's bookkeeping is keyed by gpu_address, a value Metal can reassign to a new buffer later, so any orphan entry left behind by a partial Drop becomes indistinguishable from a live entry.

But with duplicate entries, the first unmap would probably release the heap and the second would be a no-op.
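To make that failure mode concrete, here is a minimal sketch of gpu_address-keyed bookkeeping with an idempotent unmap, as described in the thread. `HeapPool` and its methods are illustrative names, not the crate's real API:

```rust
// Illustrative sketch only: pool bookkeeping keyed by GPU address, where a
// duplicate or orphan entry cannot cause a double-release because the
// second unmap for the same address is a no-op.

use std::collections::HashMap;

struct HeapPool {
    // Keyed by GPU address. Metal may reuse an address for a new buffer,
    // which is why a stale entry here would look identical to a live one.
    mappings: HashMap<u64, String>, // String stands in for a heap handle
}

impl HeapPool {
    fn map(&mut self, gpu_address: u64, heap: &str) {
        self.mappings.insert(gpu_address, heap.to_string());
    }

    /// Returns true if a heap was actually released; a repeated call for
    /// the same address finds nothing and is a no-op.
    fn unmap(&mut self, gpu_address: u64) -> bool {
        self.mappings.remove(&gpu_address).is_some()
    }
}

fn main() {
    let mut pool = HeapPool { mappings: HashMap::new() };
    pool.map(0x1000, "heap-a");
    assert!(pool.unmap(0x1000));  // first unmap releases the heap
    assert!(!pool.unmap(0x1000)); // second unmap is a no-op
}
```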

Comment thread crates/backend-uzu/src/backends/metal/sparse/sparse_heap.rs
Comment thread crates/backend-uzu/src/backends/metal/sparse/sparse_buffer.rs
}
}

heap.execute(buffer, &context.command_queue4, &mappings, true);
Contributor


Early return when there's nothing to map, to skip the expensive op?

Contributor Author


There is nothing expensive inside if the mappings are empty.

Contributor


My understanding is that the CPU will still talk to the GPU even if mappings is empty; maybe that's cheap, I'm not sure.
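The guard being proposed can be sketched in isolation; `execute` is an illustrative stand-in for the real call, not the crate's actual API:

```rust
// Illustrative sketch only: skip the submission entirely when there are no
// mappings, so even a cheap empty GPU round-trip is avoided.

fn execute(mappings: &[(u64, u64)]) -> usize {
    // Early return: if the empty submission still reaches the GPU,
    // guarding here avoids that round-trip altogether.
    if mappings.is_empty() {
        return 0;
    }
    // ... encode and submit the mapping updates here ...
    mappings.len()
}

fn main() {
    assert_eq!(execute(&[]), 0);          // no work submitted
    assert_eq!(execute(&[(0, 4096)]), 1); // one mapping applied
}
```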

@agolokoz
Contributor Author

agolokoz commented May 8, 2026

What do you think about having a warm pool of heaps to reduce TTFT? We could trade some memory for latency. Might be useful for short inputs.

Planned for the future.
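For the record, the warm-pool idea amounts to pre-allocating a few heaps so the first request pays no allocation cost, trading memory for lower time-to-first-token. A minimal sketch; `WarmPool`, `prewarm`, `acquire`, and `release` are hypothetical names, not the crate's real API:

```rust
// Illustrative sketch only: a warm pool of pre-allocated heaps, with a cold
// allocation as the fallback when the pool is exhausted.

struct Heap {
    capacity: usize,
}

struct WarmPool {
    free: Vec<Heap>,
}

impl WarmPool {
    /// Pre-allocate `count` heaps up front (the memory-for-latency trade).
    fn prewarm(count: usize, capacity: usize) -> Self {
        WarmPool {
            free: (0..count).map(|_| Heap { capacity }).collect(),
        }
    }

    /// Reuse a warm heap when available; otherwise allocate cold.
    fn acquire(&mut self, capacity: usize) -> Heap {
        self.free.pop().unwrap_or(Heap { capacity })
    }

    /// Return a heap to the pool for later reuse.
    fn release(&mut self, heap: Heap) {
        self.free.push(heap);
    }
}

fn main() {
    let mut pool = WarmPool::prewarm(2, 64 * 1024 * 1024);
    let heap = pool.acquire(64 * 1024 * 1024); // served from the warm pool
    assert_eq!(pool.free.len(), 1);
    pool.release(heap);
    assert_eq!(pool.free.len(), 2);
}
```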

@agolokoz agolokoz requested a review from CC-Yeh May 8, 2026 15:24
Contributor

@CC-Yeh CC-Yeh left a comment


LGTM overall, let's also get a pass from @uuuvn
