feat(qwen3): double-buffered decoding with on-device embedding lookup by vgene · Pull Request #46 · aws-neuron/nkipy

vgene · 2026-03-27T05:31:20Z

Summary

Fuse greedy sampling and embedding lookup into a single device kernel (greedy_sampling_with_embedding) so the selected token's embedding stays on device, eliminating the per-token host round-trip (D2H token ID → host embedding lookup → H2D embedding).
Double-buffer the next_id output: two DeviceTensors alternate so that the D2H copy of the previous token overlaps with the current iteration's non-blocking kernel execution.
Add a --no-double-buffering CLI flag that falls back to the original baseline decode path for A/B performance comparison.
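
The control flow described above can be sketched on the host with NumPy. This is an illustration only, not the PR's kernel or the nkipy API: NumPy arrays stand in for DeviceTensors, a single worker thread stands in for non-blocking kernel execution, and `toy_forward`, `greedy_sampling_with_embedding_ref`, `decode_baseline`, and `decode_double_buffered` are hypothetical names introduced for this example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def toy_forward(embedding, weight):
    """Hypothetical stand-in for the model forward pass: embedding -> logits."""
    return embedding @ weight


def greedy_sampling_with_embedding_ref(logits, embedding_table):
    """Reference semantics of the fused step: argmax token ID plus its embedding row."""
    next_id = np.argmax(logits, axis=-1)       # shape (batch,)
    next_embedding = embedding_table[next_id]  # shape (batch, hidden)
    return next_id, next_embedding


def decode_baseline(first_embedding, embedding_table, weight, steps):
    """Baseline: the token ID is read back before the next embedding is looked up."""
    embedding, tokens = first_embedding, []
    for _ in range(steps):
        ids = np.argmax(toy_forward(embedding, weight), axis=-1)
        tokens.append(ids)                 # stands in for the blocking D2H of the token ID
        embedding = embedding_table[ids]   # stands in for host lookup + H2D
    return np.stack(tokens, axis=1)


def decode_double_buffered(first_embedding, embedding_table, weight, steps):
    """Two next_id buffers alternate; buffer i-1 is read while step i runs."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    batch = first_embedding.shape[0]
    next_id_bufs = [np.empty(batch, dtype=np.int64) for _ in range(2)]
    state = {"embedding": first_embedding}  # the embedding never leaves the "device"

    def run_step(out_buf):
        logits = toy_forward(state["embedding"], weight)
        ids, emb = greedy_sampling_with_embedding_ref(logits, embedding_table)
        out_buf[:] = ids
        state["embedding"] = emb

    tokens = []
    with ThreadPoolExecutor(max_workers=1) as device:
        for step in range(steps):
            # Launch step i without blocking; it writes into buffer i % 2.
            future = device.submit(run_step, next_id_bufs[step % 2])
            if step > 0:
                # Meanwhile, copy out the previous token from the other buffer.
                tokens.append(next_id_bufs[(step - 1) % 2].copy())
            future.result()
        tokens.append(next_id_bufs[(steps - 1) % 2].copy())
    return np.stack(tokens, axis=1)
```

Because step i writes to one buffer while the host reads the other, the read of the previous token never races with the running step, and both functions should return the same token sequence for the same inputs.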

Test plan

Run with double buffering (default): torchrun ... qwen3.py "prompt" — verify correct output and tokens/sec
Run without: torchrun ... qwen3.py --no-double-buffering "prompt" — verify identical output
Compare tokens/sec between the two modes
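
A minimal sketch of how the comparison in this test plan could be automated, assuming each decode mode can be wrapped in a Python callable that takes a token count and returns a list of token IDs. The helpers `measure` and `compare_modes` are hypothetical and are not part of qwen3.py; the script's own tokens/sec reporting remains the source of truth.

```python
import time


def measure(decode_fn, num_tokens):
    """Run one decode mode and return (tokens, tokens_per_second)."""
    start = time.perf_counter()
    tokens = decode_fn(num_tokens)
    elapsed = time.perf_counter() - start
    tps = len(tokens) / elapsed if elapsed > 0 else float("inf")
    return tokens, tps


def compare_modes(baseline_fn, double_buffered_fn, num_tokens):
    """Check that both modes emit identical tokens and report throughput."""
    base_tokens, base_tps = measure(baseline_fn, num_tokens)
    db_tokens, db_tps = measure(double_buffered_fn, num_tokens)
    return {
        "identical": list(base_tokens) == list(db_tokens),
        "baseline_tps": base_tps,
        "double_buffered_tps": db_tps,
    }


if __name__ == "__main__":
    # Toy decoders standing in for the two real decode paths.
    def fake_baseline(n):
        time.sleep(0.002 * n)  # pretend each token pays a host round-trip
        return list(range(n))

    def fake_double_buffered(n):
        time.sleep(0.001 * n)  # pretend the round-trip is hidden
        return list(range(n))

    print(compare_modes(fake_baseline, fake_double_buffered, 20))
```

Checking token equality before comparing throughput keeps a faster but incorrect path from being counted as an improvement.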

…okup Fuse greedy sampling and embedding lookup into a single device kernel so the selected token's embedding stays on device and feeds the next iteration directly, eliminating the per-token host round-trip (D2H token ID → host embedding lookup → H2D embedding). Two next_id buffers alternate so the D2H of the previous token overlaps with the current iteration's non-blocking kernel execution. Adds --no-double-buffering flag to compare performance against the baseline decode path.

vgene requested a review from a team March 27, 2026 05:31

vgene wants to merge 1 commit into main from feat/double-buffering
