Support a2a3 SDMA async completion #696

Merged
jvjhfhg merged 2 commits into hw-native-sys:main from PKUZHOU:zhouzhe/a2a3-sdma-async-completion
May 8, 2026

PKUZHOU (Contributor) commented Apr 28, 2026

Summary

  • add a2a3 SDMA deferred completion support using PTO-ISA SDMA event records
  • keep the existing counter completion ABI and add only the SDMA event-record completion type
  • add an a2a3 SDMA async completion demo that validates producer deferred completion and consumer dependency release

Details

  • pto2_defer_pto_async_event() now converts an SDMA AsyncEvent into runtime-polled event-record completion entries
  • runtime polls SdmaEventRecord::flag and retires the record by clearing it and committing the completed SQ tail
  • host HCCL comm can initialize a PTO-ISA SDMA workspace behind SIMPLER_ENABLE_PTO_SDMA_WORKSPACE
  • fixes the non-profiling fanin-ready path to transition PENDING -> READY before queueing
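
The poll-and-retire step and the PENDING -> READY fix described above can be sketched roughly as follows. This is a minimal host-side illustration, not the code in this PR: `SdmaEventRecordSketch`, `SdmaQueueSketch`, `TaskStateSketch`, `poll_and_retire()` and `try_mark_ready()` are hypothetical names, the field layout is assumed, and `std::atomic` stands in for whatever device-visible memory access the runtime actually uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical record layout; the real SdmaEventRecord in PTO-ISA may differ.
struct SdmaEventRecordSketch {
    std::atomic<uint32_t> flag{0};  // non-zero once the SDMA transfer has completed
    uint32_t sq_tail{0};            // SQ tail value covered by this record (assumed field)
};

// Hypothetical stand-in for the queue state that the runtime commits to.
struct SdmaQueueSketch {
    uint32_t committed_tail{0};
};

// Polls one record. Returns true if it was complete and has now been retired.
inline bool poll_and_retire(SdmaEventRecordSketch& rec, SdmaQueueSketch& sq) {
    // Acquire load: everything written before the flag was set is visible after this.
    if (rec.flag.load(std::memory_order_acquire) == 0) {
        return false;  // still in flight; the caller polls again later
    }
    const uint32_t tail = rec.sq_tail;             // read before the record can be reused
    rec.flag.store(0, std::memory_order_release);  // clear the record
    sq.committed_tail = tail;                      // commit the completed SQ tail
    return true;
}

// Hypothetical task states; only the two named in the PR description are modeled.
enum class TaskStateSketch : uint32_t { PENDING = 0, READY = 1 };

// Moves a task from PENDING to READY exactly once. Only the caller that wins
// the compare-exchange should go on to queue the task.
inline bool try_mark_ready(std::atomic<TaskStateSketch>& state) {
    TaskStateSketch expected = TaskStateSketch::PENDING;
    return state.compare_exchange_strong(expected, TaskStateSketch::READY,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```

In this sketch a scheduler would call `try_mark_ready()` when the last fan-in dependency is released and queue the task only if it returns true, so the task is already READY by the time it is visible in the queue.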

Validation

  • build-only: build_chip_callable("a2a3", None, "https")
  • hardware: python examples/a2a3/tensormap_and_ringbuffer/sdma_async_completion_demo/test_sdma_async_completion_demo.py -p a2a3 -d 1-2 --build
    • rank 0: max_out=0.000e+00 max_result=0.000e+00
    • rank 1: max_out=0.000e+00 max_result=0.000e+00
  • regression: python examples/a2a3/tensormap_and_ringbuffer/deferred_notify_demo/test_deferred_notify_demo.py -p a2a3sim -d 0-1 --build
    • rank 0/1: max_diff=0.000e+00

gemini-code-assist (bot) left a comment

Code Review

This pull request implements support for deferred SDMA asynchronous completion. It introduces a new demo and smoke test, integrates SDMA workspace management into the host communication layer, and extends the device-side async API to support SDMA event records. The scheduler and polling logic were updated to handle these new completion types and ensure thread-safe task transitions. Feedback was provided to include additional layout assertions for the PTO2SdmaEventRecord struct to ensure the safety of atomic operations.
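
The kind of layout assertion the review asks for might look like the sketch below. The struct here is a hypothetical stand-in with an assumed two-field layout; the real `PTO2SdmaEventRecord` is defined in `pto_async_wait.h` and its fields may differ, so the specific offsets and size are illustrative only.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in; the field names and sizes are assumptions.
struct PTO2SdmaEventRecordSketch {
    uint32_t flag;     // polled by the runtime
    uint32_t sq_tail;  // SQ tail covered by this record
};

// The record is shared with hardware, so it must have a predictable C-style layout.
static_assert(std::is_standard_layout<PTO2SdmaEventRecordSketch>::value,
              "event record must be standard-layout");
static_assert(std::is_trivially_copyable<PTO2SdmaEventRecordSketch>::value,
              "event record must be trivially copyable");

// The polled flag sits at a known offset and is aligned for atomic access.
static_assert(offsetof(PTO2SdmaEventRecordSketch, flag) == 0,
              "flag is expected at offset 0");
static_assert(alignof(PTO2SdmaEventRecordSketch) >= alignof(std::atomic<uint32_t>),
              "record alignment must satisfy atomic<uint32_t>");

// An atomic view of the flag must not change its size or require a lock.
static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t),
              "atomic<uint32_t> must not add padding");
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "int-sized atomics must always be lock-free");

// Guards against accidental growth of the record.
static_assert(sizeof(PTO2SdmaEventRecordSketch) == 8,
              "unexpected event record size");
```

Checks like these fail at compile time if a later change reorders or pads the struct, which is cheaper than debugging a misread flag on hardware.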

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_async_wait.h
jvjhfhg force-pushed the zhouzhe/a2a3-sdma-async-completion branch 3 times, most recently from 3d99e93 to c99ed28 (May 8, 2026 08:31)
jvjhfhg force-pushed the zhouzhe/a2a3-sdma-async-completion branch from c99ed28 to 3c6a461 (May 8, 2026 08:48)
jvjhfhg merged commit 8280740 into hw-native-sys:main on May 8, 2026
14 checks passed