Refactor: collapse ChipWorker init/set_device → init(device_id, bins)#723
Open
ChaoWao wants to merge 1 commit intohw-native-sys:mainfrom
Open
Refactor: collapse ChipWorker init/set_device → init(device_id, bins)#723ChaoWao wants to merge 1 commit intohw-native-sys:mainfrom
ChaoWao wants to merge 1 commit intohw-native-sys:mainfrom
Conversation
efc0058 to
5528c02
Compare
There was a problem hiding this comment.
Code Review
This pull request refactors the ChipWorker API by integrating NPU device attachment into the init method and removing the explicit set_device and reset_device calls. Device operations such as malloc, free, and copy now internally handle thread attachment to ensure safety across different calling threads. Documentation, examples, and tests have been updated to reflect these changes. Feedback identifies potential data races in the initialization and attachment logic, along with a resource leak in the free_tensor error path.
00889af to
f48e496
Compare
ChipWorker exposed init / set_device / reset_device / finalize as four public lifecycle methods even though set_device existed only to satisfy CANN's per-thread aclrtSetDevice attach requirement (PR hw-native-sys#715), and reset_device had zero call sites. The split was the source of one bug (hw-native-sys#715) and forced every caller through a redundant ceremony. Collapse into a single attach point: - ChipWorker::init now takes device_id and attaches the calling thread internally via simpler_init; ChipWorker::set_device and ::reset_device are deleted, and device_set_ / device_set() collapse into initialized_. - C-API set_device export is removed; simpler_init grows a device_id parameter and returns int. The per-thread attach is centralized inside DeviceRunner::attach_current_thread (sim DeviceRunner gains the same method, driving pto_cpu_sim_bind_device + idempotent acquire_device). All ChipWorker callers run on the same thread that called init, so device_id_ stays a plain int — no atomic / no CAS needed. - Python `ChipWorker.init` wrapper takes (device_id, bins) — device_id first since it's the simpler, more fundamental "where" parameter. bootstrap_context's device_id arg becomes a defensive consistency check rather than the attach point. - Callers updated: python/simpler/worker.py, simpler_setup/scene_test.py, tests/ut/py/test_chip_worker.py, tests/ut/py/test_worker/test_platform_comm.py and test_bootstrap_context_{sim,hw}.py. - Docs (chip-level-arch, dynamic-linking, getting-started) and the worker_malloc / hello_worker examples reflect the new flow. Drive-by: ProfilerBase's constructor/destructor are now private with `friend Derived` (bugprone-crtp-constructor-accessibility); newer clang-tidy versions on macOS would otherwise reject any TU that transitively includes profiler_base.h.
f48e496 to
9939bd3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ChipWorkerexposedinit/set_device/reset_device/finalizeas four public lifecycle methods even thoughset_deviceexisted only to satisfy CANN's per-threadaclrtSetDeviceattach requirement (#715), andreset_devicehad zero call sites. The split was already the source of one bug (#715) and forced every caller through a redundant ceremony.This PR collapses the lifecycle to a single attach point at
init.What changed
C++ (
src/common/worker/chip_worker.{h,cpp})ChipWorker::initnow takesdevice_idand attaches the calling thread internally viasimpler_initChipWorker::set_deviceand::reset_devicedeleteddevice_set_flag anddevice_set()getter collapse intoinitialized_finalize()simplified — callsfinalize_device_fn_directlyC-API (
pto_runtime_c_api.h+ a2a3/a5 onboard+sim implementations)set_deviceexport removedsimpler_initsignature extended to(ctx, device_id, log_level, log_info_v)and now returnsint; folds the per-thread attach in alongside log configDeviceRunnerattach surface (a2a3 + a5, onboard + sim)attach_current_thread(int)is the centralized binder. Onboard:rtSetDevice. Sim:pto_cpu_sim_bind_device + pto_cpu_sim_acquire_device.device_id_stays a plainint. All ChipWorker callers run on the same thread that calledinit(), so the standard thread-spawn happens-before edge is sufficient — no atomic, no CAS, no per-op re-attach.Python
_ChipWorker.init/ChipWorker.inittakedevice_id(and the wrapper takes it beforebins, since it's the simpler, more fundamental "where" parameter)_ChipWorker.set_device/.reset_devicebindings removed;device_setproperty removedbootstrap_contextno longer callsset_device— the device is attached atinittime, the param is now used only as a defensive consistency checkDrive-by fix:
ProfilerBase's constructor/destructor are now private withfriend Derived(bugprone-crtp-constructor-accessibility). Newer clang-tidy versions on macOS would otherwise reject any TU that transitively includesprofiler_base.h.Callers updated
python/simpler/worker.py,simpler_setup/scene_test.py,tests/ut/py/test_chip_worker.py,tests/ut/py/test_worker/test_platform_comm.py, andtest_bootstrap_context_{sim,hw}.pycollapseinit+set_deviceinto a singleinit(device_id, bins)call.Docs
docs/chip-level-arch.md,docs/dynamic-linking.md,docs/getting-started.md, plus theworker_malloc/hello_workerexamples reflect the new flow.Test plan