
Rework runtime and add Mppa target#47

Open
ElectrikSpace wants to merge 5 commits into xtc-tools:main from ElectrikSpace:dev/snoiry/mppa_target

Conversation

@ElectrikSpace
Contributor

@ElectrikSpace ElectrikSpace commented Feb 12, 2026

This PR reworks the runtime to facilitate the support of new targets, especially accelerators.

It also proposes an integration of the Kalray Mppa target.

@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch 2 times, most recently from 8718fbd to bfb8226 Compare February 13, 2026 13:33
@ElectrikSpace ElectrikSpace marked this pull request as ready for review February 13, 2026 13:34
@ElectrikSpace
Contributor Author

PR #49 is required to run the CI on MacOS.

@ElectrikSpace
Contributor Author

Update SDist requirements

@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch 2 times, most recently from 6f1ffe9 to eaac320 Compare February 19, 2026 13:34
@ElectrikSpace ElectrikSpace self-assigned this Feb 19, 2026
@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch 2 times, most recently from 759afa8 to 74f3b3b Compare February 26, 2026 16:58
@guillon guillon added the enhancement New feature or request label Feb 27, 2026
@guillon
Member

guillon commented Mar 2, 2026

@ElectrikSpace I do not understand the status of this review. The description still seems to depend on a review which was merged, and it specifies to look only at the last commit. Though it seems that you already rebased it, there are still 5 commits.
Can you rebase and update the description?

@ElectrikSpace
Contributor Author

I didn't update the description because the status of the dependent PR was set to merged automatically, but I can remove it.

@guillon guillon assigned guillon and unassigned ElectrikSpace Mar 3, 2026
Member

@guillon guillon left a comment


Thanks for the proposal, it's very cool.

In addition to inline comments:

  • you may add to docs/develop/optional_backends.md, in the sdist section, the way to test for mppa, as we do not have automation there yet; I can verify whether I can run it on our local machine.
  • also for the nvgpu, it seems that there are tests missing; can you add there also a section for this target and how to run it? I may try to run it on a Grid'5000 machine with a GPU.

@guillon guillon assigned ElectrikSpace and unassigned guillon Mar 3, 2026
@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch from 43d5d1f to 1566ce6 Compare March 10, 2026 16:18
The goal of the work on the runtime is to facilitate the future
integration of non-host targets. New features are also added, especially
for accelerators.

Create common interfaces that runtimes will need to implement.
This patch introduces a runtime base named CommonRuntimeInterface which
is common among three derived classes of runtimes:
  - Host (no derived class for now)
  - AcceleratorDevice for accelerators
  - EmbeddedDevice for external embedded processors

Instances of AcceleratorDevice and EmbeddedDevice are called devices.
Unlike the current lazy runtime resolution, the concept of devices will
allow handling multiple accelerators of the same class.
Apply the new runtime interfaces on the two existing runtimes:
 - Create HostRuntime singleton class that derives from
   CommonRuntimeInterface
 - Create GPUDevice singleton (for now) class that derives from
   AcceleratorDevice to implement the GPU runtime.
Some method implementations are shared across the two, but adding a
specific implementation for a particular runtime is now easier.

The GPU target has been completely split from the Host target.
To prevent code duplication resulting from a clear split between Host
and GPU runtimes, common code portions have been factored out into utils.

With this rework of the runtimes, the call path has been simplified:
confusing classes like Executor/Evaluator and functions like
load_and_evaluate have been removed.
In the context of computation offloaded to an accelerator device, the
user can specify where input/output tensors live when the evaluation
begins. This makes it possible to simulate weight tensors being
transferred ahead of time. This feature is only supported for the MLIR
backend, by setting a "memref.on_device" attribute.
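For illustration, the hierarchy described in this commit message could be sketched as below. Only the class names (CommonRuntimeInterface, HostRuntime, AcceleratorDevice, EmbeddedDevice) and the singleton/device vocabulary come from the commit message; the load method and the singleton mechanics are invented for the example and do not reflect the real signatures.

```python
from abc import ABC, abstractmethod


class CommonRuntimeInterface(ABC):
    """Base interface shared by all runtimes (hypothetical sketch)."""

    @abstractmethod
    def load(self, payload_path: str):
        """Load a compiled payload on this runtime."""


class HostRuntime(CommonRuntimeInterface):
    """Singleton runtime for the host processor."""

    _instance = None

    @classmethod
    def get(cls):
        # Lazily create the single host runtime instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def load(self, payload_path: str):
        return f"host:{payload_path}"


class AcceleratorDevice(CommonRuntimeInterface):
    """Base class for accelerator runtimes; instances are called devices."""

    def load(self, payload_path: str):
        return f"device:{payload_path}"


class EmbeddedDevice(CommonRuntimeInterface):
    """Base class for runtimes on external embedded processors."""

    def load(self, payload_path: str):
        return f"embedded:{payload_path}"
```

With multiple AcceleratorDevice instances, several accelerators of the same class can coexist, which is the point of replacing the lazy runtime resolution.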
@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch from 1566ce6 to bad01b8 Compare March 11, 2026 13:30
@ElectrikSpace
Contributor Author

you may add to docs/develop/optional_backends.md, in the sdist section, the way to test for mppa, as we do not have automation there yet; I can verify whether I can run it on our local machine.

Done

also for the nvgpu, it seems that there are tests missing; can you add there also a section for this target and how to run it? I may try to run it on a Grid'5000 machine with a GPU.

There are no special primitives used only on GPUs (not yet), so the GPU test files are exactly the same as the host ones for now. But it would be nice to run GPU tests in CI (or maybe wait until we get real performance on GPUs).

@ElectrikSpace ElectrikSpace requested a review from guillon March 11, 2026 13:37
Create a new Mppa compilation target for MLIR. This target is the
first one to implement ahead-of-time offloading of tensors to the device.

Create a new Mppa runtime derived from AcceleratorDevice. The runtime
can be configured in various aspects (see the MppaConfig class). Execution
is supported on the ISS, QEMU, and hardware.

Note: In order to use the Mppa target, the Kalray Core Toolchain must be
installed. mlir_sdist and mlir_mppa must also be installed.
Rely on the kvxuks-catch pass to catch micro-kernels, replacing the
transform-dialect-based vectorization.
@ElectrikSpace ElectrikSpace force-pushed the dev/snoiry/mppa_target branch from bad01b8 to c419ac4 Compare March 11, 2026 14:04
@guillon guillon assigned guillon and unassigned ElectrikSpace Mar 12, 2026

Member

@guillon guillon left a comment


This review was generated by Father, Guils' AI assistant.

Review: #47

Summary

This PR adds a substantial MPPA target/runtime path and refactors evaluation/runtime plumbing to share more code across host and accelerator backends. The overall direction makes sense, but I found two correctness regressions that should be addressed before merge.

Findings

[high] Device-aware NDArray creation breaks existing GPU evaluation path

Where

  • src/xtc/utils/evaluation.py:89-107
  • src/xtc/runtimes/types/ndarray.py:84-105,159-164
  • src/xtc/runtimes/accelerator/gpu/GPUDevice.py:126-149

Why this is a problem
ensure_ndarray_parameters() now propagates spec["device"] into NDArray(...). For accelerator-backed tensors, NDArray.__init__() immediately calls _to_device(), and later NDArray.data / NDArray.numpy() use the accelerator memory APIs.

That is fine for MPPA because MppaDevice implements memory_allocate, memory_copy_to, memory_copy_from, and memory_data_pointer. But it regresses the pre-existing GPU path: GPUDevice still raises NotImplementedError for all of those methods.

So any graph/spec that now carries device=gpu will fail before execution even starts. In practice, this change converts what used to be host-backed + registered buffers in GPUEvaluator into true device-backed NDArrays, but the GPU runtime was not updated to support that contract.

Impact
Existing GPU evaluation/offload flows become unusable as soon as tensor specs propagate a GPU device through the new generic evaluation helpers.

Actionable fix
Pick one model and make it consistent:

  1. Short-term safe fix: keep GPU on the old host-buffer model in ensure_ndarray_parameters() (only create device-backed NDArrays for accelerators that fully implement the memory API, e.g. MPPA).
  2. Long-term fix: implement GPUDevice.memory_allocate/free/copy_to/copy_from/memory_data_pointer and then remove the manual host-buffer registration path from GPUEvaluator.

Until one of those is done, the generic device propagation introduced here is incomplete.
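The short-term guard (option 1) could be sketched as below. HostRuntime is a minimal stand-in for the real singleton, and implements_memory_api is a hypothetical opt-in flag invented for the example; the actual ensure_ndarray_parameters() signature and the way a runtime advertises its memory API may differ in the repository.

```python
class HostRuntime:
    """Minimal stand-in for the real HostRuntime singleton."""

    implements_memory_api = False
    _instance = None

    @classmethod
    def get(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


def resolve_runtime(spec):
    """Pick the runtime an NDArray should be created on for this spec.

    Falls back to the host whenever the requested device does not opt in
    to the full accelerator memory API (the GPU case in this finding), so
    only runtimes like MPPA get true device-backed NDArrays.
    """
    runtime = spec.get("device") or HostRuntime.get()
    if not getattr(runtime, "implements_memory_api", False):
        runtime = HostRuntime.get()
    return runtime
```

With this guard, a spec carrying device=gpu would silently stay on the host-buffer model until GPUDevice implements the memory API, at which point the flag flips and the guard becomes a no-op.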


[medium] MPPA codegen builds intermediates in the current working directory instead of the chosen temp/output directory

Where

  • src/xtc/backends/mlir/MlirTarget/MlirMppaTarget.py:78-85,94-145

Why this is a problem
The method computes dump_tmp_file = f"{dump_tmp_dir}/{dump_base}", but the actual intermediate filenames are built as bare relative names:

  • f"{dump_base}.before_mppa.mlir"
  • f"{dump_base}.after_mppa.mlir"
  • f"{dump_base}.host.c"
  • f"{dump_base}.accelerator.c"
  • etc.

Those files are then opened, compiled, and deleted via relative paths. That means they land in the process CWD rather than dump_tmp_dir, unlike the other MLIR targets in this tree which use dump_tmp_file for intermediates.

Impact

  • unexpected file leakage into the caller’s CWD
  • collisions between concurrent builds using the same payload name
  • failures when the CWD is not writable
  • tempfile.mkdtemp() is effectively ignored for most intermediates

Actionable fix
Construct the intermediate paths from dump_tmp_file, e.g.:

  • mlir_bmppa_dump_file = f"{dump_tmp_file}.before_mppa.mlir"
  • mlir_amppa_dump_file = f"{dump_tmp_file}.after_mppa.mlir"
  • c_host_dump_file = f"{dump_tmp_file}.host.c"
  • etc.

That would align MPPA with the path handling already used by the C/LLVM/NVGPU targets.
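Concretely, the fix is a one-line change per intermediate. The snippet below mirrors the variable names the review quotes (dump_tmp_dir, dump_base, dump_tmp_file); the surrounding method in MlirMppaTarget.py is of course larger than this sketch.

```python
import tempfile

# Hypothetical mirror of the variables named in the review; the real
# method computes dump_tmp_file the same way.
dump_tmp_dir = tempfile.mkdtemp()
dump_base = "payload"
dump_tmp_file = f"{dump_tmp_dir}/{dump_base}"

# Derive every intermediate from the full temp path, not the bare base
# name, so the files land in dump_tmp_dir instead of the process CWD.
mlir_bmppa_dump_file = f"{dump_tmp_file}.before_mppa.mlir"
mlir_amppa_dump_file = f"{dump_tmp_file}.after_mppa.mlir"
c_host_dump_file = f"{dump_tmp_file}.host.c"
c_accelerator_dump_file = f"{dump_tmp_file}.accelerator.c"
```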

Notes

  • I also noticed some unsafe shell/path handling in the new MPPA config/runtime (os.system("rm -r " + path) style cleanup/build commands), but I’m treating that as follow-up hardening rather than a merge blocker for this PR.
  • I did not modify repository code; this was a read-only review.

out,
runtime=spec["device"] if "device" in spec else HostRuntime.get(),
)
for out, spec in zip(outputs, outputs_spec)
Member


[Father] [high] Device-aware NDArray creation breaks existing GPU evaluation path — ensure_ndarray_parameters() now propagates spec["device"] into NDArray(...).

See the main review comment for full details and recommended fixes.

if self.is_on_device():
assert isinstance(self.runtime, AcceleratorDevice)
return self.runtime.memory_data_pointer(self.device_handle)
return self.handle.contents.dl_tensor.data
Member


[Father] [high] Device-aware NDArray creation breaks existing GPU evaluation path — ensure_ndarray_parameters() now propagates spec["device"] into NDArray(...).

See the main review comment for full details and recommended fixes.

def memory_data_pointer(self, acc_handle: Any) -> ctypes.c_void_p:
raise NotImplementedError(
"memory_data_pointer is not implemented for GPU device"
)
Member


[Father] [high] Device-aware NDArray creation breaks existing GPU evaluation path — ensure_ndarray_parameters() now propagates spec["device"] into NDArray(...).

See the main review comment for full details and recommended fixes.

os.remove(mlir_bmppa_dump_file)
os.remove(mlir_amppa_dump_file)
os.remove(c_host_dump_file)
os.remove(c_accelerator_dump_file)
Member


[Father] [medium] MPPA codegen builds intermediates in the current working directory instead of the chosen temp/output directory — The method computes dump_tmp_file = f"{dump_tmp_dir}/{dump_base}", but the actual intermediate filenames are built as bare relative names: - f"{dump_base}.before_mppa.mlir" - f"{dump_base}.after_mppa.mlir" - f"{dump_base}.host.c" -

See the main review comment for full details and recommended fixes.

Member


Indeed, it seems to not be generated in tmp_dir when dump_file is not None.

Member

@guillon guillon left a comment


Seems to still have the issue with tmp files not in a tmpdir when dump_file is not None and save_temps is False.


@guillon guillon assigned ElectrikSpace and unassigned guillon Mar 12, 2026