@fyu1 fyu1 commented Oct 30, 2025

After PR #222 was closed, I found that ARM had released a new MPAM branch that contains extra patches: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot%2bextras/v6.18-rc1

I think it's better to merge the new branch into 6.17. The new branch contains:

  1. v3 MPAM base driver
  2. Extra patches with features like cache min/max etc.

To make this PR work, I need to backport two resctrl patch sets (Babu's and Tony's) from 6.17 upstream before I can backport the new branch.

Ian and Matt suggested ignoring PR #222 and creating this new PR for a clean backport.

@fyu1 fyu1 changed the title from "Please pull 24.04 linux nvidia 6.17 next.mpam.extras" to "Please pull MPAM 24.04 linux nvidia 6.17 next.mpam.extras" Oct 30, 2025
@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch from ebd7620 to 8c698aa Compare October 30, 2025 02:56

nvmochs commented Oct 30, 2025

@fyu1 Can you pick these with -x -s and add the source after the SHA in the pick tag?

a0506716c52c535f677b0fdc498ddc8a0325473f - NVIDIA: SAUCE: arm_mpam: resctrl: Allow resctrl to allocate monitors
76bfc8a7f506a8ed662a83af20dbc8edd1baf802 - NVIDIA: SAUCE: cacheinfo: Add helper to find the cache size from cpu+level
3a01ed77dbd97420cd96556b1ef2b0e05b8d2503 - NVIDIA: SAUCE: arm_mpam: Add kunit tests for props_mismatch()
6f0e7638090ed22950359ccfff07a66a127d5c3f - NVIDIA: SAUCE: arm_mpam: Add kunit test for bitmap reset
7d9ed818050dd1066cf23d522861b034191afb3e - NVIDIA: SAUCE: arm_mpam: Add helper to reset saved mbwu state
691674810737622d1b4c05e00e0f0b4f31233ecb - NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported
e1ae558ea58061ebcc6e20171016d5520d120dbf - NVIDIA: SAUCE: arm_mpam: Probe for long/lwd mbwu counters
e304d9636b51699816459ad2516714c8629486ca - NVIDIA: SAUCE: arm_mpam: Track bandwidth counter state for overflow and power management
f8cc695d7263c0562f1dc262a5986143dd5dfdde - NVIDIA: SAUCE: arm_mpam: Add mpam_msmon_read() to read monitor value
f9aac7fc1818af57ec68a346630d1e7b0a900adf - NVIDIA: SAUCE: arm_mpam: Add helpers to allocate monitors
b10bf6385e98cc9bf64fc33c2bd198b980df889f - NVIDIA: SAUCE: arm_mpam: Probe and reset the rest of the features
911fe53463c620ebe0cc614b95a257737822fd4f - NVIDIA: SAUCE: arm_mpam: Allow configuration to be applied and restored during cpu online
01d9ed41d7b2263c8e81b85bc86eaebb8efae276 - NVIDIA: SAUCE: arm_mpam: Use a static key to indicate when mpam is enabled
e1cbddd5415859214a85cc361f80c44dabc13091 - NVIDIA: SAUCE: arm_mpam: Register and enable IRQs
9205feb49fe51841cd338ba8cd0a5a4975fb9329 - NVIDIA: SAUCE: arm_mpam: Extend reset logic to allow devices to be reset any time
a4041c8237a2e11aa4f6a00e91476dba9ebc865e - NVIDIA: SAUCE: arm_mpam: Reset MSC controls from cpuhp callbacks
1e8b3416c599532ca3f9a36aa174eab149fd92e8 - NVIDIA: SAUCE: arm_mpam: Merge supported features during mpam_enable() into mpam_class
4015668ce84aa39ae517acd9a8a0e646e82371d7 - NVIDIA: SAUCE: arm_mpam: Probe the hardware features resctrl supports
aba5e9bf5b9b641acf6e93b94b7f89b40ba5c0e5 - NVIDIA: SAUCE: arm_mpam: Add helpers for managing the locking around the mon_sel registers
1aa12fe59e7e508141ad2075f5619c8818cfedc4 - NVIDIA: SAUCE: arm_mpam: Probe hardware to find the supported partid/pmg values
0f49d084fdf9c66d1dbd8251fd7a953e68f116af - NVIDIA: SAUCE: arm_mpam: Add cpuhp callbacks to probe MSC hardware
18d92ec00c98591d11ccd05a6a769ac71040d49c - NVIDIA: SAUCE: arm_mpam: Add MPAM MSC register layout definitions
3b0705188ae0b3d506c59972d85efc04da2f1363 - NVIDIA: SAUCE: arm_mpam: Add the class and component structures for firmware described ris
670efd06b09b869a763aec851623edd948bdf43e - NVIDIA: SAUCE: DT: arm_mpam: Add support for memory controller MSC on DT platforms
d42aa40fcdd1f4193e778baa065ca45e3cb5900a - NVIDIA: SAUCE: arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
ba0d0c83db123bad87c94673f89d2e5a0f11cd94 - NVIDIA: SAUCE: DT: dt-bindings: arm: Add MPAM MSC binding
2606dfcfa7662010e1ec18489992941cc0003dc8 - NVIDIA: SAUCE: arm64: kconfig: Add Kconfig entry for MPAM
bc8ffaca99358dbe9a7d508ce43fcc8454175519 - NVIDIA: SAUCE: ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
5747f3c1d3f9d32c7329c269126b87ac7819eb0a - NVIDIA: SAUCE: ACPI / PPTT: Find cache level by cache-id
cf7a7746b760d1062f52bec64e094930b22c1ef4 - NVIDIA: SAUCE: ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
41bfada281ed13bb8f70d3183448fc60172f8bce - NVIDIA: SAUCE: ACPI / PPTT: Add a helper to fill a cpumask from a processor container
eacf99caadc7b0cb06c9fa2e061777632e2c8858 - NVIDIA: SAUCE: DT: cacheinfo: Expose the code to generate a cache-id from a device_node
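
A minimal sketch of the requested pick-tag shape, assuming the tag is the line that `git cherry-pick -x` appends and that the source goes after the SHA, as in tags such as `(cherry picked from commit 555e41e noble:linux-nvidia-6.14)`. The helper name `add_pick_source` and the placeholder subject line are hypothetical and not part of any existing tooling.

```python
import re

# Matches the bare line that `git cherry-pick -x` appends to a commit message.
PICK_TAG = re.compile(r"^\(cherry picked from commit ([0-9a-f]{7,40})\)$", re.MULTILINE)


def add_pick_source(message: str, source: str) -> str:
    """Append `source` after the SHA in every bare pick tag (hypothetical helper)."""
    return PICK_TAG.sub(
        lambda m: f"(cherry picked from commit {m.group(1)} {source})", message
    )


# Placeholder commit message; only the tag line matters here.
msg = (
    "subject line\n"
    "\n"
    "(cherry picked from commit 20f0c13f4ffd01cb6fc239248afa05d602f9e8d4)\n"
)
tagged = add_pick_source(
    msg, "https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git"
)
print(tagged)
```

Tags that already carry a source are left untouched, because the pattern only matches a SHA followed directly by the closing parenthesis.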

clsotog commented Oct 30, 2025

I started looking: the first 7 commits have the SAUCE tag, but I think they are taken from upstream, so they do not need the SAUCE tag.

nvmochs commented Oct 30, 2025

I started looking: the first 7 commits have the SAUCE tag, but I think they are taken from upstream, so they do not need the SAUCE tag.

That's a good point and something I missed.

@fyu1 - Please remove the NVIDIA:SAUCE tags from any patches that are picked from upstream.

Also, as a nit, I noticed that on the patches that were picked from upstream and contain the pick tag, there is whitespace between the SHA and the closing parenthesis:

(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916 )

For consistency it would be nice if this can be removed.

e.g.
(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916)
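
One way to apply this cleanup across many commit messages is sketched below. It assumes the messages are available as plain strings; `tidy_pick_tags` is a hypothetical helper name, and the regular expression only targets whitespace directly before the closing parenthesis of a pick tag.

```python
import re

# A pick tag whose closing parenthesis is preceded by stray spaces or tabs.
STRAY_SPACE = re.compile(
    r"^(\(cherry picked from commit [^)\n]*?)[ \t]+\)$", re.MULTILINE
)


def tidy_pick_tags(message: str) -> str:
    """Drop whitespace between the tag content and the closing parenthesis."""
    return STRAY_SPACE.sub(r"\1)", message)


before = "(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916 )"
after = tidy_pick_tags(before)
print(after)
```

A tag that is already clean, with or without a source after the SHA, does not match the pattern and is returned unchanged.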

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch 2 times, most recently from fff8a68 to 42ab8ea Compare October 30, 2025 21:47

fyu1 commented Oct 30, 2025

@clsotog @nvmochs Thank you very much for your review! Could you please review the branch again?

@nvmochs nvmochs self-requested a review October 30, 2025 22:19

@nvmochs nvmochs left a comment

Thanks for addressing my prior comments. Nothing further from me.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

clsotog commented Oct 31, 2025

This is a note to myself: the kernel tree from James Morse is mpam/snapshot+extras/v6.18-rc1.
Some observations, but I will ack anyway:
For the commits that have the HACK label, have you tried to use them? The commit messages say they are not for upstream, but if they help with debugging then I'm OK with them.
There were 2 commits, a Makefile change and then its revert, and I was wondering why they were added. I guess it matches the flow of Morse's kernel tree.

Acked-by: Carol L Soto <csoto@nvidia.com>

@clsotog clsotog self-requested a review October 31, 2025 04:04

fyu1 commented Oct 31, 2025

This is a note to myself: the kernel tree from James Morse is mpam/snapshot+extras/v6.18-rc1. Some observations, but I will ack anyway: For the commits that have the HACK label, have you tried to use them? The commit messages say they are not for upstream, but if they help with debugging then I'm OK with them. There were 2 commits, a Makefile change and then its revert, and I was wondering why they were added. I guess it matches the flow of Morse's kernel tree.

Acked-by: Carol L Soto <csoto@nvidia.com>

@clsotog Thank you for your review!

Only the first 29 of James's 100+ patches in the branch have been released to LKML. The rest need to be cleaned up before they can be released to LKML, so there is some messy code in them. I want to keep James's original patches as intact as possible so that it is easier to trace back to the original code when debugging potential issues. I did test the backported patches on Grace machines.

ianm-nv commented Oct 31, 2025

PR sent to CKT

abhsahu and others added 15 commits November 14, 2025 21:05
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details regarding the FFA device used for secure EC
services communication.

The HID 'MSFT000C' is reserved for FFA devices.
This HID is documented in

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md#hid-definition

This commit adds a platform driver that binds to the FFA device.
In its probe routine, it executes the AVAL method to check
whether FFA can be used for secure EC services communication.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 555e41e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to
https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details regarding the FFA device used for secure
EC services communication.

Each secure EC service is identified by a separate UUID.
When the generic FFA module (ffa_module) loads, it gets the list of
partitions. Each EC service is an FFA partition, and ffa_module creates
a device for each partition. These devices are added to the
arm_ffa bus type and are named arm-ffa-<number>.
To bind to these devices, a driver needs to be registered on the
arm_ffa bus type. This driver uses 'struct ffa_driver', with
UUIDs as its ID table. The driver is bound to a device
on the basis of the UUID.

The secure EC services FFA driver depends on the main FFA
device (which uses ACPI ID MSFT000C) having been created, so
ffa_driver_register()/ffa_driver_unregister() are invoked from
nvidia_ffa_probe()/nvidia_ffa_remove().

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 9613a5c noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md

for details regarding the FFA device used for secure EC
services communication.

When the ACPI interpreter runs code that uses FFH operation region
offset 4, the data is meant for EC secure services. The FFH buffer holds
data in FFA_REQ_PACKET format. The packet contains the UUID of the EC
service followed by the service-specific raw data. This commit adds
a custom FFH offset handler. When a request arrives with the custom
offset, it is handled by the nvidia FFA EC driver. The custom
FFH callback extracts the UUID and gets the ffa_device for it.
It then fills the raw data into ffa_send_direct_data2 and
invokes the sync_send_receive2() routine for that ffa_device.
Once it gets the response back, it fills the data in
FFA_RESP_PACKET format, and the ACPI interpreter passes that data to
the upper layer.

NOTE: In the above document, FFA_REQ_PACKET and FFA_RESP_PACKET
use different formats. But in the latest firmware code, the ACPI
implementation uses the same format for both request and response
(it follows the FFA_REQ_PACKET format). The status bit is updated
in the response (0 for success and 1 for failure).

The mixed-endian UUID format is documented in
https://cdrdv2-public.intel.com/772722/asl-tutorial-v20190625.pdf

  In addition to Concatenate, there are several useful macros that generate
  buffers from strings. For example, the ToUUID macro takes a string of the
  form aabbccdd-eeff-gghh-iijj-kkllmmnnoopp where aa through pp represent
  one byte values encoded with hexadecimal characters. This string gets
  converted to a 16-byte buffer that looks like the following:
  Buffer()
  {
  dd, cc, bb, aa,
  ff, ee,
  hh, gg,
  ii, jj, kk, ll, mm, nn, oo, pp
  }

  This mixture of little endian and big-endian encoding UUID is called
  a mixed-endian format. The use of strings and the ToUUID macro is a
  convenient way to avoid having to manually encode the mixed-endian
  format. There are many other macros that provide similar
  conveniences, such as EISAID. In kernel, it is represented with guid_t.

Inside nvidia_ffh_handler(), we need to convert the 16-byte buffer
from FFA UUID to AML UUID format. nvidia_get_uuid_from_aml_buf()
converts the AML UUID buffer into FFA UUID format.
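
The byte reordering described in the quoted tutorial can be sketched in Python. This is an illustration only, not the driver's implementation: it assumes that Python's `uuid.UUID.bytes_le` (first three fields little-endian, the rest in order) corresponds to the ToUUID buffer layout shown above, the helper names are hypothetical, and the UUID is just a sample value.

```python
import uuid


def uuid_to_aml_buf(u: uuid.UUID) -> bytes:
    """Return the 16-byte mixed-endian buffer that ToUUID() would produce."""
    return u.bytes_le


def aml_buf_to_uuid(buf: bytes) -> uuid.UUID:
    """Interpret a 16-byte mixed-endian buffer as a UUID."""
    return uuid.UUID(bytes_le=buf)


service = uuid.UUID("330c1273-fde5-4757-9819-5b6539037502")  # sample value
aml = uuid_to_aml_buf(service)

print(service.bytes.hex(" "))  # straight byte order of the string form
print(aml.hex(" "))            # first three fields byte-swapped
```

The first 4-byte field and the two following 2-byte fields are byte-swapped, while the trailing 8 bytes keep their order, which matches the `dd, cc, bb, aa, ff, ee, hh, gg, ii, ...` layout in the quoted text.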

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 40ca7bc noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

- During boot time, ACPI probe happens first. It calls _STA method for
  each added device.

- Inside _STA method for device managed by EC, it uses FFH offset 4.

- The request will fail since there is no custom handler registered
  for offset 0x4 and device will be disabled.

- If rescan happens on acpi bus, then device _STA method will be
  called again.

This commit adds support for getting the ACPI ID from the UUID and
invokes acpi_bus_scan().

NOTE: nvidia_get_acpi_id_from_uuid() returns an ACPI ID for only
a few services. The current code does not have a corresponding driver
for every service. Only a few services have a node that uses a
generic ACPI ID and an available driver.
For the rest of the services, either the driver is not yet available
or the published spec has not been updated with full ACPI sample code.
Once a driver is available for such a service, its ACPI ID can be
added to this list.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 971a25e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

The commit 897e9e6 ("firmware: arm_ffa: Initial support for scheduler
receiver interrupt") adds support for SGI interrupts in the FFA driver.
However, the validation for SGIs in the GICv3 is too strict, causing the
driver probe to fail.

This patch relaxes the SGI validation check, allowing callers to use SGIs
if the requested SGI number is greater than or equal to MAX_IPI, which
fixes the TFA driver probe failure.

This issue is observed on an NVIDIA server platform with FFA-v1.1.

 PTP clock support registered
 EDAC MC: Ver: 3.0.0
 ARM FF-A: Driver version 1.1
 ARM FF-A: Firmware version 1.1 found
 GICv3: [Firmware Bug]: Illegal GSI8 translation request
 ARM FF-A: Failed to create IRQ mapping!
 ARM FF-A: Notification setup failed -61, not enabled
 ARM FF-A: Failed to register driver sched callback -95
 scmi_core: SCMI protocol bus registered

This patch was sent to the Arm mailing list for upstream inclusion,
but it was rejected.

https://patchwork.kernel.org/project/linux-arm-kernel/patch/20240813033925.925947-1-sdonthineni@nvidia.com/

The proper fix requires some kind of mechanism by which an
SGI can be requested by a module, but that needs discussion with Arm and
will take time. This patch breaks only if the MAX_IPI value
changes, so it adds a BUILD_BUG_ON() to catch that situation.
Once a proper solution is agreed upon, this patch will be reverted.
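
The relaxed check can be sketched as follows. This is an illustration in Python rather than the GICv3 driver code: `sgi_usable` is a hypothetical name, the SGI range of INTIDs 0 to 15 comes from the GIC architecture, and the `max_ipi=8` used in the example is an assumed value chosen only to mirror the SGI 8 request in the log above.

```python
NR_SGIS = 16  # GIC architecture: SGIs occupy INTIDs 0..15


def sgi_usable(hwirq: int, max_ipi: int) -> bool:
    """True if a driver may request this SGI under the relaxed check.

    SGIs below max_ipi stay reserved for the kernel's own IPIs.
    """
    return max_ipi <= hwirq < NR_SGIS


# max_ipi=8 is an assumption for illustration, not the kernel's actual value.
for hwirq in (3, 8, 15, 16):
    print(hwirq, sgi_usable(hwirq, max_ipi=8))
```

If MAX_IPI grew, the lower bound of the usable range would move with it, which is the situation the BUILD_BUG_ON() is described as guarding against.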

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit fd136cf)
[maskedarray: removed enum ipi_msg_type definition as it appears in
upstream commit "irqchip/gic-v5: Add GICv5 LPI/IPI support"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to
https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details regarding the FFA device used for secure
EC services communication.

1. We need to get the virtual IDs that an EC service supports.
   In the FFA node, the _DSD object contains this information.
   If we look at the sample from the above document,

  Name(_DSD, Package() {
      ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), //Device Prop UUID
      Package() {
        Package(2) {
          "arm-arml0002-ffa-ntf-bind",
          Package() {
              1, // Revision
              2, // Count of following packages
              Package () {
                     ToUUID("330c1273-fde5-4757-9819-5b6539037502"), // Service1 UUID
                     Package () {
                          0x01,     //Cookie1 (UINT32)
                          0x07,     //Cookie2
                      }
              },
              Package () {
                     ToUUID("b510b3a3-59f6-4054-ba7a-ff2eb1eac765"), // Service2 UUID
                     Package () {
                          0x01,     //Cookie1
                          0x03,     //Cookie2
                      }
             }
         }
      }
    }
  }) // _DSD()

  It uses a nested package structure.
  nvidia_ffa_fill_notification_map(), added in this commit, parses the _DSD
  object and fills the notification ID map for that service.

2. Once the virtual ID is obtained, it needs to be mapped to a
   physical ID by invoking function 1 in the notify service.

3. The UUID for notification service is
   B510B3A3-59F6-4054-BA7A-FF2EB1EAC765.
   An FFA device will be created for this notification service
   by ffa_module. This notify service needs to be probed first.
   To make that happen, a separate ffa_driver instance is created
   and registered first.

4. We can do 1:1 mapping between virtual ID and hardware ID.

5. We need to invoke notify_request() with the hardware notification ID.
   It registers a callback function for the notification.

6. Once a notification arrives, we need to evaluate the _DSM method
   with the virtual ID (which is mapped to the same value as the hardware ID).

7. Function 2 in the notify service should destroy the mapping,
   but it is neither implemented in the firmware nor is its
   documentation available. A TODO comment is added in
   nvidia_ffa_notification_destroy().

   Also, if we unload and reload the modules, the existing mapping
   still exists. nvidia_ffa_notification_setup() ignores the error
   in this case. Once the firmware is updated, the error will be
   returned.

8. The notification service FFA device is needed by each EC secure
   services FFA device to get the virtual notification list. This creates
   the following device dependency chain.

    FFA device <-  notification service FFA device <- EC secure services FFA device

    To satisfy this, register each driver from the probe routine of the driver it depends on.
    Similarly, unregister it from that driver's remove routine.
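
Step 1 above can be sketched as follows, with the sample _DSD modelled as nested Python lists (one list per Package). This is not the code in nvidia_ffa_fill_notification_map(); the helper name `notification_map` is hypothetical, and treating the cookie values as the virtual notification IDs is an assumption made for illustration.

```python
# The sample _DSD from above, one Python list per ACPI Package.
DSD = [
    "daffd814-6eba-4d8c-8a91-bc9bbf4aa301",  # Device Prop UUID
    [
        [
            "arm-arml0002-ffa-ntf-bind",
            [
                1,  # Revision
                2,  # Count of following packages
                ["330c1273-fde5-4757-9819-5b6539037502", [0x01, 0x07]],
                ["b510b3a3-59f6-4054-ba7a-ff2eb1eac765", [0x01, 0x03]],
            ],
        ],
    ],
]


def notification_map(dsd, prop="arm-arml0002-ffa-ntf-bind"):
    """Return {service UUID: [cookie values]} for the given property."""
    _prop_uuid, properties = dsd
    for name, value in properties:
        if name != prop:
            continue
        _revision, count, *services = value
        if count != len(services):
            raise ValueError("package count does not match the entries present")
        return {svc_uuid.lower(): list(ids) for svc_uuid, ids in services}
    return {}


for svc, ids in notification_map(DSD).items():
    print(svc, ids)
```

Checking the declared count against the number of service packages is a defensive choice in this sketch; whether the driver performs the same check is not stated in the text.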

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 1287a1d noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
… EC driver

BugLink: https://bugs.launchpad.net/bugs/2114230

The NVIDIA FFA and EC secure services driver enables communication
with the EC (Embedded Controller). Make this driver built-in to enable EC
communication at early boot.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 9ea0251)

(cherry picked from commit 9ea0251 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114759

Add a quirk function to skip the PCIe secondary bus reset (SBR). The PCIe
Gen4 link downgrades to Gen1 after an SBR, so this operation must be skipped.

Signed-off-by: Jerry.Guo <jerry.guo@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 0185574 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
…pinctrl driver

BugLink: https://bugs.launchpad.net/bugs/2117784

The kernel GPIO subsystem maps hardware pin numbers to a different
range of GPIO numbers. Add a gpio-range structure to hold
the mapped GPIO range in the pinctrl driver. This enables the kernel
to look up a mapped GPIO range for a pinctrl device.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 1049985 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2117784

Add ACPI support in the shared part of the pinctrl driver. Parse the
hardware base addresses and IRQ number to initialize eint
according to the ACPI table data.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit cdce65d noble:linux-nvidia-6.14)
[maskedarray: context adjusted due to commit 86dee87: "pinctrl:
mediatek: Fix the invalid conditions"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2117784

Add mt8901 pinctrl, gpio and eint driver implementation.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 1fc7a58 noble:linux-nvidia-6.14)
[maskedarray: context adjusted for missing commit a3fe132: "pinctrl:
mediatek: Add pinctrl driver for mt8189"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
…CTRL_MT8901

BugLink: https://bugs.launchpad.net/bugs/2117784

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 0bd85d0)

(cherry picked from commit 0bd85d0 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2118357

Commit d0038ee ("NVIDIA: SAUCE: Add support for EC
secure service communication") added the nvidia_ffh_handler()
function. While copying the data back into the ACPI FFH packet,
it uses the request length. The response data can be larger
than the request length, and the response length can't be fetched in the
Linux FFH handler function. Instead, we can copy all the bytes from
ffa_data.data. The ACPI AML code will only use the required
number of bytes from it.

Normally the response length does not need to be known.
The ACPI tables do not use it; they parse the response
data directly. In the latest revision of the spec, the length
field itself has been removed.

https://github.com/OpenDevicePartnership/documentation/blob/b23acb09f7cf03a5c3167509533f396d547e6291/guide_book/src/specs/ec_interface/secure-ec-services-overview.md#operation-region-definition

DIGITS GB10 uses an older revision of the spec, and the launch is
planned with that older revision. When we move to the latest revision,
we will need to copy all data bytes for both request and response.

info->length corresponds to the FFH buffer length in the ACPI table.
The following is the code in the ACPI table:

  Name (_HID, "MSFT000C")  // _HID: Hardware ID
  OperationRegion (AFFH, FFixedHW, 0x04, 0x90)

info->length will be 0x90 (144) bytes.
In the older revision, ffa_packet->length is the number of valid data bytes
(https://github.com/OpenDevicePartnership/documentation/blob/45ad9b30be0f40e229deed2fef7a60d0b0b591f5/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md)

struct nvidia_ec_ffa_packet *ffa_packet = (struct nvidia_ec_ffa_packet *)value;

The length of this value buffer should be info->length.
We take the minimum of sizeof(ffa_data.data) = 112 and
(info->length = 144) - (offsetof(struct nvidia_ec_ffa_packet, rawdata) = 18) = 126,
so ffh_copy_len will be 112 for the current DIGITS ACPI implementation.

The latest revision also fixes this length mismatch. Raw data will
start at offset 32, so both values will come out as 112.
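
The arithmetic above can be checked with a short sketch. The constants are the values quoted in this message; the function is a Python stand-in for the calculation, not the driver code, and its parameter names are chosen here for illustration.

```python
FFA_DATA_SIZE = 112  # sizeof(ffa_data.data), as quoted above
INFO_LENGTH = 0x90   # OperationRegion (AFFH, FFixedHW, 0x04, 0x90) -> 144 bytes


def ffh_copy_len(info_length: int, rawdata_offset: int,
                 data_size: int = FFA_DATA_SIZE) -> int:
    """Bytes to copy: the smaller of the FFA payload size and the space
    left in the FFH buffer after the packet header."""
    return min(data_size, info_length - rawdata_offset)


print(ffh_copy_len(INFO_LENGTH, 18))  # older spec revision: min(112, 126)
print(ffh_copy_len(INFO_LENGTH, 32))  # latest spec revision: min(112, 112)
```

Both cases yield 112, which is why copying sizeof(ffa_data.data) bytes is safe for the current and the newer packet layout.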

Fixes: d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication")
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 141bd56 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2118663

Add CPU part and model macro definitions for the NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 9273361 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2118663

Set CONFIG_ARM64_BRBE=y for arm64 linux-nvidia-6.14.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 26a417a noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
James Morse and others added 20 commits November 20, 2025 12:43
Resctrl specifies the schema format for MB and SMBA in rdt_resources_all[].
Intel platforms take a percentage for MB, AMD platforms take an absolute
value which isn't MB/s. Currently these are both treated as a 'range'.
Adding support for additional types of control shows that user-space
needs to be told what the control formats are. Today users of resctrl
must already know if their platform is Intel or AMD to know how the
MB resource will behave.
The MPAM support exposes new control types that take a 'percentage'.
The Intel MB resource is also configured by a percentage, so should be
able to expose this to user-space.
Remove the static configuration for schema_fmt in rdt_resources_all[]
and specify it with the other control properties in
__get_mem_config_intel() or __get_mem_config_amd().

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 20f0c13f4ffd01cb6fc239248afa05d602f9e8d4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
MPAM's bandwidth controls are both exposed to resctrl as if they take a
percentage. Update the schema format so that user-space can be told this
is a percentage, and files that describe this control format are exposed
(e.g. min_percent).
Existing variation in this area is covered by requiring user-space to
know if it is running on an Intel or AMD platform. Exposing the schema
format directly will avoid modifying user-space to know it is running
on an MPAM or RISCV platform.
MPAM can also expose bitmap controls for memory bandwidth, which may
become important for use-cases in the future. These are currently converted
to a percentage to fit the existing definition of the MB resource.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit ea03ef359eb04c8c0f557f589578bb4777b8e2b5 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Resctrl previously had a 'range' schema format that took some kind of
number. This has since been split into percentage, MB/s and an AMD
platform specific scheme.
As range is no longer used, remove it.
The last user is mba_sc which should be described as taking MB/s.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 93fda1d6632174fefddfe5e712110dd1e2947c95 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…tmap controls

MPAM has cache capacity controls that effectively take a percentage.
Resctrl supports percentages, but the collection of files that are
exposed to describe this control belong to the MB resource.
To find the minimum granularity of the percentage cache capacity controls,
user-space is expected to read the bandwidth_gran file, and know this has
nothing to do with bandwidth.
The only problem here is the name of the file. Add duplicates of these
properties with percentage and bitmap in the name. These will be exposed
based on the schema format.
The existing files must remain tied to the specific resources so that
they remain visible to user-space. Using the same helpers ensures the
values will always be the same regardless of the file used.
These files are not exposed until the new RFTYPE schema flags are
set in a resource's 'fflags'.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 673bcb00d2371a2876e164da55d642fdf7657b8d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…n schema format

MPAM has cache capacity controls that effectively take a percentage.
Resctrl supports percentages, but the collection of files that are
exposed to describe this control belong to the MB resource. New files
have been added that are selected based on the schema format.
Apply the flags to enable these files based on the schema format.
Add a new fflags_from_schema() that is used for controls.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit a837ccc258380d6aeef86df709cc0484b60a4acf https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
If more schemas are added to resctrl, user-space needs to know how to
configure them. To allow user-space to configure schemas it doesn't know
about, it would be helpful to tell user-space the format, e.g. percentage.
Add a file under info that describes the schema format.
Percentages and 'mbps' are implicitly decimal, bitmaps are expected to be
in hex.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit b457019d995b2849e683aef0fd89066e64c679a4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
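
To illustrate how user-space might use such a format description, here is a minimal sketch that picks the number base from the schema format. The format strings "percentage", "mbps" and "bitmap" are assumptions based on the wording of the commit message, and the function name is hypothetical. The actual contents of the info file may differ.

```python
def parse_schema_value(schema_format: str, text: str) -> int:
    """Parse one schema value using the base implied by its format.

    Assumed format names: "percentage" and "mbps" are decimal,
    "bitmap" is hexadecimal.
    """
    text = text.strip()
    if schema_format in ("percentage", "mbps"):
        return int(text, 10)
    if schema_format == "bitmap":
        return int(text, 16)
    raise ValueError(f"unknown schema format: {schema_format!r}")
```

A tool written this way can configure a schema it has never seen before, provided the format it reports is one the tool already understands.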
MPAM can have both cache portion and cache capacity controls on any cache
that supports MPAM. Cache portion bitmaps can be exposed via resctrl if
they are implemented on L2 or L3.
The cache capacity controls cannot be used to isolate portions, which is
implicit in the L2 or L3 bitmap provided by user-space. These controls
need to be configured with something more like a percentage.
Add the resource enum entries for these two resources. No additional
resctrl code is needed because the architecture code will specify this
resource takes a 'percentage', re-using the support previously used only
for the MB resource.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit b601bbf375b016c417db4ec0e8bd6ae58b9057aa https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…m cmax

MPAM's maximum cache-capacity controls take a fixed point fraction format.
Instead of dumping this on user-space, convert it to a percentage.
User-space using resctrl already knows how to handle percentages.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 183d4c43260089e6b51518e50427d0f04a6af875 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
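
The conversion described in the cmax commit above can be sketched as follows. This assumes the control is a `wd`-bit fixed-point fraction whose value means value / 2^wd. The function names, the rounding, and the clamping of 100% are illustrative choices, not the driver's actual code.

```python
def fract_to_percent(value: int, wd: int) -> int:
    """Convert a wd-bit fixed-point fraction (value / 2**wd) to a percent."""
    if wd < 1 or not 0 <= value < (1 << wd):
        raise ValueError("value does not fit in wd bits")
    # Add half of the divisor before shifting to round to nearest.
    return (value * 100 + (1 << (wd - 1))) >> wd


def percent_to_fract(percent: int, wd: int) -> int:
    """Convert a percent (0..100) to the nearest-below wd-bit fraction."""
    if wd < 1 or not 0 <= percent <= 100:
        raise ValueError("percent must be in 0..100")
    value = (percent << wd) // 100
    # 100% would need 2**wd, which does not fit: clamp to the largest value.
    return min(value, (1 << wd) - 1)
```

With an 8-bit field, 0x80 reads back as 50 and a request for 100 is stored as 0xFF, so user-space only ever deals in percentages.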
The cpu hotplug lock has a helper lockdep_assert_cpus_held() that makes it
easy to annotate functions that must be called with the cpu hotplug lock
held.
Do the same for memory.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f40d4b8451b3d9e197166ff33104bd63f93709d0 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…PU hotplug lock

resctrl takes the read side CPU hotplug lock whenever it is working
with the list of domains. This prevents a CPU being brought online
and the list being modified while resctrl is walking the list, or
picking CPUs from the CPU masks.
If resctrl domains for CPU-less NUMA nodes are to be supported, this
would not be enough to prevent the domain list from being modified as
a NUMA node can come online with only memory.
Take the memory hotplug lock whenever the CPU hotplug lock is taken.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f5a082989a5f40b9b95515d68b230f8125648fdb https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…arch stubs

Resctrl expects the domain IDs for the 'MB' resource to be the
corresponding L3 cache-ids.
This is a problem for platforms where the memory bandwidth controls
are implemented somewhere other than the L3 cache, and exist on a
platform with CPU-less NUMA nodes.
Such platforms can't currently be exposed via resctrl as not all
the memory bandwidth can be controlled.
Add a mount option to allow user-space to opt-in to the domain IDs
for the MB resource to be the NUMA nid instead.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit ae8929caac02dccdc932666c1d8c906dda541bf1 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
idx is not used. Remove it to avoid a build warning.

The author is James but he didn't add his Signed-off-by.

(backported from commit c9b4fabe0b1b4805186d4326d47547993a02d191 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
[fenghuay: Change subject to a meaningful one. Add commit message.]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…stead of cache-id

The MB domain ids are the L3 cache-id. This is unfortunate if the
memory bandwidth controls are implemented for CPU-less NUMA nodes as
there is no L3 whose cache-id can be used to expose these controls
to resctrl.
When picking the class to use as MB, note whether it is possible
for the NUMA nid to be used as the domain-id. By default the MB
resource will use the cache-id.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit c2506e7fdb9e9de624af635f5060a1fe56a6bb80 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
… work with a set of CPUs

mpam_resctrl_offline_domain_hdr() expects to take a single CPU that is
going offline. Once all CPUs are offline, the domain header is removed
from its parent list, and the structure can be freed.
This doesn't work for NUMA nodes.
Change the CPU passed to mpam_resctrl_offline_domain_hdr() and
mpam_resctrl_domain_hdr_init() to be a cpumask. This allows a single CPU
to be passed for CPUs going offline, and cpu_possible_mask to be passed
for a NUMA node going offline.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 093483e5bca0aef546208b32eedf59f3aac665ff https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
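
The idea in the commit above can be modelled with plain sets. This is only a sketch of the bookkeeping, not the kernel code: the `domains` dictionary and the function name are hypothetical, and Python sets stand in for cpumasks.

```python
def offline_domain_hdr(domains: dict, dom_id: int, going_offline: set) -> bool:
    """Clear `going_offline` from a domain's CPU mask.

    Returns True when no CPUs remain and the domain was removed from
    `domains`, mirroring the point at which the structure can be freed.
    """
    remaining = domains[dom_id] - going_offline
    if remaining:
        domains[dom_id] = remaining
        return False
    del domains[dom_id]
    return True
```

A CPU hotplug caller passes a one-element set. A CPU-less NUMA node passes the set of all possible CPUs, which always empties the mask and so removes the domain in one call.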
…domain() to have CPU and node

mpam_resctrl_alloc_domain() brings a domain with CPUs online. To allow
for domains that don't have any CPUs, split it into a CPU and NUMA node
version.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 817d04bd296871b61dd70f68d160b85837dfe9a8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…nline/offline

To expose resctrl resources that contain CPU-less NUMA domains, resctrl
needs to be told when a CPU-less NUMA domain comes online. This can't
be done with the cpuhp callbacks.
Add a memory hotplug notifier, and use this to create and destroy
resctrl domains.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit caf4034229d8df2c306658c2ddbe3c1ab73df109 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…UMA nid as MB domain-id

Enable resctrl's use of NUMA nid as the domain-id for the MB resource.
Changing this state involves changing the IDs of all the domains
visible to resctrl. Writing to this list means preventing CPU and memory
hotplug.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit a795ac909c6c050daaf095abc9043217ddf5e746 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Modified for latest MPAM.

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Koba Ko <kobak@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
(forward ported from commit 77bd02c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-6.14-next)
[fenghuay: change 6.14 path to 6.17]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Koba Ko <kobak@nvidia.com>
Define the missing SHIFT definitions to fix build errors.

Fixes: a76ea20 ("NVIDIA: SAUCE: arm_mpam: Add quirk framework")
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Valid partids range from 0 to partid_max, inclusive.
partid_max + 1 is outside the valid partid range. Accessing partid_max + 1
generates an error interrupt and causes MPAM to be disabled.

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
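
The off-by-one that this fix guards against can be shown with a tiny sketch. The helper names are hypothetical. The only fact taken from the commit message is that the valid range is 0..partid_max, inclusive.

```python
def valid_partids(partid_max: int) -> range:
    """All partids the hardware accepts: 0..partid_max, inclusive."""
    if partid_max < 0:
        raise ValueError("partid_max must be non-negative")
    # range() excludes its end, so the end must be partid_max + 1.
    return range(partid_max + 1)


def is_valid_partid(partid: int, partid_max: int) -> bool:
    """True only for partids that are safe to program."""
    return 0 <= partid <= partid_max
```

Any loop that visits partid_max + 1 falls outside this range and, per the commit message, would raise an error interrupt.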
@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch from 42ab8ea to 6b47273 Compare November 20, 2025 13:30

fyu1 commented Nov 20, 2025

Based on feedback from Canonical, I updated the PR branch to remove the HACK patches, remove one untested patch, and modify the subjects and commit messages of some patches. Please see the details below.

To make the PR work, I need to backport two resctrl patch sets (Babu's and Tony's) from 6.17 upstream before I can backport the MPAM patches. No changes were made to these patches.

The MPAM patches are backported from https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot+extras/v6.18-rc1. The patches are formatted from commit range 2af39084438cebc0053e8ddcc4a855873125b518^..HEAD, 140 patches in total.

To clean up the 140 MPAM patches and retain as much of the existing code as possible, I made the following changes:

  1. Remove these two patches because 0068 reverts 0044:
    0044-DROP-Makefile-fixup.patch
    0068-DROP-Revert-Makefile-fixup.patch

  2. Remove this patch because it does nothing useful:
    0090-TAG-extras-branch-here.patch

  3. Change subjects to meaningful ones. Add commit messages:
    0011-DT-code-for-PREV.patch
    0133-DISAPPEAR.patch

  4. These "HACK" patches are safely removed:
    0093-HACK-make-quirks-writable.patch
    0139-HACK-fs-resctrl-Add-cranky-debug-for-reading-CPU-msr.patch
    0140-HACK-arm_mpam-Add-cranky-debug-for-reading-CSU-hardw.patch

  5. This untested patch is removed, so MPAM KVM won't work. It is safer to remove this patch than to keep it:
    0041-untested-KVM-arm64-Force-guest-EL1-to-use-user-space.patch

  6. These untested patches are kept. Changing or removing them may cause conflicts or other issues. They only change MPAM driver code.
    If there is any issue in these patches (or any MPAM patches), a workaround is to disable the MPAM driver completely with the kernel boot option "arm64.nompam".
    0058-untested-arm_mpam-resctrl-pick-classes-for-use-as-mb.patch
    0066-untested-arm_mpam-resctrl-Allow-monitors-to-be-confi.patch
    0108-untested-mpam-Convert-pcc_channels-list-to-XArray-an.patch
    0136-untested-arm_mpam-resctrl-Split-mpam_resctrl_alloc_d.patch
    0138-untested-arm_mpam-resctrl-Allow-resctrl-to-enable-NU.patch

clsotog commented Nov 20, 2025

@fyu1 with the new commits, would that fix the mount/unmount issue Colin saw last time? I saw it again today.

nvmochs commented Nov 20, 2025

I re-reviewed the latest updates to the PR (0dfe3c7b2d47^..6b47273c9904) and confirmed 165 out of 170 match their source exactly. I manually reviewed the remaining 5 patches and have no issues with them; I also see no issues with the strategy proposed here to address issues with the patches that were previously flagged.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@clsotog clsotog self-requested a review November 20, 2025 21:27

clsotog commented Nov 20, 2025

@fyu1 with the new commits, would that fix the mount/unmount issue Colin saw last time? I saw it again today.
Sorry for the noise. I booted into the wrong kernel. With the latest changes I do not see the issue. Thanks.

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvidia-bfigg nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from c7fca69 to 6a9a932 Compare December 18, 2025 13:01