Conversation

@PLangowski
Contributor

@PLangowski PLangowski commented Oct 30, 2025

Ignore further errors unrelated to SOF.

@sofci
Collaborator

sofci commented Oct 30, 2025

Can one of the admins verify this patch?

Reply "test this please" to run this test once.

@PLangowski PLangowski force-pushed the kernel-boot-ignore branch 2 times, most recently from 7ded2c2 to f1f30a2 Compare October 30, 2025 14:31
@PLangowski PLangowski changed the title [DNM] Ignore more logs (test) Ignore more logs Oct 30, 2025
@PLangowski PLangowski changed the title Ignore more logs sof-kernel-log-check.sh: Ignore more logs Oct 30, 2025
# ignore the ACPI error on LNL and PTL.
# kernel: ACPI: \: Can't tag data node
ignore_str="$ignore_str""|kernel: ACPI: \\\\: Can't tag data node"
ignore_str="$ignore_str""|kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence 669, action 5503, done no"

Xe dev here. Are these specific numbers (669 and 5503) seen frequently?

I ask because outside of very specific circumstances (probably only module load) these shouldn't be repeatable.

Collaborator

Not all but many of the other GPU and display errors in this file are seen only once at boot. So maybe this one too, which makes it repeatable? Dunno.

A good suspend/resume pass rate is always achieved last (see #1038 + internal sof-framework 408 and others) and by that time other components tend to be more reliable and less noisy.

After looking into it and talking with my colleagues, this may be reproducible after all. How often are you seeing this come up?

Contributor Author

@msatwood This problem appeared on PTL during sof v2.14 validation on kernel revision b250b5425a17. We haven't encountered these specific values anywhere else, but we did notice a similar error on WCL:

[70727.555869] kernel: xe 0000:00:02.0: [drm] Tile0: GT1: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
[70727.555875] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ECANCELED)

Contributor Author

I just noticed an error from today's run and saw another instance of this problem on PTL:

[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no
[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)

This time the numbers are different. I should probably change the line so that it matches all numbers in this message.
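
One way the line could be generalized is sketched below. This is only an illustration, not necessarily what was merged: it assumes `ignore_str` is consumed as an extended regular expression (as its `|` alternation suggests), and the variable names `line_a`, `line_b` and `remaining` are hypothetical. The sample log lines are copied from this thread.

```shell
#!/bin/sh
# Hypothetical generalization: [0-9]+ replaces the hard-coded fence and
# action numbers, so any instance of the G2H timeout message is ignored.
ignore_str="kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence [0-9]+, action [0-9]+, done no"

# Sample lines taken from the PTL run quoted above.
line_a='[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no'
line_b='[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)'

# grep -E -v drops every line matching the ignore pattern; whatever
# remains is what a log check would still report.
remaining=$(printf '%s\n' "$line_a" "$line_b" | grep -E -v "$ignore_str")
printf '%s\n' "$remaining"
```

With this pattern the G2H timeout is filtered out regardless of the numbers, while the accompanying "PF: Failed to push self configuration" line is still reported, so that message would need its own entry if it should be ignored too.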

Ignore further errors unrelated to SOF.

Signed-off-by: Pawel Langowski <pawelx.langowski@intel.com>
@majunkier
Contributor

This resolves the failing SOF tests. The error message is not directly connected with the firmware, but ignoring it should be considered a temporary solution. We need to plan how to handle and report these errors in the future.

@redzynix redzynix merged commit d651139 into thesofproject:main Nov 7, 2025
3 checks passed

6 participants