sof-kernel-log-check.sh: Ignore more logs #1318
Can one of the admins verify this patch?
7ded2c2 to f1f30a2
tools/sof-kernel-log-check.sh
Outdated
# ignore the ACPI error on LNL and PTL.
# kernel: ACPI: \: Can't tag data node
ignore_str="$ignore_str""|kernel: ACPI: \\\\: Can't tag data node"
ignore_str="$ignore_str""|kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence 669, action 5503, done no"
Xe dev here. Are these specific numbers (669 and 5503) seen frequently?
I ask because outside of very specific circumstances (probably only module load) these shouldn't be repeatable.
Not all but many of the other GPU and display errors in this file are seen only once at boot. So maybe this one too, which makes it repeatable? Dunno.
A good suspend/resume pass rate is always achieved last (see #1038 + internal sof-framework 408 and others) and by that time other components tend to be more reliable and less noisy.
After looking into this and talking with my colleagues, maybe this could be reproducible. How often are you seeing this come up?
@msatwood This problem appeared on PTL during sof v2.14 validation on kernel revision b250b5425a17. We haven't encountered these specific values anywhere else, but we did notice a similar error on WCL:
[70727.555869] kernel: xe 0000:00:02.0: [drm] Tile0: GT1: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
[70727.555875] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ECANCELED)
I just noticed an error from today's run and saw another instance of this problem on PTL:
[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no
[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)
This time the numbers are different. I should probably change the line so that it matches all numbers in this message.
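For illustration, a rough sketch of what a number-agnostic pattern could look like. This is not the actual change in the PR: it assumes the ignore string is consumed as an extended regex (`grep -E`), and the `sample_a`, `sample_b` and `remaining` variables are hypothetical names used only to exercise the pattern against the two log lines quoted above.

```shell
# Hypothetical generalization: [0-9]+ replaces the hard-coded
# fence (669) and action (5503) values from the original line.
# In the real script this would be appended to the existing
# ignore_str with a leading "|"; here it stands alone.
ignore_str="kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence [0-9]+, action [0-9]+, done no"

# The two log lines from today's PTL run, copied from the comment above.
sample_a='[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no'
sample_b='[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)'

# Assumption: ignored lines are filtered out with an inverted ERE match.
remaining=$(printf '%s\n' "$sample_a" "$sample_b" | grep -E -v -e "$ignore_str")
printf '%s\n' "$remaining"
```

With this pattern the "Timed out wait for G2H" line is dropped regardless of its fence and action values, while the "Failed to push self configuration" line still gets through, since nothing here ignores it.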
Ignore further errors unrelated to SOF.

Signed-off-by: Pawel Langowski <pawelx.langowski@intel.com>
f1f30a2 to 002e8a2
This resolves the failing SOF tests. The error message is not directly connected with the firmware, but this should be considered a temporary solution. We need to plan how to handle and report these errors in the future.
Ignore further errors unrelated to SOF.