Update aiebu submodule and print opcode information in context health report on ERT_CMD_STATE_TIMEOUT (for AIE4 and AIE2PS) #9876
Open
sayyanna wants to merge 6 commits into
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
Contributor
|
clang-tidy review says "All clean, LGTM! 👍" |
stsoe
reviewed
Jun 17, 2026
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
sonals
reviewed
Jun 18, 2026
stsoe
requested changes
Jun 18, 2026
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
stsoe
requested changes
Jun 26, 2026
sonals
reviewed
Jun 27, 2026
#include "core/common/runner/capture.h"
#include "core/common/xdp/profile.h"

#include "core/common/aiebu/src/cpp/include/aiebu/aiebu_debug.h"
Member
Isn't "aiebu/aiebu_debug.h" enough?
Collaborator
> Isn't "aiebu/aiebu_debug.h" enough?
We could add core/common/aiebu/src/cpp/include to the include search path for specific targets. That would allow #include "aiebu/aiebu_debug.h". We should then make sure dtrace has its exported header files in the same folder.
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
Co-authored-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
Problem solved by the commit
Print opcode information in context health report on ERT_CMD_STATE_TIMEOUT (for AIE4 and AIE2PS)
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
How problem was solved, alternative solutions (if any) and why they were rejected
A new API has been added to AIEBU that decodes an opcode from a uC index, page index and offset.
This PR updates the aiebu submodule of XRT to pick up the new API and adds code in XRT to link against aiebu, so that XRT calls the API and prints the decoded opcode in the context health report when a command ends in ERT_CMD_STATE_TIMEOUT.
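To illustrate the lookup step behind the "Opcode diag" lines in the test output, here is a minimal sketch, not the aiebu API. The names `decode_result`, `section_name` and `locate` are hypothetical. The section naming `.ctrltext.<uc>.<page>.0` is an assumption inferred from the sample diag lines in this description, and the meaning of the trailing `0` is not known from the PR text. A plain map of section sizes stands in for a parsed ELF.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type; the real aiebu types are not shown in this PR.
struct decode_result {
  bool ok;           // true if the offset falls inside a known section
  std::string diag;  // text comparable to the "Opcode diag:" line
};

// Assumption: one control-code section per (uC, page) pair, named
// .ctrltext.<uc>.<page>.0 as seen in the sample output.
std::string section_name(std::uint32_t uc_idx, std::uint32_t page_idx)
{
  return ".ctrltext." + std::to_string(uc_idx) + "." +
         std::to_string(page_idx) + ".0";
}

// section_sizes maps section name to size in bytes (stand-in for an ELF).
decode_result locate(const std::map<std::string, std::uint32_t>& section_sizes,
                     std::uint32_t uc_idx, std::uint32_t page_idx,
                     std::uint32_t offset)
{
  const std::string name = section_name(uc_idx, page_idx);
  const auto it = section_sizes.find(name);
  if (it == section_sizes.end())
    return {false, "section not found: " + name};
  if (offset >= it->second)
    return {false, "offset out of range: section=" + name};

  std::ostringstream os;
  os << "section=" << name << " size=" << it->second << " pos=" << offset;
  return {true, os.str()};
}
```

With a map holding only `.ctrltext.0.2.0` at size 256, `locate(sections, 0, 2, 0x84)` succeeds, while `locate(sections, 1, 0, 0)` reports a missing section, which corresponds to the two uc_info entries in the test output. Decoding the opcode bytes themselves is left to aiebu and is not sketched here.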
Risks (if any) associated with the changes in the commit
None
What has been tested and how, request additional testing if necessary
runlist failed execution (ERT_CMD_STATE_TIMEOUT)
Kernel Instance: mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128
ELF File: C:\Users\sayyanna\Downloads\mcdm_stack_05_01\mcdm_stack/../data_bin/diff_pm/mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128.elf
ELF UUID: 784303af-f0c5-aa13-fb02-2bf0ef0ff038
ctx_state = 0x00000001 (CTX_STATUS_ERROR)
ctx_error_type = 0x00000005 (NPU_ASYNC_EVENT_CTX_ERR_UC_CRITICAL_ERROR)
number of uC reported = 6
uc_info[0]:
uc_idx=0x00000000
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000002 (CTRL_HANG)
fw_state=0x00000019 (FW_STATE_CTRL_CODE_HANG)
page_idx=0x00000002
offset=0x00000084
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000
Opcode: mask_poll_32 0x2119630, 1, 1
Opcode Size: 0x10
Opcode diag: section=.ctrltext.0.2.0 size=256 pos=132
uc_info[1]:
uc_idx=0x00000001
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000000 (NONE)
fw_state=0x00002001 (FW_STATE_WORKER_PRE_WORK_BARRIER)
page_idx=0x00000000
offset=0x00000000
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000
Opcode decode failed
Opcode diag: section not found: .ctrltext.1.0.0
From .dump section:
runlist failed execution (ERT_CMD_STATE_TIMEOUT)
Kernel Instance: mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128
ELF File: C:\Users\sayyanna\Downloads\mcdm_stack_05_01\mcdm_stack/../data_bin/diff_pm/mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128.elf
ELF UUID: 784303af-f0c5-aa13-fb02-2bf0ef0ff038
ctx_state = 0x00000001 (CTX_STATUS_ERROR)
ctx_error_type = 0x00000005 (NPU_ASYNC_EVENT_CTX_ERR_UC_CRITICAL_ERROR)
number of uC reported = 6
uc_info[0]:
uc_idx=0x00000000
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000002 (CTRL_HANG)
fw_state=0x00000019 (FW_STATE_CTRL_CODE_HANG)
page_idx=0x00000002
offset=0x00000084
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000
Opcode: MASK_POLL_32 0x2119630, 0x1, 0x1
Opcode Size: 0x10
Line: 28
File: ../ml_asm/aie_runtime_control.asm
uc_info[1]:
uc_idx=0x00000001
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000000 (NONE)
fw_state=0x00002001 (FW_STATE_WORKER_PRE_WORK_BARRIER)
page_idx=0x00000000
offset=0x00000000
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000
Opcode decode failed
Opcode diag: section not found: .ctrltext.1.0.0
Documentation impact (if any)