Update aiebu submodule and print opcode information in context health report on ERT_CMD_STATE_TIMEOUT (for AIE4 and AIE2PS)#9876

Open
sayyanna wants to merge 6 commits into Xilinx:master from sayyanna:print_opcode

Conversation

@sayyanna

Collaborator

Problem solved by the commit

Print opcode information in context health report on ERT_CMD_STATE_TIMEOUT (for AIE4 and AIE2PS)

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

How problem was solved, alternative solutions (if any) and why they were rejected

A new API was added to AIEBU that decodes an opcode from ucindex, page_index, and offset.
This PR updates the aiebu submodule in XRT to pick up that API and links XRT against aiebu, so that XRT can call the API and print the decoded opcode in the context health report when a command ends in ERT_CMD_STATE_TIMEOUT.
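
For readers unfamiliar with the lookup, here is a minimal sketch of how a uC index and page index might select a control-code section. It is not the AIEBU API: `section_map`, `section_name`, and `find_section_size` are hypothetical names, and the `.ctrltext.<uc>.<page>.0` naming is inferred only from the "Opcode diag" lines in the test output in this description, with the trailing component assumed to be 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of an ELF's control-code sections: name -> raw bytes.
using section_map = std::map<std::string, std::vector<uint8_t>>;

// Build the section name for a uC and page.
// Assumption: the trailing ".0" is fixed; the report only ever shows 0 there.
inline std::string
section_name(uint32_t uc_idx, uint32_t page_idx)
{
  return ".ctrltext." + std::to_string(uc_idx) + "." + std::to_string(page_idx) + ".0";
}

// Return the section size if the section exists, else std::nullopt.
// The nullopt case corresponds to the "section not found" diagnostic.
inline std::optional<std::size_t>
find_section_size(const section_map& sections, uint32_t uc_idx, uint32_t page_idx)
{
  auto it = sections.find(section_name(uc_idx, page_idx));
  if (it == sections.end())
    return std::nullopt;
  return it->second.size();
}
```

With a map holding only `.ctrltext.0.2.0`, a lookup for uc_idx=0, page_idx=2 succeeds, while uc_idx=1, page_idx=0 yields no section, which matches the two outcomes shown in the report.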

Risks (if any) associated with the changes in the commit

None

What has been tested and how, request additional testing if necessary

runlist failed execution (ERT_CMD_STATE_TIMEOUT)
Kernel Instance: mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128
ELF File: C:\Users\sayyanna\Downloads\mcdm_stack_05_01\mcdm_stack/../data_bin/diff_pm/mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128.elf
ELF UUID: 784303af-f0c5-aa13-fb02-2bf0ef0ff038
ctx_state = 0x00000001 (CTX_STATUS_ERROR)
ctx_error_type = 0x00000005 (NPU_ASYNC_EVENT_CTX_ERR_UC_CRITICAL_ERROR)
number of uC reported = 6
uc_info[0]:
uc_idx=0x00000000
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000002 (CTRL_HANG)
fw_state=0x00000019 (FW_STATE_CTRL_CODE_HANG)
page_idx=0x00000002
offset=0x00000084
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000

Opcode: mask_poll_32 0x2119630, 1, 1
Opcode Size: 0x10
Opcode diag: section=.ctrltext.0.2.0 size=256 pos=132

uc_info[1]:
uc_idx=0x00000001
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000000 (NONE)
fw_state=0x00002001 (FW_STATE_WORKER_PRE_WORK_BARRIER)
page_idx=0x00000000
offset=0x00000000
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000

Opcode decode failed
Opcode diag: section not found: .ctrltext.1.0.0

From .dump section:
runlist failed execution (ERT_CMD_STATE_TIMEOUT)
Kernel Instance: mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128
ELF File: C:\Users\sayyanna\Downloads\mcdm_stack_05_01\mcdm_stack/../data_bin/diff_pm/mladfmatmulbias_mladf_aie4_v1_abfp16wbfp16acc16f_128_2560_2048_128.elf
ELF UUID: 784303af-f0c5-aa13-fb02-2bf0ef0ff038
ctx_state = 0x00000001 (CTX_STATUS_ERROR)
ctx_error_type = 0x00000005 (NPU_ASYNC_EVENT_CTX_ERR_UC_CRITICAL_ERROR)
number of uC reported = 6
uc_info[0]:
uc_idx=0x00000000
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000002 (CTRL_HANG)
fw_state=0x00000019 (FW_STATE_CTRL_CODE_HANG)
page_idx=0x00000002
offset=0x00000084
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000

Opcode: MASK_POLL_32 0x2119630, 0x1, 0x1
Opcode Size: 0x10
Line: 28
File: ../ml_asm/aie_runtime_control.asm

uc_info[1]:
uc_idx=0x00000001
uc_idle_status=0x00000000 (NONE)
misc_status=0x00000000 (NONE)
fw_state=0x00002001 (FW_STATE_WORKER_PRE_WORK_BARRIER)
page_idx=0x00000000
offset=0x00000000
restore_page=0x00000000
restore_offset=0x00000000
uc_ear=0x00000000
uc_esr=0x00000000

Opcode decode failed
Opcode diag: section not found: .ctrltext.1.0.0
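
In the first report, the diag line for uc_info[0] shows size=256 and pos=132, and 132 is the decimal value of offset=0x00000084. A decoder would presumably need to confirm that the opcode lies inside the section before reading it. The following is a minimal sketch of such a check; `opcode_in_bounds` is a hypothetical helper, not something taken from AIEBU or XRT.

```cpp
#include <cstddef>

// Hypothetical helper: true if an opcode of `opcode_size` bytes starting at
// byte `offset` lies entirely inside a section of `section_size` bytes.
// Written as a subtraction to avoid unsigned overflow in offset + opcode_size.
constexpr bool
opcode_in_bounds(std::size_t section_size, std::size_t offset, std::size_t opcode_size)
{
  return offset <= section_size && opcode_size <= section_size - offset;
}

// Values from the uc_info[0] report: size=256, offset=0x84, opcode size 0x10.
static_assert(0x84 == 132, "offset matches the reported pos");
static_assert(opcode_in_bounds(256, 0x84, 0x10), "opcode fits in the section");
```

The same check rejects an offset past the end of the section, or an opcode that would straddle the end.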

Documentation impact (if any)

Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@sayyanna sayyanna requested a review from stsoe June 18, 2026 16:31
Comment thread src/runtime_src/core/common/api/CMakeLists.txt Outdated
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
@sayyanna sayyanna requested a review from stsoe June 18, 2026 21:12
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
@sayyanna sayyanna requested a review from sonals June 19, 2026 00:14
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
Comment thread src/runtime_src/core/common/api/CMakeLists.txt Outdated
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp Outdated
#include "core/common/runner/capture.h"
#include "core/common/xdp/profile.h"

#include "core/common/aiebu/src/cpp/include/aiebu/aiebu_debug.h"

Member

Isn't "aiebu/aiebu_debug.h" enough?

@stsoe stsoe Jun 29, 2026

Collaborator

Isn't "aiebu/aiebu_debug.h" enough?

We could add core/common/aiebu/src/cpp/include to the include search path for specific targets. That would allow #include "aiebu/aiebu_debug.h". We should then make sure dtrace has its exported header files in the same folder.

@sonals sonals left a comment

Member

Please address comments from @stsoe

Signed-off-by: Sri Latha Ayyannagari <SriLatha.Ayyannagari@amd.com>
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@stsoe stsoe left a comment

Collaborator

Nit

Comment thread src/runtime_src/core/common/api/xrt_kernel.cpp
Co-authored-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

4 participants