DAOS-17321 ddb: Add checksum dump function to ddb vos API by knard38 · Pull Request #18293 · daos-stack/daos

knard38 · 2026-05-19T19:42:24Z

Description

This PR is the first in a series of four addressing DAOS-17321, which adds checksum management commands (csum_dump, csum_check, csum_resync) to the DAOS Debugger (ddb).

It adds a new VOS fetch flag VOS_OF_FETCH_CSUM to vos_fetch_begin(). When set, the flag allows a caller to retrieve per-extent checksum metadata stored in VOS without reading or transferring the actual data payload. This
is the prerequisite for the subsequent ddb patches that implement csum_dump.

Behavior of `VOS_OF_FETCH_CSUM`

Skips SGL initialization and all data I/O (analogous to VOS_OF_FETCH_SIZE_ONLY).
Single value (SV): saves the checksum info via save_csum() then returns early; iod_size is intentionally not updated on a csum-only fetch.
Array value: saves the full stored extent (en_ext) paired with its epoch via save_recx(), then saves the corresponding checksum via save_csum(). The full stored extent (not the selected sub-range) is used to preserve the checksum chunk-boundary alignment.
Mutually exclusive with VOS_OF_FETCH_RECX_LIST: both flags write ic_recx_lists with incompatible semantics; combining them returns -DER_INVAL.

Commits

Minor test and formatting fixes — clang-format alignment of the VOS_OF_* enum, missing \n in two D_PRINT() calls, d_sgl_fini() resource-leak fix after a failed update, and copyright bumps.
VOS_OF_FETCH_CSUM feature — new flag, VOS I/O skip paths, flag validation guard, and unit tests.

New tests

VOS400.1 io_csum_fetch_single: writes a single value with a CRC64 checksum, fetches it with VOS_OF_FETCH_CSUM, and verifies the retrieved dcs_csum_info matches the stored one.
VOS401.1 io_csum_fetch_recx: writes two overlapping array extents at different epochs with CRC16 checksums, then verifies that both extents and their checksums (with correct chunk counts) are returned.

Related PRs (in order)

This PR — VOS layer: VOS_OF_FETCH_CSUM flag
#17365 — ddb VOS API: ddb_vos_csum_dump()
#17423 — ddb C API: csum_dump command
#17474 — ddb Go layer: csum_dump CLI binding

Steps for the author:

Commit message follows the guidelines.
Appropriate Features or Test-tag pragmas were used.
Appropriate Functional Test Stages were run.
At least two positive code reviews including at least one code owner from each category referenced in the PR.
Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

Gatekeeper requested (daos-gatekeeper added as a reviewer).

Fix resource leak in io_sgl_fetch(): move d_sgl_fini() before the error check so the SGL is freed regardless of vos_obj_update() return code. Fix missing newline in two D_PRINT() calls in vos_tests.c. Apply clang-format alignment to VOS_OF_* enum in vos_types.h to fix pre-existing clang-format non-conformance. Reformat io_test_init_csummer() in vts_io.c to match clang-format style. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

Add VOS_OF_FETCH_CSUM (1 << 21) flag to vos_fetch_begin() allowing callers to retrieve per-extent checksum metadata without fetching the actual data. This is intended for the ddb (DAOS Debugger) tool which needs to inspect and manage checksums stored in VOS. When VOS_OF_FETCH_CSUM is set: - SGL/bio-buffer allocation is skipped (like VOS_OF_FETCH_SIZE_ONLY). - For single-value records: save_csum() is called, then the data fetch is skipped. iod_size is not updated. - For array records: save_csum() is called as normal, and the full stored extent bounds (en_ext) plus epoch are saved via save_recx() so that callers get the raw per-version extent list paired with their checksums via vos_ioh2recx_list() and vos_ioh2ci(). VOS_OF_FETCH_CSUM and VOS_OF_FETCH_RECX_LIST are mutually exclusive since both write ic_recx_lists with incompatible semantics. An explicit -DER_INVAL check is added in vos_ioc_create() (in addition to the existing D_ASSERT in the hot path). Add unit tests for the new flag: - VOS400.1: Fetch checksum of a single-value record (CRC64). - VOS401.1: Fetch checksums of overlapping array extents (CRC16). Features: recovery Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

github-actions · 2026-05-19T19:58:10Z

Ticket title is 'Checksum management with ddb'
Status is 'Blocked'
https://daosio.atlassian.net/browse/DAOS-17321

daosbuild3 · 2026-05-20T20:47:31Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/1/execution/node/1116/log

NiuYawei · 2026-05-21T01:32:51Z

 	ioc->ic_remove = ((vos_flags & VOS_OF_REMOVE) != 0);
 	ioc->ic_ec = ((vos_flags & VOS_OF_EC) != 0);
 	ioc->ic_rebuild    = ((vos_flags & VOS_OF_REBUILD) != 0);
+	ioc->ic_csum_fetch = ((vos_flags & VOS_OF_FETCH_CSUM) != 0);


Is there a specific use case for fetching only the csum without data? Since current fetch API returns both, can we just use the existing API and extract the csum from the response?

From my investigation on this ticket, the standard fetch API is not suitable for ddb's csum_dump use case for the following reasons:

A normal fetch fails on corrupted records — exactly when ddb needs to inspect checksums. When the VOS scrubber detects a checksum mismatch, it marks the record address with BIO_ADDR_IS_CORRUPTED. On subsequent normal fetches, VOS returns -DER_CSUM immediately (vos_io.c:972-974 for SV, :1099-1103 for array), and the client retries on an alternative shard. Since ddb operates on a single local VOS pool (no redundancy), a normal fetch would simply fail with -DER_CSUM, making it impossible to retrieve the stored checksum metadata. VOS_OF_FETCH_CSUM skips the corruption check and data I/O entirely, returning just the stored checksums.

For array values, a normal fetch trims checksums to the selected sub-range. In save_csum() (line 868), a normal fetch calls evt_entry_csum_update(&entry->en_ext, &entry->en_sel_ext, ...) which adjusts the checksum info to cover only en_sel_ext (the selected sub-range). But ddb needs to dump the full stored extent checksums as they exist on disk — using en_ext, not en_sel_ext — so that it can verify or display the raw stored metadata without interpretation. This was explicitly raised in the JIRA ticket discussion: when multiple raw EV ranges overlap (e.g. {0-31} and {8-1047}), we need each raw extent's checksum independently, not a trimmed version.

ddb needs the per-extent epoch paired with each checksum. VOS_OF_FETCH_CSUM uses save_recx() with ent->en_ext and ent->en_epoch (lines 1202-1204), giving ddb the exact stored extent boundaries + write epoch needed to correlate each checksum with its on-disk record. As noted in the JIRA ticket, it is not possible to fetch the checksum of a hidden EV (e.g. ext~16-1039~, ep~0~) with the standard API since it uses EPOCH_MAX — the new flag allows specifying any epoch for targeted checksum retrieval.

No data I/O or SGL allocation is needed. ddb only inspects metadata. Allocating SGLs and transferring data would be wasteful and, as noted in point 1, would fail on corrupted records.

About modifying the existing fetch API instead: the new flag approach was chosen because it doesn't change any existing fetch semantics or code paths, reuses the existing vos_ioh2ci() / vos_ioh2recx_list() interface for returning results, and follows the same pattern as VOS_OF_FETCH_SIZE_ONLY and VOS_OF_FETCH_RECX_LIST.

The follow-up PR (#17365) consumes both vos_ioh2ci() and vos_ioh2recx_list() from a VOS_OF_FETCH_CSUM fetch in dv_dump_csum() to provide the csum_dump functionality described in the ticket.

Does it makes sense ?

If only fetch checksum without original data, how can we verify whether the checksum can match related data or not?

From my investigation on this ticket, the standard fetch API is not suitable for ddb's csum_dump use case for the following reasons:

1. **A normal fetch fails on corrupted records — exactly when ddb needs to inspect checksums.** When the VOS scrubber detects a checksum mismatch, it marks the record address with `BIO_ADDR_IS_CORRUPTED`. On subsequent normal fetches, VOS returns `-DER_CSUM` immediately (`vos_io.c:972-974` for SV, `:1099-1103` for array), and the client retries on an alternative shard. Since ddb operates on a single local VOS pool (no redundancy), a normal fetch would simply fail with `-DER_CSUM`, making it impossible to retrieve the stored checksum metadata. `VOS_OF_FETCH_CSUM` skips the corruption check and data I/O entirely, returning just the stored checksums. 2. **For array values, a normal fetch trims checksums to the selected sub-range.** In `save_csum()` (line 868), a normal fetch calls `evt_entry_csum_update(&entry->en_ext, &entry->en_sel_ext, ...)` which adjusts the checksum info to cover only `en_sel_ext` (the selected sub-range). But ddb needs to dump the **full stored extent checksums** as they exist on disk — using `en_ext`, not `en_sel_ext` — so that it can verify or display the raw stored metadata without interpretation. This was explicitly raised in the [JIRA ticket discussion](https://daosio.atlassian.net/browse/DAOS-17321): when multiple raw EV ranges overlap (e.g. `{0-31}` and `{8-1047}`), we need each raw extent's checksum independently, not a trimmed version. 3. **ddb needs the per-extent epoch paired with each checksum.** `VOS_OF_FETCH_CSUM` uses `save_recx()` with `ent->en_ext` and `ent->en_epoch` (lines 1202-1204), giving ddb the exact stored extent boundaries + write epoch needed to correlate each checksum with its on-disk record. As noted in the JIRA ticket, it is not possible to fetch the checksum of a hidden EV (e.g. `ext~16-1039~, ep~0~`) with the standard API since it uses `EPOCH_MAX` — the new flag allows specifying any epoch for targeted checksum retrieval. 4. **No data I/O or SGL allocation is needed.** ddb only inspects metadata. Allocating SGLs and transferring data would be wasteful and, as noted in point 1, would fail on corrupted records.

About modifying the existing fetch API instead: the new flag approach was chosen because it doesn't change any existing fetch semantics or code paths, reuses the existing vos_ioh2ci() / vos_ioh2recx_list() interface for returning results, and follows the same pattern as VOS_OF_FETCH_SIZE_ONLY and VOS_OF_FETCH_RECX_LIST.

The follow-up PR (#17365) consumes both vos_ioh2ci() and vos_ioh2recx_list() from a VOS_OF_FETCH_CSUM fetch in dv_dump_csum() to provide the csum_dump functionality described in the ticket.

Does it makes sense ?

Yes. Thanks a lot for the explanation.

If only fetch checksum without original data, how can we verify whether the checksum can match related data or not?

This PR is a preliminary step to support the ddb csum_dump command. For that command, we only need to display the checksums — we don't need to verify them against the actual data, so fetching the data is not required.

For the follow-up commands csum_check and csum_resync, the data will definitely be needed to perform the verification, and I will handle that in the follow-up PRs.

Does that make sense?

It is clear now, thanks for the explanation.

…/daos-17321/patch-001

daosbuild3 · 2026-05-21T20:53:46Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/2/execution/node/1116/log

Nasf-Fan · 2026-05-22T09:57:55Z

+	d_sgl_fini(&sgl, false);
+out:
+	D_FREE(update_buf);
+	assert_rc_equal(rc, 0);


How to guarantee this? for example from L#3093, L#3099, and so on.

You're right — if one of the intermediate steps fails (e.g. d_sgl_init), goto out leaves rc != 0 and the assert_rc_equal(rc, 0) at out: will fire, failing the test. The intent was to have the test fail if any step doesn't succeed. However, I can see how mixing error-cleanup and success-assertion in the same label is confusing and could mask the actual failure point.

I am going to restructure the labels to separate the error paths from the final success assertion.

Improve unexpected error failure management of io_csum_fetch_single()

After more investigation, the goto solution turns out to be the best option, as it allows proper cleanup of memory allocations before the test exits.

Using assert_rc_equal directly at intermediate steps (e.g. after d_sgl_init) would trigger longjmp on failure, skipping any subsequent cleanup. Memory leak detectors such as Valgrind would then report spurious leaks, which could be confusing.

To make unexpected failures easier to diagnose, I added a print_message() call before each goto so the failing step is clearly identified in the test output. More details in commit 940b859.

After further analysis, the goto+print_message approach turns out to be inconsistent: the assertions in the check phase (after the setup steps succeed) — assert_null(rel), assert_non_null(cil), assert_int_equal(cil->dcl_csum_infos_nr, ...), etc. — can also trigger longjmp via cmocka and bypass the cleanup labels. So the goto structure only protects cleanup for setup-phase failures, not for check-phase failures. This creates a false sense of security while adding complexity.

Since cleanup is never fully guaranteed in cmocka without a dedicated teardown function, using assert_rc_equal() directly throughout is simpler and equally correct: on failure, cmocka already reports the file and line of the failing assertion, which is as informative as a preceding print_message().

Commit b52bafc simplifies the four affected functions (io_csum_fetch_single, io_csum_fetch_recx, io_csum_write_no_csum, io_csum_fetch_recx_missing_csum) to use assert_rc_equal() throughout with a flat cleanup at the end.

The same simplification was also applied to io_csum_update_recx() in commit 75d6570: the function is converted from int to void, the akey parameter gains a const qualifier, and all goto-based cleanup is replaced with assert_rc_equal() / assert_non_null().

Nasf-Fan · 2026-05-22T09:58:10Z

+	assert_true(daos_csummer_compare_csum_info(csummer, ic[0]->ic_data, ci));
+	ci = dcs_csum_info_get(cil, 1);
+	assert_true(ci_is_valid(ci));
+	assert_int_equal(ci->cs_nr, 3);


How can cs_nr be 3?

The second write uses recx_idx = csum_chunk_size / 2 = 32 and recx_nr = 2 * csum_chunk_size = 128. With csum_chunk_size = 64, the chunk boundaries fall at 0, 64, 128, 192…. Because the write starts at offset 32 (not aligned to a chunk boundary), it touches three chunks and therefore requires three checksums (cs_nr = 3):

byte offset: 0 32 64 128 160 192 │ | | | | | | chunk 0: [-----------------] chunk 1: [-----------------] chunk 2: [-----------------] │ | write 2: [=========================>] (ep=2) recx_idx=32 recx_idx+recx_nr=160

The misalignment of the start index is deliberate to exercise the non-aligned case.

That said, I agree the test case is not immediately obvious. I'll add inline comments and named constants to make the intent clearer.

Use named constant and add comments

Fixed with commit 7cd5cd2

Nasf-Fan · 2026-05-22T09:58:34Z

+	}
+	daos_csummer_destroy(&csummer);
+out:
+	assert_rc_equal(rc, 0);


Can not be guaranteed if out from L#3230 and L#3236.

Same issue as the io_csum_fetch_single case. I'll fix both test functions together by restructuring the cleanup labels.

Improve unexpected error failure management of io_csum_fetch_recx()

See comment #18293 (comment)

Nasf-Fan · 2026-05-22T09:59:08Z

 	ioc->ic_remove = ((vos_flags & VOS_OF_REMOVE) != 0);
 	ioc->ic_ec = ((vos_flags & VOS_OF_EC) != 0);
 	ioc->ic_rebuild    = ((vos_flags & VOS_OF_REBUILD) != 0);
+	ioc->ic_csum_fetch = ((vos_flags & VOS_OF_FETCH_CSUM) != 0);


If only fetch checksum without original data, how can we verify whether the checksum can match related data or not?

Nasf-Fan · 2026-05-22T10:00:51Z

If firstly N entries have no checksum, but subsequent entries have, then current logic cannot detect. That is some kind of short-comings, needs to be improved.

On the other hand, for ddb, if we lost checksum for some entries, we need to make the caller to know that via the returned <csum, epoch> lists. Otherwise, the caller maybe not aware of related corruption.

You're right on both points.

For the csum_enabled blind spot: I'll add a csum_missing flag alongside csum_enabled so that both transition directions ([no-csum → has-csum] and [has-csum → no-csum]) are detected.

For the ddb case: I'll remove the csum_enabled gate on save_recx so it is always called when ic_csum_fetch is set, and save a null/invalid placeholder in the csum list for entries with no checksum. This way the caller can see all entries and identify which ones are missing a checksum.

I'll also add a unit test that writes one extent without a checksum (NULL iod_csums) and one with, then verifies both appear in the returned <csum, epoch> list with the no-csum entry properly flagged.

~~- [ ] Add csum_missing flag in akey_fetch_recx() to detect both inconsistency directions~~
~~- [ ] Save a null placeholder in the csum list for entries with no checksum~~

Add io_csum_fetch_recx_missing_csum unit test

After further investigation, the csum_missing flag approach turns out to be unworkable due to an EVT limitation: evt_entry_csum_fill() propagates tr_csum_len — a root-level field — into the csum metadata of every non-hole entry at fetch time. As a result, ci_is_valid() always returns true for non-hole entries whenever tr_csum_len > 0, even for extents that were written without a checksum. Detecting the mixed case (some extents with, some without a checksum) at fetch time would require per-evt_desc metadata, which would change the VOS persistent format — clearly out of scope for this PR.

Since the mixed case is indistinguishable, null placeholders are also unnecessary: the csum list will either be fully populated (all entries have a checksum) or empty (none do). Commit 424d306 removes the csum_enabled gate so that save_recx() is always called when ic_csum_fetch is set, and leaves the csum list empty when no checksums are present rather than filling it with null placeholders. The io_csum_fetch_recx_missing_csum unit test validates this behaviour.

…/daos-17321/patch-001

Add print_message() before each goto so the failing operation is named in the test output. Keep goto-based cleanup labels to avoid memory leaks on longjmp. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

Unroll the two-iteration loop into explicit writes with named constants (recx_size, recx2_idx) to eliminate magic numbers. Add a write-layout comment before the function declaration describing the chunk-boundary arithmetic and the expected cs_nr values for each write. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

save_recx() was gated on a local csum_enabled flag, so extents without a stored checksum were absent from the VOS_OF_FETCH_CSUM output used by ddb. Remove the gate so save_recx() is always called when ic_csum_fetch is set. When no entry in the akey stored a checksum (EVT root tr_csum_len==0), ci_is_valid() reliably returns false for all entries. In that case the csum list is left empty (dcl_csum_infos_nr==0), which is a cleaner signal than filling it with null placeholders that the caller would have to iterate through to reach the same conclusion. Mixed-checksum detection (some entries with, some without) is not added: ci_is_valid() is unreliable for that purpose because evt_entry_csum_fill() propagates the root-level tr_csum_len to all non-hole entries, making ci_is_valid() return true even for entries stored without a checksum. Add VOS401.2 (io_csum_fetch_recx_missing_csum) verifying that no-csum extents appear in the recx list and the csum list is empty. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

…/daos-17321/patch-001

Replace the goto-based error handling in the csum fetch test functions with direct assert_rc_equal() calls throughout. The goto+cleanup-label structure gives only a partial guarantee: the assertions in the check phase (after setup) already bypass the cleanup labels via longjmp on failure. Since cleanup is never fully guaranteed in cmocka without a teardown function, the goto structure in the setup phase provides a false sense of security while adding complexity. Using assert_rc_equal() directly is simpler, consistent throughout each function, and equally correct: cmocka reports the file and line of the failing assertion, which is as informative as a preceding print_message(). Functions simplified: - io_csum_fetch_single - io_csum_fetch_recx - io_csum_write_no_csum - io_csum_fetch_recx_missing_csum Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

…e_recx Apply the same assert_rc_equal() simplification to io_csum_update_recx() that was applied to the other csum test helpers in commit b52bafc. The function is converted from returning int to void, the akey parameter gains a const qualifier, and the goto-based cleanup is replaced with assert_rc_equal() and assert_non_null() throughout. Callers in io_csum_fetch_recx no longer capture the return value. Note: unlike the test functions with a check phase, io_csum_update_recx is a pure write helper where the original goto approach did protect cleanup. The simplification is still appropriate for consistency and the practical risk is negligible (small test allocations, operations that virtually never fail in a test environment). Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

daosbuild3 · 2026-05-27T19:47:05Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/5/execution/node/1048/log

…/daos-17321/patch-001

daosbuild3 · 2026-05-28T15:18:27Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/6/execution/node/1038/log

…/daos-17321/patch-001

daosbuild3 · 2026-05-29T10:16:39Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18293/7/display/redirect

daosbuild3 · 2026-05-29T18:35:37Z

Test stage Test RPMs on EL 9.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/8/execution/node/431/log

daosbuild3 · 2026-05-29T20:23:14Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/8/execution/node/491/log

…/daos-17321/patch-001

daosbuild3 · 2026-06-01T07:54:05Z

Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/9/testReport/

daosbuild3 · 2026-06-01T12:45:25Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/9/execution/node/1101/log

daosbuild3 · 2026-06-01T14:13:00Z

Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/10/testReport/

daosbuild3 · 2026-06-01T19:13:55Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/10/execution/node/1101/log

…/daos-17321/patch-001

daosbuild3 · 2026-06-02T08:16:37Z

Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/11/testReport/

daosbuild3 · 2026-06-02T12:42:08Z

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18293/11/execution/node/1086/log

…/daos-17321/patch-001

daosbuild3 · 2026-06-03T19:06:41Z

Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/14/testReport/

daosbuild3 · 2026-06-04T04:13:12Z

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/14/testReport/

knard38 · 2026-06-04T11:32:27Z

All CI failures in build #14 are pre-existing infrastructure or unrelated issues — none are introduced by this PR (which only touches vos_io.c and vts_io.c):

Functional test_dfuse_daos_build_wb (FAIL): dfuse rejects libtool hard links with EOPNOTSUPP when building inside a writeback-mode dfuse mount. Tracked in DAOS-19076. Unrelated to VOS checksum changes.
NLT (UNSTABLE): ~110 Valgrind errors all come from dfuse binary runs — no VOS tests are run under Valgrind in the NLT stage. Same pre-existing suppression gaps observed in earlier CI builds. The 8 entries in nlt-errors.json are expected artifacts from the test_evict scenario (DER_NO_HDL/DER_NO_PERM after handle eviction) and normal server shutdown (in-flight RPCs at dmg system stop).

knard38 · 2026-06-04T11:42:45Z

@daos-stack/daos-gatekeeper according to my previous comment, could you lend this PR with the following commit:

title: DAOS-17321 ddb: Add checksum dump function to ddb vos API
body:

Add the VOS_OF_FETCH_CSUM flag to vos_fetch_begin() so callers can
retrieve per-extent checksum metadata without fetching data. This is
the VOS-level building block for the ddb csum_dump command.

When the flag is set, all extents are reported regardless of whether
they carry a checksum. When none do, the csum list is left empty as a
signal to the caller. Mixed-checksum detection is deferred to a
follow-up.

New unit tests cover single-value and array checksum fetch, including
the no-checksum case. Also includes minor fixes (resource leak, missing
newlines, clang-format alignment) and test readability improvements.

kanard38 added 2 commits May 19, 2026 17:43

knard38 closed this May 19, 2026

knard38 reopened this May 19, 2026

knard38 changed the title ~~Ckochhof/dev/master/daos 17321/patch 001~~ DAOS-17321 ddb: Add checksum dump function to ddb vos API May 19, 2026

knard38 self-assigned this May 19, 2026

knard38 marked this pull request as ready for review May 19, 2026 19:57

knard38 requested review from a team as code owners May 19, 2026 19:57

knard38 requested review from Nasf-Fan and janekmi May 19, 2026 19:58

knard38 requested a review from NiuYawei May 19, 2026 20:00

NiuYawei reviewed May 21, 2026

View reviewed changes

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

b59c2a8

…/daos-17321/patch-001

knard38 requested a review from NiuYawei May 21, 2026 07:42

Nasf-Fan requested changes May 22, 2026

View reviewed changes

kanard38 added 5 commits May 26, 2026 08:45

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

5624f0e

…/daos-17321/patch-001

DAOS-17321 vos: Add diagnostic messages to csum fetch tests

940b859

Add print_message() before each goto so the failing operation is named in the test output. Keep goto-based cleanup labels to avoid memory leaks on longjmp. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

296a3ef

…/daos-17321/patch-001

knard38 requested a review from Nasf-Fan May 27, 2026 07:08

kanard38 added 2 commits May 27, 2026 08:12

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

c7323bb

…/daos-17321/patch-001

Nasf-Fan approved these changes May 28, 2026

View reviewed changes

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

ae7e315

…/daos-17321/patch-001

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

455e701

…/daos-17321/patch-001

NiuYawei approved these changes Jun 1, 2026

View reviewed changes

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

29c876c

…/daos-17321/patch-001

Merge remote-tracking branch 'origin/master' into ckochhof/dev/master…

4940a88

…/daos-17321/patch-001

knard38 requested a review from a team June 4, 2026 11:32

daltonbohning merged commit 2b9ff82 into master Jun 4, 2026
43 of 46 checks passed

daltonbohning deleted the ckochhof/dev/master/daos-17321/patch-001 branch June 4, 2026 14:48

Conversation

knard38 commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Behavior of VOS_OF_FETCH_CSUM

Commits

New tests

Related PRs (in order)

Steps for the author:

After all prior steps are complete:

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

daosbuild3 commented May 20, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

daosbuild3 commented May 21, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

knard38 May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

knard38 May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

knard38 May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

knard38 May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

daosbuild3 commented May 27, 2026

Uh oh!

daosbuild3 commented May 28, 2026

Uh oh!

daosbuild3 commented May 29, 2026

Uh oh!

daosbuild3 commented May 29, 2026

Uh oh!

knard38 commented May 19, 2026 •

edited

Loading

Behavior of `VOS_OF_FETCH_CSUM`

github-actions Bot commented May 19, 2026 •

edited

Loading

knard38 May 26, 2026 •

edited

Loading

knard38 May 26, 2026 •

edited

Loading

knard38 May 26, 2026 •

edited

Loading

knard38 May 26, 2026 •

edited

Loading