DAOS-17321 ddb: Add checksum dump function to ddb vos API #18293

Merged
daltonbohning merged 15 commits into master from ckochhof/dev/master/daos-17321/patch-001
Jun 4, 2026

@knard38 knard38 commented May 19, 2026

Description

This PR is the first in a series of four addressing DAOS-17321, which adds checksum management commands (csum_dump, csum_check, csum_resync) to the DAOS Debugger (ddb).

It adds a new VOS fetch flag VOS_OF_FETCH_CSUM to vos_fetch_begin(). When set, the flag allows a caller to retrieve per-extent checksum metadata stored in VOS without reading or transferring the actual data payload. This is the prerequisite for the subsequent ddb patches that implement csum_dump.

Behavior of VOS_OF_FETCH_CSUM

  • Skips SGL initialization and all data I/O (analogous to VOS_OF_FETCH_SIZE_ONLY).
  • Single value (SV): saves the checksum info via save_csum() then returns early; iod_size is intentionally not updated on a csum-only fetch.
  • Array value: saves the full stored extent (en_ext) paired with its epoch via save_recx(), then saves the corresponding checksum via save_csum(). The full stored extent (not the selected sub-range) is used to preserve the checksum chunk-boundary alignment.
  • Mutually exclusive with VOS_OF_FETCH_RECX_LIST: both flags write ic_recx_lists with incompatible semantics; combining them returns -DER_INVAL.
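
The mutual-exclusion rule in the last bullet can be pictured as a small guard. This is only a sketch under stated assumptions: the bit position of VOS_OF_FETCH_CSUM (1 << 21) comes from the commit message, while the bit used for VOS_OF_FETCH_RECX_LIST, the numeric error value, and every `sketch_`/`SKETCH_` name are hypothetical stand-ins, not the code added to vos_ioc_create().

```c
#include <stdint.h>

/* Bit position taken from the commit message of this PR. */
#define SKETCH_OF_FETCH_CSUM      (1ULL << 21)
/* Hypothetical bit position, chosen only for illustration. */
#define SKETCH_OF_FETCH_RECX_LIST (1ULL << 20)
/* Hypothetical stand-in for DER_INVAL; the real value lives in the DAOS headers. */
#define SKETCH_DER_INVAL          1003

/*
 * Return 0 when the flag combination is acceptable, or a negative
 * error when both list-producing flags are requested together.
 */
static int
sketch_check_fetch_flags(uint64_t flags)
{
	uint64_t both = SKETCH_OF_FETCH_CSUM | SKETCH_OF_FETCH_RECX_LIST;

	if ((flags & both) == both)
		return -SKETCH_DER_INVAL;
	return 0;
}
```

Each flag on its own passes the guard; only the combination is rejected, which matches the behaviour described above.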

Commits

  1. Minor test and formatting fixes — clang-format alignment of the VOS_OF_* enum, missing \n in two D_PRINT() calls, d_sgl_fini() resource-leak fix after a failed update, and copyright bumps.
  2. VOS_OF_FETCH_CSUM feature — new flag, VOS I/O skip paths, flag validation guard, and unit tests.

New tests

  • VOS400.1 io_csum_fetch_single: writes a single value with a CRC64 checksum, fetches it with VOS_OF_FETCH_CSUM, and verifies the retrieved dcs_csum_info matches the stored one.
  • VOS401.1 io_csum_fetch_recx: writes two overlapping array extents at different epochs with CRC16 checksums, then verifies that both extents and their checksums (with correct chunk counts) are returned.

Related PRs (in order)

  1. This PR — VOS layer: VOS_OF_FETCH_CSUM flag
  2. #17365 — ddb VOS API: ddb_vos_csum_dump()
  3. #17423 — ddb C API: csum_dump command
  4. #17474 — ddb Go layer: csum_dump CLI binding

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

kanard38 added 2 commits May 19, 2026 17:43
Fix resource leak in io_sgl_fetch(): move d_sgl_fini() before the error
check so the SGL is freed regardless of vos_obj_update() return code.

Fix missing newline in two D_PRINT() calls in vos_tests.c.

Apply clang-format alignment to VOS_OF_* enum in vos_types.h to fix
pre-existing clang-format non-conformance.

Reformat io_test_init_csummer() in vts_io.c to match clang-format style.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
Add VOS_OF_FETCH_CSUM (1 << 21) flag to vos_fetch_begin() allowing
callers to retrieve per-extent checksum metadata without fetching the
actual data. This is intended for the ddb (DAOS Debugger) tool which
needs to inspect and manage checksums stored in VOS.

When VOS_OF_FETCH_CSUM is set:
- SGL/bio-buffer allocation is skipped (like VOS_OF_FETCH_SIZE_ONLY).
- For single-value records: save_csum() is called, then the data fetch
  is skipped. iod_size is not updated.
- For array records: save_csum() is called as normal, and the full
  stored extent bounds (en_ext) plus epoch are saved via save_recx()
  so that callers get the raw per-version extent list paired with
  their checksums via vos_ioh2recx_list() and vos_ioh2ci().

VOS_OF_FETCH_CSUM and VOS_OF_FETCH_RECX_LIST are mutually exclusive
since both write ic_recx_lists with incompatible semantics. An explicit
-DER_INVAL check is added in vos_ioc_create() (in addition to the
existing D_ASSERT in the hot path).

Add unit tests for the new flag:
- VOS400.1: Fetch checksum of a single-value record (CRC64).
- VOS401.1: Fetch checksums of overlapping array extents (CRC16).

Features: recovery
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@knard38 knard38 closed this May 19, 2026
@knard38 knard38 reopened this May 19, 2026
@knard38 knard38 changed the title from "Ckochhof/dev/master/daos 17321/patch 001" to "DAOS-17321 ddb: Add checksum dump function to ddb vos API" May 19, 2026
@knard38 knard38 self-assigned this May 19, 2026
@knard38 knard38 marked this pull request as ready for review May 19, 2026 19:57
@knard38 knard38 requested review from a team as code owners May 19, 2026 19:57
@knard38 knard38 requested review from Nasf-Fan and janekmi May 19, 2026 19:58

github-actions Bot commented May 19, 2026

Ticket title is 'Checksum management with ddb'
Status is 'Blocked'
https://daosio.atlassian.net/browse/DAOS-17321

@knard38 knard38 requested a review from NiuYawei May 19, 2026 20:00

Comment thread src/vos/vos_io.c
ioc->ic_remove = ((vos_flags & VOS_OF_REMOVE) != 0);
ioc->ic_ec = ((vos_flags & VOS_OF_EC) != 0);
ioc->ic_rebuild = ((vos_flags & VOS_OF_REBUILD) != 0);
ioc->ic_csum_fetch = ((vos_flags & VOS_OF_FETCH_CSUM) != 0);
Contributor

Is there a specific use case for fetching only the csum without data? Since current fetch API returns both, can we just use the existing API and extract the csum from the response?

Contributor Author

From my investigation on this ticket, the standard fetch API is not suitable for ddb's csum_dump use case for the following reasons:

  1. A normal fetch fails on corrupted records — exactly when ddb needs to inspect checksums. When the VOS scrubber detects a checksum mismatch, it marks the record address with BIO_ADDR_IS_CORRUPTED. On subsequent normal fetches, VOS returns -DER_CSUM immediately (vos_io.c:972-974 for SV, :1099-1103 for array), and the client retries on an alternative shard. Since ddb operates on a single local VOS pool (no redundancy), a normal fetch would simply fail with -DER_CSUM, making it impossible to retrieve the stored checksum metadata. VOS_OF_FETCH_CSUM skips the corruption check and data I/O entirely, returning just the stored checksums.

  2. For array values, a normal fetch trims checksums to the selected sub-range. In save_csum() (line 868), a normal fetch calls evt_entry_csum_update(&entry->en_ext, &entry->en_sel_ext, ...) which adjusts the checksum info to cover only en_sel_ext (the selected sub-range). But ddb needs to dump the full stored extent checksums as they exist on disk — using en_ext, not en_sel_ext — so that it can verify or display the raw stored metadata without interpretation. This was explicitly raised in the JIRA ticket discussion: when multiple raw EV ranges overlap (e.g. {0-31} and {8-1047}), we need each raw extent's checksum independently, not a trimmed version.

  3. ddb needs the per-extent epoch paired with each checksum. VOS_OF_FETCH_CSUM uses save_recx() with ent->en_ext and ent->en_epoch (lines 1202-1204), giving ddb the exact stored extent boundaries + write epoch needed to correlate each checksum with its on-disk record. As noted in the JIRA ticket, it is not possible to fetch the checksum of a hidden EV (e.g. ext~16-1039~, ep~0~) with the standard API since it uses EPOCH_MAX — the new flag allows specifying any epoch for targeted checksum retrieval.

  4. No data I/O or SGL allocation is needed. ddb only inspects metadata. Allocating SGLs and transferring data would be wasteful and, as noted in point 1, would fail on corrupted records.
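
Points 1 and 4 can be illustrated with a toy model of the two fetch modes. This is a hedged sketch, not the VOS code path: `sketch_record`, `sketch_fetch()` and `SKETCH_DER_CSUM` are hypothetical names, the error value is a placeholder, and the `corrupted` field merely stands in for a record address marked with BIO_ADDR_IS_CORRUPTED.

```c
#include <stdbool.h>

/* Hypothetical stand-in for DER_CSUM; the real value lives in the DAOS headers. */
#define SKETCH_DER_CSUM 2021

/* Toy record: a corruption mark plus the checksum metadata stored with it. */
struct sketch_record {
	bool         corrupted;   /* stands in for BIO_ADDR_IS_CORRUPTED */
	unsigned int stored_csum; /* stands in for the stored checksum info */
};

/*
 * csum_only == true models a checksum-only fetch: the corruption check and
 * the data read are both skipped and only the stored checksum is returned.
 * csum_only == false models a normal fetch, which fails on a corrupted record.
 */
static int
sketch_fetch(const struct sketch_record *rec, bool csum_only,
	     unsigned int *csum_out, bool *data_read)
{
	*data_read = false;

	if (csum_only) {
		*csum_out = rec->stored_csum;
		return 0;
	}

	if (rec->corrupted)
		return -SKETCH_DER_CSUM;

	*csum_out  = rec->stored_csum;
	*data_read = true;
	return 0;
}
```

With a corrupted record, the normal mode returns the error before any checksum is handed back, while the checksum-only mode still exposes the stored metadata, which is the situation ddb has to handle on a single local pool.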

About modifying the existing fetch API instead: the new flag approach was chosen because it doesn't change any existing fetch semantics or code paths, reuses the existing vos_ioh2ci() / vos_ioh2recx_list() interface for returning results, and follows the same pattern as VOS_OF_FETCH_SIZE_ONLY and VOS_OF_FETCH_RECX_LIST.

The follow-up PR (#17365) consumes both vos_ioh2ci() and vos_ioh2recx_list() from a VOS_OF_FETCH_CSUM fetch in dv_dump_csum() to provide the csum_dump functionality described in the ticket.

Does it make sense?

Contributor

If we fetch only the checksum without the original data, how can we verify whether the checksum matches the related data?

Contributor

> Does it make sense?

Yes. Thanks a lot for the explanation.

Contributor Author

> If we fetch only the checksum without the original data, how can we verify whether the checksum matches the related data?

This PR is a preliminary step to support the ddb csum_dump command. For that command, we only need to display the checksums — we don't need to verify them against the actual data, so fetching the data is not required.

For the follow-up commands csum_check and csum_resync, the data will definitely be needed to perform the verification, and I will handle that in the follow-up PRs.

Does that make sense?

Contributor

It is clear now, thanks for the explanation.

@knard38 knard38 requested a review from NiuYawei May 21, 2026 07:42

Comment thread src/vos/tests/vts_io.c Outdated
d_sgl_fini(&sgl, false);
out:
D_FREE(update_buf);
assert_rc_equal(rc, 0);
Contributor

How is this guaranteed? For example, when jumping to out from L#3093, L#3099, and so on.

Contributor Author

@knard38 knard38 May 26, 2026

You're right — if one of the intermediate steps fails (e.g. d_sgl_init), goto out leaves rc != 0 and the assert_rc_equal(rc, 0) at out: will fire, failing the test. The intent was to have the test fail if any step doesn't succeed. However, I can see how mixing error-cleanup and success-assertion in the same label is confusing and could mask the actual failure point.

I am going to restructure the labels to separate the error paths from the final success assertion.

  • Improve unexpected error failure management of io_csum_fetch_single()

Contributor Author

After more investigation, the goto solution turns out to be the best option, as it allows proper cleanup of memory allocations before the test exits.

Using assert_rc_equal directly at intermediate steps (e.g. after d_sgl_init) would trigger longjmp on failure, skipping any subsequent cleanup. Memory leak detectors such as Valgrind would then report spurious leaks, which could be confusing.

To make unexpected failures easier to diagnose, I added a print_message() call before each goto so the failing step is clearly identified in the test output. More details in commit 940b859.

Contributor Author

After further analysis, the goto+print_message approach turns out to be inconsistent: the assertions in the check phase (after the setup steps succeed) — assert_null(rel), assert_non_null(cil), assert_int_equal(cil->dcl_csum_infos_nr, ...), etc. — can also trigger longjmp via cmocka and bypass the cleanup labels. So the goto structure only protects cleanup for setup-phase failures, not for check-phase failures. This creates a false sense of security while adding complexity.

Since cleanup is never fully guaranteed in cmocka without a dedicated teardown function, using assert_rc_equal() directly throughout is simpler and equally correct: on failure, cmocka already reports the file and line of the failing assertion, which is as informative as a preceding print_message().

Commit b52bafc simplifies the four affected functions (io_csum_fetch_single, io_csum_fetch_recx, io_csum_write_no_csum, io_csum_fetch_recx_missing_csum) to use assert_rc_equal() throughout with a flat cleanup at the end.
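
The longjmp behaviour discussed in this thread can be reproduced without cmocka. The following is a minimal sketch, assuming a failing assertion leaves the test body via longjmp as described above: all `sketch_` names are hypothetical, the assertion helper is not cmocka's macro, and a counter stands in for real allocations so the example itself does not leak.

```c
#include <setjmp.h>

static jmp_buf sketch_fail_jmp;       /* where a failed assertion jumps to */
static int     sketch_open_resources; /* acquired-but-not-released resources */

/* Stand-in for an assertion that aborts the test body on mismatch. */
static void
sketch_assert_rc_equal(int rc, int expected)
{
	if (rc != expected)
		longjmp(sketch_fail_jmp, 1);
}

/* Test body with a flat cleanup at the end, as in the simplified tests. */
static void
sketch_test_body(int rc_of_step)
{
	sketch_open_resources++;               /* setup acquires a resource */
	sketch_assert_rc_equal(rc_of_step, 0); /* may jump out of the body */
	sketch_open_resources--;               /* cleanup, skipped on failure */
}

/* Run the body like a test runner would; return how many resources stay open. */
static int
sketch_run(int rc_of_step)
{
	sketch_open_resources = 0;
	if (setjmp(sketch_fail_jmp) == 0)
		sketch_test_body(rc_of_step);
	return sketch_open_resources;
}
```

In this model `sketch_run(0)` leaves nothing open, while `sketch_run(-1)` leaves one resource open because the cleanup line is never reached. A goto-based label would not change that for assertions placed after the setup phase, which is the inconsistency the comment describes.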

Contributor Author

The same simplification was also applied to io_csum_update_recx() in commit 75d6570: the function is converted from int to void, the akey parameter gains a const qualifier, and all goto-based cleanup is replaced with assert_rc_equal() / assert_non_null().

Comment thread src/vos/tests/vts_io.c
assert_true(daos_csummer_compare_csum_info(csummer, ic[0]->ic_data, ci));
ci = dcs_csum_info_get(cil, 1);
assert_true(ci_is_valid(ci));
assert_int_equal(ci->cs_nr, 3);
Contributor

How can cs_nr be 3?

Contributor Author

@knard38 knard38 May 26, 2026

The second write uses recx_idx = csum_chunk_size / 2 = 32 and recx_nr = 2 * csum_chunk_size = 128. With csum_chunk_size = 64, the chunk boundaries fall at 0, 64, 128, 192…. Because the write starts at offset 32 (not aligned to a chunk boundary), it touches three chunks and therefore requires three checksums (cs_nr = 3):

byte offset:  0        32       64       128      160      192
              |        |        |        |        |        |
chunk 0:      [-----------------]
chunk 1:                        [--------]
chunk 2:                                 [-----------------]
write 2:               [=========================>]
(ep=2)                 recx_idx=32  recx_idx+recx_nr=160

The misalignment of the start index is deliberate to exercise the non-aligned case.

That said, I agree the test case is not immediately obvious. I'll add inline comments and named constants to make the intent clearer.
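
The chunk arithmetic above can be checked with a few lines of C. This is a sketch rather than the DAOS implementation: `sketch_chunks_touched()` is a hypothetical helper, and it assumes a record size of 1 so that record indexes and byte offsets coincide, with `nr` and `chunk_size` both greater than zero.

```c
#include <stdint.h>

/*
 * Number of checksum chunks touched by the extent [idx, idx + nr),
 * for chunks aligned on multiples of chunk_size.
 * Assumes nr > 0 and chunk_size > 0.
 */
static uint64_t
sketch_chunks_touched(uint64_t idx, uint64_t nr, uint64_t chunk_size)
{
	uint64_t first = idx / chunk_size;            /* chunk holding the first record */
	uint64_t last  = (idx + nr - 1) / chunk_size; /* chunk holding the last record */

	return last - first + 1;
}
```

For the second write, `sketch_chunks_touched(32, 128, 64)` evaluates to 3: the first record falls in chunk 0 and the last record (offset 159) falls in chunk 2. An aligned write of the same length, `sketch_chunks_touched(0, 128, 64)`, touches only 2 chunks.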

  • Use named constant and add comments

Contributor Author

Fixed with commit 7cd5cd2

Comment thread src/vos/tests/vts_io.c
}
daos_csummer_destroy(&csummer);
out:
assert_rc_equal(rc, 0);
Contributor

This cannot be guaranteed if we jump to out from L#3230 or L#3236.

Contributor Author

@knard38 knard38 May 26, 2026

Same issue as the io_csum_fetch_single case. I'll fix both test functions together by restructuring the cleanup labels.

  • Improve unexpected error failure management of io_csum_fetch_recx()

Contributor Author

See comment #18293 (comment)

Contributor Author

See comment #18293 (comment)

Contributor Author

See comment #18293 (comment)

Comment thread src/vos/vos_io.c Outdated
@@ -1179,9 +1191,24 @@ akey_fetch_recx(daos_handle_t toh, const daos_epoch_range_t *epr,
Contributor

If the first N entries have no checksum but subsequent entries do, the current logic cannot detect it. That is a shortcoming that needs to be addressed.

On the other hand, for ddb, if the checksum is missing for some entries, the caller needs to learn that from the returned <csum, epoch> lists. Otherwise, the caller may not be aware of the related corruption.

Contributor Author

@knard38 knard38 May 26, 2026

You're right on both points.

For the csum_enabled blind spot: I'll add a csum_missing flag alongside csum_enabled so that both transition directions ([no-csum → has-csum] and [has-csum → no-csum]) are detected.

For the ddb case: I'll remove the csum_enabled gate on save_recx so it is always called when ic_csum_fetch is set, and save a null/invalid placeholder in the csum list for entries with no checksum. This way the caller can see all entries and identify which ones are missing a checksum.

I'll also add a unit test that writes one extent without a checksum (NULL iod_csums) and one with, then verifies both appear in the returned <csum, epoch> list with the no-csum entry properly flagged.

- [ ] Add csum_missing flag in akey_fetch_recx() to detect both inconsistency directions
- [ ] Save a null placeholder in the csum list for entries with no checksum

  • Add io_csum_fetch_recx_missing_csum unit test

Contributor Author

After further investigation, the csum_missing flag approach turns out to be unworkable due to an EVT limitation: evt_entry_csum_fill() propagates tr_csum_len — a root-level field — into the csum metadata of every non-hole entry at fetch time. As a result, ci_is_valid() always returns true for non-hole entries whenever tr_csum_len > 0, even for extents that were written without a checksum. Detecting the mixed case (some extents with, some without a checksum) at fetch time would require per-evt_desc metadata, which would change the VOS persistent format — clearly out of scope for this PR.

Since the mixed case is indistinguishable, null placeholders are also unnecessary: the csum list will either be fully populated (all entries have a checksum) or empty (none do). Commit 424d306 removes the csum_enabled gate so that save_recx() is always called when ic_csum_fetch is set, and leaves the csum list empty when no checksums are present rather than filling it with null placeholders. The io_csum_fetch_recx_missing_csum unit test validates this behaviour.
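
The limitation can be modelled in a few lines. This sketch is an assumption-laden illustration, not the EVT code: only the field name tr_csum_len comes from the discussion above, while the structs, the `written_with_csum` field (ground truth that the model deliberately never consults at fetch time), and the `sketch_` functions are hypothetical stand-ins for evt_entry_csum_fill() and ci_is_valid().

```c
#include <stdbool.h>
#include <stdint.h>

/* Root-level metadata: one checksum length shared by the whole tree. */
struct sketch_evt_root {
	uint16_t tr_csum_len;
};

/* Per-entry view as seen at fetch time. */
struct sketch_evt_entry {
	bool     is_hole;
	bool     written_with_csum; /* ground truth, not available to the fill step */
	uint16_t cs_len;            /* filled in at fetch time */
};

/* Fill step: every non-hole entry inherits the root-level length. */
static void
sketch_entry_csum_fill(const struct sketch_evt_root *root,
		       struct sketch_evt_entry *ent)
{
	ent->cs_len = ent->is_hole ? 0 : root->tr_csum_len;
}

/* Validity check that only looks at the filled-in length. */
static bool
sketch_ci_is_valid(const struct sketch_evt_entry *ent)
{
	return ent->cs_len > 0;
}
```

In this model an entry written without a checksum still looks valid as soon as the root length is non-zero, so the validity check cannot separate the two kinds of entries. Only the all-or-nothing case (root length zero) is reliably visible, which corresponds to the empty csum list described above.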

kanard38 added 5 commits May 26, 2026 08:45
Add print_message() before each goto so the failing operation is named
in the test output. Keep goto-based cleanup labels to avoid memory
leaks on longjmp.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
Unroll the two-iteration loop into explicit writes with named constants
(recx_size, recx2_idx) to eliminate magic numbers. Add a write-layout
comment before the function declaration describing the chunk-boundary
arithmetic and the expected cs_nr values for each write.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
save_recx() was gated on a local csum_enabled flag, so extents without
a stored checksum were absent from the VOS_OF_FETCH_CSUM output used by
ddb. Remove the gate so save_recx() is always called when ic_csum_fetch
is set.

When no entry in the akey stored a checksum (EVT root tr_csum_len==0),
ci_is_valid() reliably returns false for all entries. In that case the
csum list is left empty (dcl_csum_infos_nr==0), which is a cleaner
signal than filling it with null placeholders that the caller would have
to iterate through to reach the same conclusion.

Mixed-checksum detection (some entries with, some without) is not added:
ci_is_valid() is unreliable for that purpose because evt_entry_csum_fill()
propagates the root-level tr_csum_len to all non-hole entries, making
ci_is_valid() return true even for entries stored without a checksum.

Add VOS401.2 (io_csum_fetch_recx_missing_csum) verifying that no-csum
extents appear in the recx list and the csum list is empty.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@knard38 knard38 requested a review from Nasf-Fan May 27, 2026 07:08
kanard38 added 2 commits May 27, 2026 08:12
Replace the goto-based error handling in the csum fetch test functions
with direct assert_rc_equal() calls throughout.

The goto+cleanup-label structure gives only a partial guarantee: the
assertions in the check phase (after setup) already bypass the cleanup
labels via longjmp on failure. Since cleanup is never fully guaranteed
in cmocka without a teardown function, the goto structure in the setup
phase provides a false sense of security while adding complexity.

Using assert_rc_equal() directly is simpler, consistent throughout each
function, and equally correct: cmocka reports the file and line of the
failing assertion, which is as informative as a preceding print_message().

Functions simplified:
- io_csum_fetch_single
- io_csum_fetch_recx
- io_csum_write_no_csum
- io_csum_fetch_recx_missing_csum

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
…e_recx

Apply the same assert_rc_equal() simplification to io_csum_update_recx()
that was applied to the other csum test helpers in commit b52bafc.

The function is converted from returning int to void, the akey parameter
gains a const qualifier, and the goto-based cleanup is replaced with
assert_rc_equal() and assert_non_null() throughout. Callers in
io_csum_fetch_recx no longer capture the return value.

Note: unlike the test functions with a check phase, io_csum_update_recx
is a pure write helper where the original goto approach did protect
cleanup. The simplification is still appropriate for consistency and the
practical risk is negligible (small test allocations, operations that
virtually never fail in a test environment).

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@daosbuild3
Collaborator

Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18293/7/display/redirect

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18293/14/testReport/

knard38 commented Jun 4, 2026

All CI failures in build #14 are pre-existing infrastructure or unrelated issues — none are introduced by this PR (which only touches vos_io.c and vts_io.c):

  • Functional test_dfuse_daos_build_wb (FAIL): dfuse rejects libtool hard links with EOPNOTSUPP when building inside a writeback-mode dfuse mount. Tracked in DAOS-19076. Unrelated to VOS checksum changes.
  • NLT (UNSTABLE): ~110 Valgrind errors all come from dfuse binary runs — no VOS tests are run under Valgrind in the NLT stage. Same pre-existing suppression gaps observed in earlier CI builds. The 8 entries in nlt-errors.json are expected artifacts from the test_evict scenario (DER_NO_HDL/DER_NO_PERM after handle eviction) and normal server shutdown (in-flight RPCs at dmg system stop).

@knard38 knard38 requested a review from a team June 4, 2026 11:32

knard38 commented Jun 4, 2026

@daos-stack/daos-gatekeeper, given my previous comment, could you land this PR with the following commit message:

  • title: DAOS-17321 ddb: Add checksum dump function to ddb vos API
  • body:
Add the VOS_OF_FETCH_CSUM flag to vos_fetch_begin() so callers can
retrieve per-extent checksum metadata without fetching data. This is
the VOS-level building block for the ddb csum_dump command.

When the flag is set, all extents are reported regardless of whether
they carry a checksum. When none do, the csum list is left empty as a
signal to the caller. Mixed-checksum detection is deferred to a
follow-up.

New unit tests cover single-value and array checksum fetch, including
the no-checksum case. Also includes minor fixes (resource leak, missing
newlines, clang-format alignment) and test readability improvements.

@daltonbohning daltonbohning merged commit 2b9ff82 into master Jun 4, 2026
43 of 46 checks passed
@daltonbohning daltonbohning deleted the ckochhof/dev/master/daos-17321/patch-001 branch June 4, 2026 14:48