DAOS-19019 spdk: daos_server_helper SPDK bindings leak memory#18427
DAOS-19019 spdk: daos_server_helper SPDK bindings leak memory#18427knard38 wants to merge 3 commits into
Conversation
opts_add_pci_addr() allocates opts.pci_allowed via realloc() but daos_spdk_init() never frees it after calling spdk_env_init(). Add free(opts.pci_allowed) at the out: label so both the success and error paths release the buffer. Add test_daos_spdk_init_pci_freed to nvme_control_ut.c to verify under ASan that opts.pci_allowed is freed on return. Also fix the mock_spdk_nvme_ctrlr_get_pci_device() mock to use a static local pool instead of calloc(), which was itself leaking under ASan. Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
_collect() allocates ctrlr->pci_addr via calloc() for each controller, but free_ctrlr_fields() freed all other heap fields (model, serial, fw_rev, vendor_id, pci_type) while omitting pci_addr. Add the missing free(). Add test_collect_pci_addr_freed to nvme_control_ut.c to verify under ASan that pci_addr is freed by teardown via clean_ret() -> free_ctrlr_fields(). Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
|
Ticket title is 'daos_server_helper SPDK bindings leak memory during NVMe discovery and initialisation' |
|
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18427/1/testReport/ |
…/daos-19019/patch-001
|
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18427/2/testReport/ |
|
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18427/2/testReport/ |
|
All CI failures in build #2 are pre-existing infrastructure or unrelated issues — none are introduced by this PR (which only touches
|
|
@daos-stack/daos-gatekeeper according to my previous comment, could you land this PR with the following commit:
|
Description
Fix two memory leaks in the SPDK C bindings used by
daos_server_helper, detected by running the NVMe control unit tests under AddressSanitizer (ASAN_OPTIONS=detect_leaks=1).Leak 1 —
opts.pci_allowed(8 bytes,nvme_control.c)daos_spdk_init()builds a PCI allowlist by callingopts_add_pci_addr(), which allocatesopts.pci_allowedviarealloc(). This buffer was never freed after the call tospdk_env_init(). The fix addsfree(opts.pci_allowed)at theout:label, which is reached by both the success and error paths.Leak 2 —
ctrlr->pci_addr(257 bytes per controller,nvme_control_common.c)_collect()allocates a PCI address string per controller viacalloc(1, SPDK_NVMF_TRADDR_MAX_LEN + 1)and stores it inctrlr->pci_addr.free_ctrlr_fields()freed all other heap fields (model,serial,fw_rev,vendor_id,pci_type) but omittedpci_addr. The fix adds the missingfree().Tests
Two new unit tests are added to
nvme_control_ut.c, both intended to be run with ASan:test_daos_spdk_init_pci_freed: callsdaos_spdk_init()with one PCI address and lets ASan verify thatopts.pci_allowedis freed on return.spdk_env_init()is expected to fail in a unit-test context (no hugepages); this is acceptable since the leak check is ASan-driven.test_collect_pci_addr_freed: drives_collect()via mock function pointers (no real SPDK or hardware required), then lets teardown callclean_ret()→free_ctrlr_fields()so ASan can verify thatpci_addris freed for each controller.The
mock_spdk_nvme_ctrlr_get_pci_device()mock is also fixed: it previously leaked a heap-allocatedspdk_pci_deviceper call, which was invisible without ASan. It now returns entries from a static local pool (sized byMAX_MOCK_PCI_DEVS), with a module-level counter reset in teardown.The fix was locally validated with using the following command:
Steps for the author:
After all prior steps are complete: