DAOS-18860 control: calc engine mem_size only on tgt count #18407

Merged: daltonbohning merged 3 commits into master from tanabarr/control-engine-memsize-mdonssd-fix on Jun 4, 2026


@tanabarr tanabarr commented Jun 2, 2026

Pass the engine a memory size calculated from a 1 GiB-per-target quota
only, even though the control plane's hugepage allocation also takes the
MD-on-SSD System-XStream into account.
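
The arithmetic can be sketched roughly as follows. This is a minimal illustration, not the code in server_utils.go: the names `memPerTargetMiB`, `engineMemSizeMiB` and `hugepageAllocMiB` are hypothetical, and reserving exactly one extra 1 GiB quota for the system xstream is an assumption made only to show why the two values differ.

```go
package main

import (
	"errors"
	"fmt"
)

// memPerTargetMiB is the 1 GiB-per-target quota mentioned in the description.
const memPerTargetMiB = 1024

// engineMemSizeMiB (hypothetical name) returns the memory size passed to the
// engine. It depends only on the target count.
func engineMemSizeMiB(tgtCount int) (int, error) {
	if tgtCount <= 0 {
		return 0, errors.New("target count must be positive")
	}
	return tgtCount * memPerTargetMiB, nil
}

// hugepageAllocMiB (hypothetical name) models the control-plane hugepage
// allocation. ASSUMPTION: with MD-on-SSD enabled, one extra quota is reserved
// for the system xstream; the real reservation may differ.
func hugepageAllocMiB(tgtCount int, mdOnSSD bool) (int, error) {
	if tgtCount <= 0 {
		return 0, errors.New("target count must be positive")
	}
	xstreams := tgtCount
	if mdOnSSD {
		xstreams++ // assumed extra system xstream
	}
	return xstreams * memPerTargetMiB, nil
}

func main() {
	const targets = 16

	memSize, err := engineMemSizeMiB(targets)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	alloc, err := hugepageAllocMiB(targets, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The allocation may exceed the value handed to the engine.
	fmt.Printf("engine mem_size: %d MiB, hugepage allocation: %d MiB\n", memSize, alloc)
}
```

Under these assumptions, 16 targets give the engine 16384 MiB while the allocation is 17408 MiB, so the engine value no longer has to equal the allocated hugepage total.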

Reorganize test cases in server_utils_test.go by splitting error
scenarios into a dedicated test function.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from a team as code owners June 2, 2026 16:03
@tanabarr tanabarr self-assigned this Jun 2, 2026
github-actions Bot commented Jun 2, 2026

Ticket title is 'rebuild/no_cap.py:RbldNoCapacity.test_rebuild_no_capacity - Failed to start servers after format'
Status is 'In Review'
Labels: 'ci_2.8_daily,daily_test'
https://daosio.atlassian.net/browse/DAOS-18860

kjacque previously approved these changes Jun 2, 2026
@kjacque kjacque left a comment

Overall LGTM

Comment thread src/control/server/server_utils.go Outdated
- srv.log.Debugf("Per-engine MemSize:%dMB, HugepageSize:%dMB (meminfo: %s)", memSizeMiB,
-     pageSizeMiB, smi.Summary())
+ srv.log.Debugf("Engine %d Per-engine MemSize:%dMiB, HugepageSize:%dMiB (meminfo: %s)",
+     ei.Index(), memSizeMiB, pageSizeMiB, smi.Summary())
Should use eIdx here too? I assume the other one was changed because we're no longer under the lock here.

Contributor Author:
done

tanabarr added 2 commits June 3, 2026 11:43
…ugepages != calc(tgt_count)

Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18407/4/testReport/

@tanabarr tanabarr requested review from NiuYawei, kjacque and knard38 June 4, 2026 11:01
@tanabarr tanabarr added labels Jun 4, 2026: control-plane (work on the management infrastructure of the DAOS Control Plane) and forced-landing (the PR has known failures or has intentionally reduced testing, but should still be landed)
@knard38 knard38 left a comment

LGTM

@tanabarr tanabarr requested a review from a team June 4, 2026 13:40
tanabarr commented Jun 4, 2026

The dfuse and NLT failures are expected; everything else is passing.

@daltonbohning daltonbohning merged commit 174dead into master Jun 4, 2026
44 of 48 checks passed
@daltonbohning daltonbohning deleted the tanabarr/control-engine-memsize-mdonssd-fix branch June 4, 2026 14:44
6 participants