Skip to content

(WIP) OCPBUGS-86181#6107

Draft
isabella-janssen wants to merge 1 commit into
openshift:mainfrom
isabella-janssen:ocpbugs-86181
Draft

(WIP) OCPBUGS-86181#6107
isabella-janssen wants to merge 1 commit into
openshift:mainfrom
isabella-janssen:ocpbugs-86181

Conversation

@isabella-janssen
Copy link
Copy Markdown
Member

@isabella-janssen isabella-janssen commented May 29, 2026

Closes: OCPBUGS-86181

- What I did
This primarily updates the MCN property and condition transition tests run against the default master MCP instead of a custom MCP when running on a SNO cluster. Some timeouts were increased for the SNO case and the cleanup ordering of the image rebuild required test was updated to decrease the number of image builds.

- How to verify it
All MCN property and transition tests for the ImageModeStatusReporting feature gate should pass in the SNO test suite and continue passing in the default cluster suites.

- Description for the changelog
OCPBUGS-86181: Update ImageModeStatusReporting MCN property and condition transition tests to be resilient on SNO topology

Summary by CodeRabbit

  • Tests

    • Leader-election checks switched to polling with extended wait for stability.
    • Image-mode and machine-config transition tests are SNO-aware, tolerate transient connection errors, use adaptive retry logic, and fall back to the master pool when workers are absent.
    • Added rebootless update test flows and more resilient MCN transition validation.
  • New Files

    • Added a test machine configuration enabling the USB Guard extension.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR makes MachineConfig tests SNO-aware, adds a default-MCP test path when worker MCPs lack nodes, hardens machine-count checks against transient connection failures, adds a master MachineConfig fixture, and changes MOSC leader-election log checks to poll until observed.

Changes

SNO and Image Mode Test Reliability

Layer / File(s) Summary
SNO Detection Infrastructure and Function Signature Updates
test/extended/machineconfignode.go, test/extended/image_mode_status_reporting.go
Added errors and syscall imports for connection-error classification. Updated ValidateTransitionThroughConditions signature to accept oc *exutil.CLI, enabling SNO detection inside transition validation.
SNO-Aware Transition Validation Logic
test/extended/machineconfignode.go
Implemented SNO detection and adjusted timing windows for SNO clusters. Modified transition checks to skip cordoned/drained validation for SNO, tolerate ECONNREFUSED as warnings during AppliedFiles/UpdateExecuted waits, and gate UpdateComplete/Uncordoned checks to non-SNO, non-rebootless updates.
Default MCP Fallback and Image-Mode MCN Tests
test/extended/image_mode_status_reporting.go, test/extended-priv/testdata/files/machineconfigs/master-extension-mc.yaml
Added runImageModeMCNTestDefaultMCP to run tests against a default MCP (master) when worker has no nodes. Included master-extension-mc.yaml fixture and updated tests to branch between master and custom MCPs.
Connection-Error Resilience in Machine Count Validation
test/extended/image_mode_status_reporting.go
Reworked validateMCPMachineCountTransitions with dynamic retry counts for layered/SNO updates. Changed mcnAndNodeAnnotationMachineCountsMatch to return (countsMatch, isConnErr) and treat ECONNREFUSED retrieval errors as retryable.
MOSC Leader-Election Log Polling
test/extended-priv/mco_ocb.go
Replaced immediate log assertion with Eventually polling that repeatedly checks mOSBuilder.Logs() for leader-election messages (up to 5 minutes, 10s interval).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

jira/valid-reference

Suggested reviewers

  • dkhater-redhat
🚥 Pre-merge checks | ✅ 4 | ❌ 11

❌ Failed checks (1 warning, 10 inconclusive)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title '(WIP) OCPBUGS-86181' only contains a work-in-progress marker and a Jira ticket reference without describing the actual changes made to the codebase. Replace with a descriptive title that summarizes the main changes, such as 'Add SNO-aware transition validation and image mode test improvements for machine config updates' or similar.
Stable And Deterministic Test Names ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Test Structure And Quality ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Microshift Test Compatibility ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Single Node Openshift (Sno) Test Compatibility ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Topology-Aware Scheduling Compatibility ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Ote Binary Stdout Contract ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Ipv6 And Disconnected Network Test Compatibility ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
No-Weak-Crypto ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
Container-Privileges ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
No-Sensitive-Data-In-Logs ❓ Inconclusive Repository clone failed, so this custom check could not run with code access. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
✅ Passed checks (4 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/178b0a90-5b9c-11f1-8b57-67f8e83fcc97-0

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended-priv/mco_ocb.go`:
- Around line 314-317: The callback passed to o.Eventually currently discards
the error from mOSBuilder.Logs(); change it to capture the error (logs, err :=
mOSBuilder.Logs()) and handle it instead of using the blank identifier: if err
!= nil, return a descriptive string containing the error (or otherwise surface
the error to the test/failer) so the Eventually assertion sees failures to fetch
logs; update the callback around mOSBuilder.Logs() used inside o.Eventually(...)
to propagate or report the error rather than ignoring it.

In `@test/extended/image_mode_status_reporting.go`:
- Around line 394-409: In validateMCPMachineCountTransitions, stop ignoring the
error from extpriv.IsSNOSafe(oc): capture the returned error, log or return it
(or set a safe default for loopCount) and decide fallback behavior when
IsSNOSafe fails; also ensure the SNO retry-budget bump (loopCount = 10) is
applied where intended—either move the IsSNOSafe check out of the layered-only
block or duplicate its handling for the non-layered path so SNO clusters get the
higher retry count in both cases if that’s the desired behavior.

In `@test/extended/machineconfignode.go`:
- Around line 302-315: The code currently skips asserting UpdateExecuted=True
unconditionally; restore a failing assertion for non-SNO/non-connection-error
paths: after calling waitForMCNConditionStatus (function
waitForMCNConditionStatus, vars conditionMet and err, checking isSNO and
errors.Is(err, syscall.ECONNREFUSED)), keep the special-case that logs and
tolerates the ECONNREFUSED for SNO, but in the else branch ensure you both
assert no error (o.Expect(err).NotTo(HaveOccurred())) and assert conditionMet is
true (o.Expect(conditionMet).To(o.BeTrue(), fmt.Sprintf("Error, could not detect
%s on node %s: %v", mcfgv1.MachineConfigNodeUpdateExecuted, updatingNodeName,
err))). This guarantees non-SNO and non-conn-error runs fail if UpdateExecuted
was not observed.
- Around line 208-216: The call to extpriv.IsSNOSafe(oc) currently discards its
error (isSNO, _ := extpriv.IsSNOSafe(oc)); change it to capture the error and
handle it explicitly (e.g., isSNO, err := extpriv.IsSNOSafe(oc); if err != nil {
/* fail the test or log and return */ }) so that SNO detection failures do not
silently fall back to non-SNO timing; update the surrounding logic that uses
isSNO, and ensure any failure uses the test framework's failure/logging
mechanism (e.g., t.Fatalf, framework.Failf or klog/Logf as appropriate) instead
of ignoring the error while still setting updatingWaitTime and
updatingWaitInterval.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 08da7797-b463-4018-8e76-ae6f432da12b

📥 Commits

Reviewing files that changed from the base of the PR and between d72b715 and 5614069.

📒 Files selected for processing (4)
  • test/extended-priv/mco_ocb.go
  • test/extended-priv/testdata/files/machineconfigs/master-extension-mc.yaml
  • test/extended/image_mode_status_reporting.go
  • test/extended/machineconfignode.go

Comment thread test/extended-priv/mco_ocb.go Outdated
Comment thread test/extended/image_mode_status_reporting.go
Comment thread test/extended/machineconfignode.go Outdated
Comment thread test/extended/machineconfignode.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/machineconfignode.go`:
- Around line 292-301: The SNO branch current logic swallows non-ECONNREFUSED
errors (variables: isSNO, err, syscall.ECONNREFUSED) and only logs on
ECONNREFUSED or missing condition, causing inconsistent behavior with the
non-SNO path; update the isSNO handling in the function that waits for
AppliedFiles=True so that if err != nil and not errors.Is(err,
syscall.ECONNREFUSED) you call the same assertion used in the non-SNO path
(o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf(...))) so non-ECONNREFUSED
errors fail the test, while preserving the existing special-case log for
ECONNREFUSED and the conditionMet check (logger.Infof and
o.Expect(conditionMet).To(o.BeTrue())).
- Around line 310-319: The SNO branch in the UpdateExecuted check currently
swallows non-ECONNREFUSED errors; update the isSNO handling in the same block
that checks err and conditionMet so that if err is non-nil and not
syscall.ECONNREFUSED you log the error and/or fail the test similarly to the
non-SNO branch (use logger.Errorf or o.Expect to surface the error), and if
conditionMet is false ensure you log or fail (use logger.Infof + o.Expect or
o.Expect(conditionMet).To(o.BeTrue(...))) so behavior matches the non-SNO path;
reference variables/function names: isSNO, err, syscall.ECONNREFUSED,
conditionMet, logger, and o.Expect/BeTrue.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b795a6e6-b331-41eb-898e-3110a36678bb

📥 Commits

Reviewing files that changed from the base of the PR and between de9e297 and a1d9689.

📒 Files selected for processing (3)
  • test/extended-priv/mco_ocb.go
  • test/extended/image_mode_status_reporting.go
  • test/extended/machineconfignode.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/extended-priv/mco_ocb.go
  • test/extended/image_mode_status_reporting.go

Comment thread test/extended/machineconfignode.go Outdated
Comment thread test/extended/machineconfignode.go Outdated
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 2, 2026
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2026
@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4f0ef2c0-603c-11f1-97d4-770ff5b6a596-0

@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/48ff8260-6054-11f1-931e-e338040069af-0

@isabella-janssen isabella-janssen force-pushed the ocpbugs-86181 branch 2 times, most recently from 25c491d to 3eaa05f Compare June 5, 2026 15:17
@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview 5

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/93f06240-60f3-11f1-9f6f-88cb747357c8-0

@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2c4f6ed0-6105-11f1-820a-cab7dccb558e-0

@isabella-janssen
Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9f119130-6107-11f1-9c1a-f7f378e83c7c-0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant