Ensure CnsVolumeOperationRequest CR reservations are set to 0 on error #3944

Open
deepakkinni wants to merge 1 commit into kubernetes-sigs:master from deepakkinni:topic/dk016388/reduce_res_err_v1

@deepakkinni
Collaborator

@deepakkinni deepakkinni commented Mar 31, 2026

What this PR does / why we need it:
Fixes CnsVolumeOperationRequest CRs getting stuck in the InProgress state, which prevented cleanup and caused a temporary spike in quota reservations. The issue occurred when a CNS operation failed after task creation but before the status update could be persisted, because of context expiration or session timeouts.

Changes include:

  • Force stale InProgress tasks (older than 48h) into the Error state in the cleanup routine
  • Use fresh contexts for deferred status persistence to prevent silent failures
  • Fix 15+ error paths that returned without marking operations as Error
  • Refine quota reservation logic to retain capacity on first-attempt errors and release it on retry errors or success
  • Extract common defer logic into the persistVolumeOperationDetails helper method
  • Add an IsRetryAttempt helper for consistent retry detection
  • Add comprehensive unit test coverage (12 new test suites, 28 new test cases)
  • Optimize unit tests with helper functions, reducing code duplication by 60%

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:

  • All existing tests pass (43/43)
  • New unit tests cover stale InProgress handling, error path transitions, quota reservation logic
  • Tests validate both CSI transaction enabled/disabled scenarios
  • Code optimizations maintain identical functionality

Special notes for your reviewer:
Changes affect both the transaction-enabled and the improved-idempotency code paths. The quota reservation refinement retains capacity for retries while preventing permanent leaks.
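
The reservation rule described here (retain on a first-attempt error, release on a retry error or on success) can be written as a small decision function. This is a sketch under stated assumptions: `request`, `PriorAttempts`, `isRetryAttempt`, and `reservationAfter` are hypothetical, and the PR's real IsRetryAttempt helper may detect retries differently.

```go
package main

// request is a hypothetical view of one volume operation request.
type request struct {
	PriorAttempts int   // assumed: attempts made before the current one
	Reserved      int64 // capacity currently reserved for the request
}

// isRetryAttempt reports whether the current attempt is a retry.
// How the PR's helper actually detects retries is not shown in the source.
func isRetryAttempt(r request) bool { return r.PriorAttempts > 0 }

// reservationAfter returns the reservation to keep once the current
// attempt has finished.
func reservationAfter(r request, failed bool) int64 {
	if failed && !isRetryAttempt(r) {
		// First-attempt error: keep the capacity so a retry can use it.
		return r.Reserved
	}
	// Success, or an error on a retry: release the reservation.
	return 0
}
```

Keeping the rule in one pure function makes it straightforward to cover all four combinations of first attempt or retry with success or failure in a table-driven test.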

Release note:

Fix CnsVolumeOperationRequest cleanup issues causing stuck InProgress tasks and quota reservation leaks

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 31, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepakkinni

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 31, 2026
@deepakkinni deepakkinni force-pushed the topic/dk016388/reduce_res_err_v1 branch 2 times, most recently from e9b684f to e9c8a64 Compare March 31, 2026 07:38
Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
@deepakkinni deepakkinni force-pushed the topic/dk016388/reduce_res_err_v1 branch from e9c8a64 to 49b74e3 Compare March 31, 2026 07:44
@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1147

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #1147

@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1158

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #969

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #1158

@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1166

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #972

@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1167

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #972

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #973

@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1169

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #1169

@deepakkinni
Collaborator Author

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1170

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #974

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #975

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #1170

@deepakkinni
Collaborator Author

SUCCESS --- Jenkins Build #975

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #976

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #977

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #977

@deepakkinni
Collaborator Author

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #978

@deepakkinni
Collaborator Author

FAILED --- Jenkins Build #978

@deepakkinni
Collaborator Author

SUCCESS --- Jenkins Build #979

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

2 participants