fix(backend): keep manifest cleanup on soft delete #2263
Conversation
📝 Walkthrough

This PR introduces a batched cleanup tool and improves manifest deletion logic to safely remove inactive manifest entries from the database and S3/R2. The trigger function gains safer reference checking and deferred error handling; tests verify cached file availability after app deletion; and a new admin script automates cleanup with batching, counter updates, and detailed reporting.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Admin as Admin CLI
    participant DB as Postgres
    participant R2 as R2/S3 Storage
    Admin->>DB: Query inactive manifest candidates<br/>(deleted versions, not active-referenced)
    DB-->>Admin: Candidate paths
    Admin->>Admin: Print pre-cleanup summary
    alt Apply mode
        loop For each batch
            Admin->>DB: Delete manifest rows in transaction
            DB-->>Admin: Deleted IDs and paths
            Admin->>DB: Refresh manifest_count per version
            Admin->>DB: Refresh manifest_bundle_count per app
            Admin->>DB: Query active references for deleted paths
            DB-->>Admin: Active reference counts
            Admin->>R2: Delete unreferenced objects (concurrently)
            R2-->>Admin: Success/failure per object
            Admin->>Admin: Log per-batch totals
        end
    end
    Admin->>DB: Query final manifest/app counts
    DB-->>Admin: Final summary
    Admin->>Admin: Print post-cleanup summary
    Admin->>DB: Close connection
```
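The apply-mode flow above can be sketched roughly as follows. The names (`deleteManifestRows`, `activeRefs`, the batch size) are assumptions for illustration only, and Postgres/R2 are replaced by in-memory stand-ins so the flow is runnable; this is not the actual script.

```typescript
type ManifestRow = { id: number; path: string };

// In-memory stand-ins for Postgres and R2 (assumed shapes, for illustration only).
const db = {
  rows: [
    { id: 1, path: "apps/a/file1" },
    { id: 2, path: "apps/a/file2" },
    { id: 3, path: "apps/b/file3" },
  ] as ManifestRow[],
  // Paths still referenced by an active app version.
  activeRefs: new Set<string>(["apps/a/file2"]),
};
const r2 = new Set<string>(db.rows.map((r) => r.path));

const BATCH_SIZE = 2;

// Stands in for one committed DB transaction deleting a batch of manifest rows.
function deleteManifestRows(batch: ManifestRow[]): string[] {
  const ids = new Set(batch.map((r) => r.id));
  db.rows = db.rows.filter((r) => !ids.has(r.id));
  return batch.map((r) => r.path);
}

function runApply(candidates: ManifestRow[]): void {
  for (let i = 0; i < candidates.length; i += BATCH_SIZE) {
    const batch = candidates.slice(i, i + BATCH_SIZE);
    const deletedPaths = deleteManifestRows(batch); // 1. delete rows in a transaction
    // 2. manifest_count / manifest_bundle_count refresh would happen here
    // 3. only delete objects that no active version still references
    const unreferenced = deletedPaths.filter((p) => !db.activeRefs.has(p));
    for (const p of unreferenced) r2.delete(p); // one object delete per path
    console.log(`batch: deleted ${batch.length} rows, ${unreferenced.length} objects`);
  }
}

runApply(db.rows.slice());
```

Note that in this flow the DB rows are gone before any object delete is attempted — the ordering the review comments below take issue with.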
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Merging this PR will not alter performance
I think the cleanup ordering here can permanently strand R2 objects on a transient delete failure. Both cleanup paths drop the retry source (the manifest row) before the external delete succeeds, which I would avoid for production cleanup. One option is to select candidate rows/paths, delete the R2 objects first for unreferenced paths, and only delete the manifest rows for paths whose object deletion succeeded, with a final active-reference check before the DB delete. At minimum this needs a regression test where the first R2 delete fails and a second cleanup run can still find and retry the same path.
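A minimal sketch of that proposed ordering, with in-memory stand-ins for the DB and object store (none of these names come from the actual script):

```typescript
type Candidate = { id: number; path: string };

const db = {
  manifest: [
    { id: 1, path: "apps/a/file1" },
    { id: 2, path: "apps/a/file2" },
  ] as Candidate[],
  activeRefs: new Set<string>(), // paths referenced by active versions
};

// Fake object store whose first delete fails transiently.
const r2 = {
  objects: new Set<string>(["apps/a/file1", "apps/a/file2"]),
  failOnce: true,
  deleteObject(path: string): boolean {
    if (this.failOnce) {
      this.failOnce = false;
      return false; // simulated transient failure
    }
    this.objects.delete(path);
    return true;
  },
};

function cleanupPass(): void {
  for (const c of db.manifest.slice()) {
    if (db.activeRefs.has(c.path)) continue; // skip referenced paths
    if (!r2.deleteObject(c.path)) continue;  // delete failed: keep the row as the retry source
    if (db.activeRefs.has(c.path)) continue; // final active-reference check before the DB delete
    db.manifest = db.manifest.filter((r) => r.id !== c.id);
  }
}

cleanupPass(); // file1's object delete fails, so its row survives
cleanupPass(); // the second run finds file1 again and completes the cleanup
```

Because the manifest row is only removed after its object delete succeeds, a second run always has a durable retry source.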
KCDaemon
left a comment
Rechecked the latest head (f01ac91) and the object-lifecycle ordering issue is still present in both cleanup paths.
In scripts/cleanup_inactive_manifest.ts, runApply() calls deleteManifestRows() first; that function deletes public.manifest rows inside a committed transaction and only afterwards does deleteR2Objects() attempt DeleteObjectCommand for the returned paths. If any R2 delete fails, the script only increments totalR2Failed / prints the failed paths. On the next run, those paths are no longer returned by fetchCandidatePaths() because the manifest rows are gone, so there is no durable retry source left for the stranded objects.
The trigger path in deleteManifest() has the same shape for each manifest entry: delete the manifest DB row, then check for remaining references, then call s3.deleteObject(). A transient object-delete failure leaves the object behind after the DB row has already been removed.
git diff --check origin/main...origin/pr-2263 passes locally, but CI currently has failing `Run tests` jobs. I would keep this blocked until object deletion happens before dropping the last manifest reference, or failed object deletes keep durable retry state. A regression test where the first R2 delete fails and a second cleanup run can still find/retry the same path would cover the production failure mode.
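The failure mode can be reproduced in a few lines. The structures here are in-memory stand-ins rather than the real script, but they mirror the DB-first ordering described above:

```typescript
// DB-first ordering: rows are removed first; object-delete failures are only logged.
let manifest = [{ id: 1, path: "apps/a/file1" }];
const objects = new Set<string>(["apps/a/file1"]);
let failOnce = true; // first object delete fails transiently

function dbFirstCleanup(): void {
  const deleted = manifest.slice(); // rows removed in a committed transaction...
  manifest = [];
  for (const row of deleted) {
    if (failOnce) {
      failOnce = false; // ...and the failed object delete is only counted/logged
      continue;
    }
    objects.delete(row.path);
  }
}

dbFirstCleanup(); // row gone, object delete failed
dbFirstCleanup(); // no candidate rows remain, so nothing retries the delete
// "apps/a/file1" is now stranded in the object store
```

After the first run the candidate query has nothing to return, so the second run never sees the stranded path — exactly the missing-retry-source problem.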
I put together a helper branch that fixes the lifecycle ordering blocker and the current typo CI failure: https://github.com/digzrow-coder/capgo/tree/codex/capgo-pr2263-helper-forkbase

What it changes:
Validated locally:
I could not run the service-backed …
Follow-up: the fork CI rerun for the helper branch is green now: https://github.com/digzrow-coder/capgo/actions/runs/25904826715
I rebased the manifest cleanup retryability fix onto current main.

Clean branch/commit:
What changed:
Validation on the clean branch:
I tried opening this as a fresh PR, but GitHub shows: "An owner of this repository has limited the ability to open a pull request to users that are collaborators on this repository." This branch is ready to cherry-pick or use as the replacement for the conflicting cleanup slice.
Follow-up: fork CI for the clean current-main branch is green now. Run: https://github.com/digzrow-coder/capgo/actions/runs/25919249303

Completed successfully:
This is the branch from my previous comment: https://github.com/digzrow-coder/capgo/tree/codex/manifest-cleanup-retry-current |



Summary (AI generated)
Motivation (AI generated)
A soft-deleted bundle could leave manifest rows behind when earlier cleanup steps returned or threw before deleteManifest ran. The cleanup script gives us a controlled way to audit and remove already-stuck inactive manifest rows without changing retention policy or hard-deleting app_versions.
Business Impact (AI generated)
This helps keep public.manifest size under control, reduces wasted Supabase/R2 storage, and avoids deleting files still used by active app versions. It also preserves update availability by not introducing a database read into file downloads.
Test Plan (AI generated)