
fix(backend): keep manifest cleanup on soft delete #2263

Open

riderx wants to merge 1 commit into main from codex/manifest-soft-delete-cleanup

fix(backend): keep manifest cleanup on soft delete#2263
riderx wants to merge 1 commit into
mainfrom
codex/manifest-soft-delete-cleanup

Conversation

@riderx (Member) commented May 13, 2026

Summary (AI generated)

  • Keep soft-delete manifest cleanup running even when bundle deletion needs a retry or app_versions_meta is missing (a sketch of this deferred-error shape follows this list).
  • Keep shared manifest storage safe by checking the exact s3_path before deleting a manifest object from storage.
  • Add an admin dry-run/apply script to clean inactive manifest rows and matching R2 objects only when no active version references the same s3_path.
  • Align stale file-read tests with the current no-database-read file availability behavior from main.
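
A minimal TypeScript sketch of the deferred-error shape from the first bullet. The helper names (deleteBundle, updateVersionMeta, deleteManifest) and the Version type are illustrative assumptions, not the trigger's exact API:

```ts
interface Version { id: number, app_id: string }

// Illustrative stubs; the real helpers live in on_version_update.ts.
declare function deleteBundle(v: Version): Promise<void>
declare function updateVersionMeta(v: Version): Promise<void>
declare function deleteManifest(v: Version): Promise<void>

// Sketch: earlier cleanup steps no longer short-circuit manifest cleanup.
async function deleteIt(version: Version): Promise<void> {
  const deferred: unknown[] = []

  try {
    await deleteBundle(version) // bundle deletion may need a retry
  }
  catch (err) {
    deferred.push(err)
  }

  try {
    await updateVersionMeta(version) // app_versions_meta may be missing
  }
  catch (err) {
    deferred.push(err)
  }

  // Manifest cleanup always runs, even if the steps above failed.
  await deleteManifest(version)

  // Re-throw afterwards so the soft delete stays retryable.
  if (deferred.length > 0)
    throw new AggregateError(deferred, 'cleanup steps failed; manifest cleanup still ran')
}
```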

Motivation (AI generated)

A soft-deleted bundle could leave manifest rows behind when earlier cleanup steps returned or threw before deleteManifest ran. The cleanup script gives us a controlled way to audit and remove already-stuck inactive manifest rows without changing retention policy or hard-deleting app_versions.

Business Impact (AI generated)

This helps keep public.manifest size under control, reduces wasted Supabase/R2 storage, and avoids deleting files still used by active app versions. It also preserves update availability by not introducing a database read into file downloads.

Test Plan (AI generated)

  • bunx vitest run tests/on-version-update-cleanup.unit.test.ts
  • bunx vitest run tests/on-version-update-cleanup.unit.test.ts tests/files-app-read-guard.unit.test.ts tests/files-local-read-proxy.unit.test.ts tests/files-security.test.ts
  • bunx eslint --no-ignore scripts/cleanup_inactive_manifest.ts
  • bun lint:backend
  • bunx typos package.json scripts/cleanup_inactive_manifest.ts supabase/functions/_backend/triggers/on_version_update.ts tests/on-version-update-cleanup.unit.test.ts tests/files-app-read-guard.unit.test.ts tests/files-local-read-proxy.unit.test.ts tests/files-security.test.ts
  • bun run supabase:with-env -- bun scripts/cleanup_inactive_manifest.ts --batch-size=1 --max-batches=1
  • bun test:backend

Summary by CodeRabbit

  • New Features

    • Added administrative cleanup tool to remove inactive manifest entries and unreferenced S3 objects.
  • Bug Fixes

    • Improved S3 asset deletion logic to prevent accidental cleanup of shared files.
    • Enhanced error handling during manifest cleanup to defer and report failures.
  • Tests

    • Updated test expectations for file caching and availability behavior.


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai (Contributor) bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 595c24bb-38b9-41b2-8364-691f8f3c9584

📥 Commits

Reviewing files that changed from the base of the PR and between 1c0bc76 and f01ac91.

📒 Files selected for processing (7)
  • package.json
  • scripts/cleanup_inactive_manifest.ts
  • supabase/functions/_backend/triggers/on_version_update.ts
  • tests/files-app-read-guard.unit.test.ts
  • tests/files-local-read-proxy.unit.test.ts
  • tests/files-security.test.ts
  • tests/on-version-update-cleanup.unit.test.ts

📝 Walkthrough

This PR introduces a batched cleanup tool and improves manifest deletion logic to safely remove inactive manifest entries from the database and S3/R2. The trigger function gains safer reference checking and deferred error handling; tests verify cached file availability after app deletion; and a new admin script automates cleanup with batching, counter updates, and detailed reporting.

Changes

Manifest Cleanup and Availability

  • Trigger manifest deletion improvements (supabase/functions/_backend/triggers/on_version_update.ts): deleteManifest adds s3_path equality validation to prevent incorrect S3 cleanup when rows match on file_name/file_hash but not path. deleteIt defers S3 deletion and version-meta update errors until after deleteManifest completes, ensuring cleanup proceeds even if intermediate steps fail. A sketch of the s3_path check follows this list.
  • Cleanup script implementation and registration (package.json, scripts/cleanup_inactive_manifest.ts): new admin script audits inactive manifest rows (attached to deleted app_versions, not referenced by active versions), batch-deletes them from the database in transactions, refreshes per-app/per-version manifest counters, and deletes unreferenced S3/R2 objects. Supports --apply vs dry-run, --db-url override, --env-file, and configurable batching/concurrency; reports pre/post cleanup summaries and failed R2 deletions.
  • File read availability semantics tests (tests/files-app-read-guard.unit.test.ts, tests/files-local-read-proxy.unit.test.ts, tests/files-security.test.ts): test suite renames and assertions updated to confirm cached files return HTTP 200 and original content after app deletion (not 404), and that cached reads bypass database lookups and do not write to cache.
  • Cleanup trigger unit tests (tests/on-version-update-cleanup.unit.test.ts): manifest mocks and state expanded; new tests verify manifest rows delete even when version-metadata lookups fail, cleanup proceeds when R2 deletion fails, and shared manifest files are retained when other versions still reference them.
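
To make the s3_path check concrete, a hedged TypeScript sketch follows. The table and column names (public.manifest with file_name/file_hash/s3_path) come from the walkthrough; the supabase-js client usage and the helper signature are assumptions, not the trigger's actual code:

```ts
import type { SupabaseClient } from '@supabase/supabase-js'

interface ManifestEntry { id: number, s3_path: string }

// Sketch: only delete the storage object when no other manifest row still
// references the exact same s3_path. Matching on file_name/file_hash alone
// is unsafe, because rows can share those fields yet point at different paths.
async function deleteManifestEntry(
  supabase: SupabaseClient,
  entry: ManifestEntry,
  deleteObject: (path: string) => Promise<void>,
): Promise<void> {
  // Remove the manifest row (the PR keeps row deletion first).
  await supabase.from('manifest').delete().eq('id', entry.id)

  // Count remaining rows referencing the exact s3_path.
  const { count } = await supabase
    .from('manifest')
    .select('id', { count: 'exact', head: true })
    .eq('s3_path', entry.s3_path)

  // Only touch storage when this was the last reference to that path.
  if ((count ?? 0) === 0)
    await deleteObject(entry.s3_path)
}
```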

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Admin as Admin CLI
  participant DB as Postgres
  participant R2 as R2/S3 Storage
  Admin->>DB: Query inactive manifest candidates<br/>(deleted versions, not active-referenced)
  DB-->>Admin: Candidate paths
  Admin->>Admin: Print pre-cleanup summary
  alt Apply mode
    loop For each batch
      Admin->>DB: Delete manifest rows in transaction
      DB-->>Admin: Deleted IDs and paths
      Admin->>DB: Refresh manifest_count per version
      Admin->>DB: Refresh manifest_bundle_count per app
      Admin->>DB: Query active references for deleted paths
      DB-->>Admin: Active reference counts
      Admin->>R2: Delete unreferenced objects (concurrently)
      R2-->>Admin: Success/failure per object
      Admin->>Admin: Log per-batch totals
    end
  end
  Admin->>DB: Query final manifest/app counts
  DB-->>Admin: Final summary
  Admin->>Admin: Print post-cleanup summary
  Admin->>DB: Close connection
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Manifest tales, once lost to time,
Now cleaned and pruned in perfect rhyme,
S3 paths match true, no false alarms,
Cached files bloom with full charms,
Errors deferred—the cleanup prevails!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 5.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): the title 'fix(backend): keep manifest cleanup on soft delete' directly and clearly summarizes the main objective of the changeset, ensuring manifest cleanup continues during soft deletes.
  • Description check (✅ Passed): the PR description includes a well-structured Summary, Motivation, Business Impact, and a comprehensive Test Plan section demonstrating thorough testing across multiple test suites and validation steps.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@codspeed-hq (Contributor) bot commented May 13, 2026

Merging this PR will not alter performance

✅ 43 untouched benchmarks
⏩ 2 skipped benchmarks ¹


Comparing codex/manifest-soft-delete-cleanup (f01ac91) with main (1c0bc76)


Footnotes

¹ 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@digzrow-coder

I think the cleanup ordering here can permanently strand R2 objects on a transient delete failure.

Both deleteManifest() and the new admin:cleanup-inactive-manifest path delete the manifest DB rows before deleting the corresponding R2 keys. In the admin script, deleteManifestRows() commits the row deletion and then deleteR2Objects() runs; if any DeleteObjectCommand fails, the script only increments totalR2Failed. On the next run, those paths are no longer returned by fetchCandidatePaths() because the manifest rows are gone, so there is no database reference left to retry the failed object delete. The trigger path has the same shape: it deletes each manifest row first, then calls s3.deleteObject; a failed object delete is logged but the row has already been removed.

For production cleanup, I would avoid dropping the retry source before the external delete succeeds. One option is to select candidate rows/paths, delete R2 first for unreferenced paths, and only delete the manifest rows for paths whose object deletion succeeded, with a final active-reference check before the DB delete. At minimum this needs a regression where the first R2 delete fails and a second cleanup run can still find and retry the same path.
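
To be concrete, here is a rough TypeScript sketch of the ordering I mean. All helper names are hypothetical stand-ins for the script's internals; the only point is that the manifest row survives a failed object delete and remains the retry source:

```ts
interface Candidate { id: number, s3_path: string }

// Hypothetical helpers standing in for the script's internals.
declare function fetchCandidatePaths(): Promise<Candidate[]>
declare function isStillReferencedByActiveVersion(path: string): Promise<boolean>
declare function deleteR2Object(path: string): Promise<void>
declare function deleteManifestRow(id: number): Promise<void>

async function cleanupBatch(): Promise<void> {
  for (const candidate of await fetchCandidatePaths()) {
    // Final active-reference check right before acting on the path.
    if (await isStillReferencedByActiveVersion(candidate.s3_path))
      continue

    try {
      await deleteR2Object(candidate.s3_path)
    }
    catch (err) {
      // Object delete failed: keep the manifest row so the next run
      // re-discovers this path via the candidate query and retries.
      console.error('R2 delete failed, keeping row for retry', candidate.s3_path, err)
      continue
    }

    // Object is gone; only now is it safe to drop the DB reference.
    await deleteManifestRow(candidate.id)
  }
}
```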

@KCDaemon left a comment

Rechecked the latest head (f01ac91) and the object-lifecycle ordering issue is still present in both cleanup paths.

In scripts/cleanup_inactive_manifest.ts, runApply() calls deleteManifestRows() first; that function deletes public.manifest rows inside a committed transaction and only afterwards does deleteR2Objects() attempt DeleteObjectCommand for the returned paths. If any R2 delete fails, the script only increments totalR2Failed / prints the failed paths. On the next run, those paths are no longer returned by fetchCandidatePaths() because the manifest rows are gone, so there is no durable retry source left for the stranded objects.

The trigger path in deleteManifest() has the same shape for each manifest entry: delete the manifest DB row, then check for remaining references, then call s3.deleteObject(). A transient object-delete failure leaves the object behind after the DB row has already been removed.

git diff --check origin/main...origin/pr-2263 passes locally, but CI currently has failing Run tests jobs. I would keep this blocked until object deletion happens before dropping the last manifest reference, or failed object deletes keep durable retry state. A regression where the first R2 delete fails and a second cleanup run can still find/retry the same path would cover the production failure mode.

@digzrow-coder

I put together a helper branch that fixes the lifecycle ordering blocker and the current typo CI failure:

https://github.com/digzrow-coder/capgo/tree/codex/capgo-pr2263-helper-forkbase

What it changes:

  • scripts/cleanup_inactive_manifest.ts now deletes R2 objects before deleting the matching public.manifest rows, and only deletes DB rows for paths whose R2 deletion succeeded. Failed R2 deletes leave the manifest rows in place so the next run can find/retry them.
  • deleteManifest() in on_version_update.ts now checks for other references first; if this is the last reference, it deletes the R2 object before deleting the manifest row. If the object delete fails, the row is kept as retry state.
  • Fixes the typos failures in supabase/functions/_backend/files/files.ts.
  • Adds regressions for both retry paths, including an admin-script test where the first R2 delete fails and the second run can still retry/delete that path.

Validated locally:

  • bunx vitest run tests/cleanup-inactive-manifest.unit.test.ts tests/on-version-update-cleanup.unit.test.ts tests/files-app-read-guard.unit.test.ts tests/files-local-read-proxy.unit.test.ts
  • bunx eslint --no-ignore scripts/cleanup_inactive_manifest.ts tests/cleanup-inactive-manifest.unit.test.ts supabase/functions/_backend/triggers/on_version_update.ts supabase/functions/_backend/files/files.ts tests/on-version-update-cleanup.unit.test.ts
  • bun run lint:backend
  • bun run typecheck
  • typos on the changed files
  • git diff --check

I could not run the service-backed tests/files-security.test.ts locally because Postgres/Supabase was not listening on 127.0.0.1:54322; the non-service unit tests above pass.

@digzrow-coder

Follow-up: the fork CI rerun for the helper branch is green now: https://github.com/digzrow-coder/capgo/actions/runs/25904826715

@digzrow-coder

I rebased the manifest cleanup retryability fix onto current main, because this PR currently has merge conflicts and the same cleanup-order bug is still present there.

Clean branch/commit:

What changed:

  • Manifest R2 objects are deleted before their durable manifest rows are removed when the deleted version is the last reference.
  • If manifest object deletion fails, the row and manifest counters are kept so the trigger can retry instead of stranding storage state.
  • Bundle deletion failures no longer prevent manifest cleanup from being attempted, but the trigger still throws afterward so the deleted version remains retryable.
  • Same-object cleanup is serialized with a transaction-scoped advisory lock to preserve the shared-file race protection (a sketch of the lock pattern follows this list).
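
For reference, a minimal sketch of that advisory-lock pattern using node-postgres. Deriving the bigint lock key from a hash of the s3_path is an illustrative choice, not necessarily what the branch does:

```ts
import { createHash } from 'node:crypto'
import type { Pool } from 'pg'

// Derive a signed 64-bit lock key from the storage path (illustrative).
function advisoryKeyFor(s3Path: string): bigint {
  return createHash('sha256').update(s3Path).digest().readBigInt64BE(0)
}

// Run fn while holding a transaction-scoped advisory lock on the path.
// pg_advisory_xact_lock is released automatically at COMMIT/ROLLBACK, so
// two triggers cleaning the same object are serialized, not interleaved.
async function withPathLock(pool: Pool, s3Path: string, fn: () => Promise<void>): Promise<void> {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    await client.query('SELECT pg_advisory_xact_lock($1::bigint)', [advisoryKeyFor(s3Path).toString()])
    await fn() // row delete + object delete happen under the lock
    await client.query('COMMIT')
  }
  catch (err) {
    await client.query('ROLLBACK')
    throw err
  }
  finally {
    client.release()
  }
}
```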

Validation on the clean branch:

  • bunx vitest run tests/on-version-update-cleanup.unit.test.ts
  • bunx eslint --no-ignore supabase/functions/_backend/triggers/on_version_update.ts tests/on-version-update-cleanup.unit.test.ts
  • bun run lint:backend
  • bun run typecheck
  • git diff --check (only Git CRLF conversion warnings)

I tried opening this as a fresh PR, but GitHub shows: "An owner of this repository has limited the ability to open a pull request to users that are collaborators on this repository." This branch is ready to cherry-pick or use as the replacement for the conflicting cleanup slice.

@digzrow-coder

Follow-up: fork CI for the clean current-main branch is green now.

Run: https://github.com/digzrow-coder/capgo/actions/runs/25919249303

Completed successfully:

  • Run tests
  • Run Playwright tests
  • Check dead code
  • path/change detection jobs

This is the branch from my previous comment: https://github.com/digzrow-coder/capgo/tree/codex/manifest-cleanup-retry-current
