PoC: feat/agent-e2e-verify by CarmenDou · Pull Request #170 · InsForge/CLI

CarmenDou · 2026-06-18T19:43:26Z

Summary by cubic

Adds hidden verify probes and auto-setup for the @playwright/mcp browser server to power light‑mode agent E2E verification with analytics-backed findings. Also excludes Playwright artifacts from deployment uploads.

New Features
- Hidden verify CLI with rls, truth, and finding; each emits cli_verify_finding and sets pass/fail exit codes.
- Auto-configures the @playwright/mcp server (npx @playwright/mcp@latest --headless) for Claude Code, Cursor, and Codex.
- Deterministic probe helpers with unit tests, plus command tests for rls/truth.
Bug Fixes
- Exclude Playwright artifacts from deploy uploads: test-results, playwright-report, .playwright-mcp.
- Harden verify truth: block destructive DML/DDL (incl. CTE/MERGE and SELECT INTO), prevent statement chaining, treat non‑2xx as errors; enforce exclusive --expect vs --expect-count; validate --expect-count before executing to avoid DB calls on invalid input.
- Improve safety and accuracy: drop all free‑text from telemetry (no endpoint/message; also strip db_actual/ui_claimed), scope the anonymous RLS control to the owner filter, validate identifiers/emails, and improve Windows agent detection for MCP setup.

^{Written for commit 0bdb097. Summary will update on new commits.}

Note

Add hidden `verify` CLI commands for RLS isolation and backend truth probes

Adds a hidden verify command group with three subcommands: verify rls, verify truth, and verify finding, registered in src/index.ts.
verify rls probes row-level security by performing cross-user and anonymous PostgREST reads, classifying results as rls_leak, rls_overrestrict, or pass, and setting a non-zero exit code on failure.
verify truth runs a read-only SQL query and compares the result to a UI-claimed value or row count, recording a false_pass finding on mismatch.
verify finding records an arbitrary loud error (4xx/5xx, console error) as an analytics finding.
All findings are emitted via trackVerifyFinding in src/lib/analytics.ts, which sanitizes endpoints and messages and strips sensitive evidence fields before sending.
During installSkills, the Playwright browser MCP is now configured for supported agents (Claude Code, Cursor, Codex) and users see a restart notice.
Behavioral Change: installSkills now accepts an optional third parameter withBrowserMcp (default true) and may write to agent config files on disk during skill installation.

Changes since #170 opened

Enhanced telemetry tracking in trackVerifyFinding to include project and organization context [a309bd5]
Added validation for --expect-count CLI flag to enforce non-negative integer values [a309bd5]
Exported analytics sanitization utilities and added comprehensive unit tests [a309bd5]
Fixed anonymous control probe in verify.registerVerifyRlsCommand to use the same owner-scoped filter string when calling recordsCount instead of passing undefined [63b3147]
Removed endpoint and message properties from analytics.trackVerifyFinding event payload and deleted sanitizeEndpoint and sanitizeMessage functions [63b3147]
Added Vitest test suites for verify rls and verify truth commands with comprehensive test coverage [63b3147]
Updated error message in verify.registerVerifyTruthCommand to clarify that the guard blocks common destructive query forms but does not provide hard read-only guarantees [63b3147]
Expanded documentation comments for verify-probe.isReadOnlyQuery helper to clarify limitations of the read-only guard [63b3147]
Added pre-I/O validation to the registerVerifyTruthCommand action handler requiring either --expect or --expect-count flags and validating that --expect-count is a non-negative integer before executing database queries [0bdb097]
Updated the test for rejecting non-integer --expect-count values to assert that runRawSql is not invoked when validation fails [0bdb097]

^{Macroscope summarized 008fdd5.}

Summary by CodeRabbit

Release Notes

New Features
- Added experimental (hidden) “verify” probe commands for RLS isolation and backend truth checks, plus recording of verification findings with analytics.
- Added deterministic browser MCP configuration for supported assistants, enabled during skill installation by default.
Bug Fixes
- Improved deployment packaging to exclude additional test/report directories from uploads.
Tests
- Added coverage for browser MCP configuration and verify-probe/classification logic and CLI behaviors.
Chores
- Expanded deployment exclusion patterns to reduce unnecessary artifacts in uploaded bundles.

…e2e environments

…fe names, resilient teardown

… keys, assert hidden via help

…tes, tolerate 404 on teardown, drop name from analytics

…validation, type create opts

…sforge-verify at link time

…eview-poc # Conflicts: # src/commands/projects/link.ts

…decisions, gotchas

…aywright-report, .playwright-mcp) from deploy uploads

…view commands

…tallSkills call

…fix finding props

…uery

coderabbitai · 2026-06-18T19:43:35Z

Walkthrough

Adds an experimental verify CLI command group with three subcommands (rls, truth, finding) backed by new probe classification logic, fetch helpers, and PostHog analytics tracking. Separately adds browser MCP auto-configuration for Claude Code, Cursor, and Codex agents integrated into installSkills, with Playwright-generated directories excluded from deployments.

Changes

Verify Probe Commands

Layer / File(s)	Summary
Probe classification logic and input validators `src/lib/verify-probe.ts`	Defines `RlsFindingType` and `TruthFindingType` unions, implements `classifyRls` (leak/overrestrict/none based on row counts with evidence) and `classifyTruth` (normalized scalar comparison), and adds `isReadOnlyQuery`, `isSafeIdentifier`, and `isLikelyEmail` heuristics.
Fetch helpers and API calls `src/lib/verify-probe.ts`	Implements token/row extraction, `assertOk` for non-2xx errors, `login`, and `recordsCount` (treating 401/403 as zero).
VerifyFinding analytics interface and tracker `src/lib/analytics.ts`	Exports `VerifyFinding` interface and `trackVerifyFinding`, which filters sensitive `db_actual`/`ui_claimed` evidence keys before emitting a `cli_verify_finding` PostHog event.
Probe unit tests `src/lib/verify-probe.test.ts`	Vitest suite for `classifyRls` priority/verdict, `classifyTruth` normalization and null/undefined, `isReadOnlyQuery` CTE/chaining rejection, `isSafeIdentifier`, and `isLikelyEmail` SQL-injection rejection.
RLS probe command `src/commands/verify/rls.ts`	Registers `verify rls`: validates table/owner/user options, authenticates as user A and B, fetches user A's `auth.users.id` via raw SQL, runs three `recordsCount` probes (B→A, A→self, anon), classifies result, reports finding, and sets `process.exitCode`.
RLS command tests `src/commands/verify/rls.test.ts`	Vitest tests validating input rejection (injection, non-email), probe scoping assertion for anonymous record counts with owner filter, and command wiring via mocked config/API/analytics.
Truth probe command `src/commands/verify/truth.ts`	Registers `verify truth`: enforces read-only SQL, executes via `runRawSql`, classifies scalar or count expectations, tracks analytics, and sets `process.exitCode` based on match/mismatch.
Truth command tests `src/commands/verify/truth.test.ts`	Vitest tests validating SQL query rejection, flag combination rejection, non-integer count rejection, match/mismatch exit codes, and analytics tracking on success.
Finding probe command `src/commands/verify/finding.ts`	Registers `verify finding`: accepts kind/type/status/endpoint/message options, builds finding payload, tracks via analytics, flushes, and outputs JSON or formatted confirmation.
CLI wiring `src/commands/verify/index.ts`, `src/index.ts`	Exports `registerVerifyCommands`, attaches rls/truth/finding to a hidden experimental `verify` commander subcommand, and registers it on the main CLI `program`.

Browser MCP Auto-configuration

Layer / File(s)	Summary
browser-mcp helpers and per-agent logic `src/lib/browser-mcp.ts`	Implements `mergeJsonMcp` (idempotent JSON merge with malformed-JSON fallback), `ensureCodexToml` (TOML block append), `commandExists` (platform-aware PATH check), and per-agent `apply` targets for Claude Code (via `claude` CLI), Cursor (`~/.cursor/mcp.json`), and Codex (`~/.codex/config.toml`). `configureBrowserMcp` runs all agents best-effort and returns configured labels.
browser-mcp tests `src/lib/browser-mcp.test.ts`	Tests `mergeJsonMcp` for file creation, merge, idempotency, malformed JSON recovery, `servers` key, and non-object normalization; tests `ensureCodexToml` for block appending, idempotency, and content preservation.
installSkills integration and deploy exclusions `src/lib/skills.ts`, `src/commands/deployments/deploy.ts`	Adds optional `withBrowserMcp` parameter to `installSkills` that conditionally invokes `configureBrowserMcp` with log feedback; extends `EXCLUDE_PATTERNS` to exclude `test-results`, `playwright-report`, and `.playwright-mcp` from deployments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

InsForge/CLI#119: Both PRs modify src/lib/skills.ts by changing the installSkills function signature/parameters (this PR adds withBrowserMcp, PR#119 adds authProvider) and thus affect the same skill-installation code path.

Suggested reviewers

tonychang04
jwfing

Poem

🐇 Hopping through the code at dawn,
New verify commands are drawn!
RLS probes sniff for leaks,
Truth checks what the database speaks.
Browser MCP auto-wires with flair —
This bunny finds no bugs in there! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title 'PoC: feat/agent-e2e-verify' is vague and uses non-descriptive technical jargon; it does not clearly convey what the changeset accomplishes to someone scanning commit history.	Use a clear, descriptive title like 'Add verify CLI command group for agent E2E testing with RLS and truth probes' or 'Introduce verify subcommands for backend verification and MCP browser tool setup'.
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch worktree-agent-e2e-preview-poc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cubic-dev-ai

4 issues found across 12 files

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

greptile-apps · 2026-06-18T19:53:58Z

Greptile Summary

This PR introduces a set of hidden experimental verify CLI commands (rls, truth, finding) for agent-driven E2E backend probing, auto-configures the Playwright MCP during skill installation, and excludes Playwright artifacts from deployment uploads. It also addresses several security concerns raised in earlier review rounds (identifier sanitisation, anon-scoped filter, PII evidence stripping, Windows command -v fix, pre-I/O flag validation).

verify rls performs a three-probe RLS isolation check (B reads A's rows, A reads own rows, anonymous reads A's rows), all scoped to the same owner filter; identifiers are validated with isSafeIdentifier and emails with isLikelyEmail; row counts (not row data) reach PostHog.
verify truth runs an admin-key SQL read-only check; the isReadOnlyQuery guard blocks common DML/DDL statement forms (documented caveat: function-call mutations like SELECT setval(...) still pass the keyword scan); all flag validation runs before runRawSql.
verify finding records agent-observed loud errors (4xx/5xx, console errors) to PostHog; endpoint and message are correctly dropped from the telemetry payload, but --kind is a required free-text field with no format enforcement that does reach PostHog.

Confidence Score: 4/5

Safe to merge as an experimental hidden feature; the most sensitive paths (PII in telemetry, identifier injection, pre-I/O flag validation) have been addressed, with one residual gap in the finding recorder's free-text kind field.

The verify commands are hidden and experimental, and this round has closed several earlier gaps. The remaining open items are the documented isReadOnlyQuery limitation (function-call mutations bypass the keyword guard, acknowledged in code comments) and the unvalidated --kind field in verify finding that reaches PostHog as free text. Neither blocks the PoC intent, but the telemetry gap is inconsistent with the stated no-free-text-to-PostHog contract established elsewhere in the same function.

src/commands/verify/finding.ts (--kind free-text reaches PostHog); src/commands/verify/rls.ts (email still string-interpolated into admin-key SQL, mitigated by isLikelyEmail but not fully parameterised)

Important Files Changed

Filename	Overview
src/commands/verify/rls.ts	New RLS isolation probe; validates identifiers and emails before use, but email is still string-interpolated into a raw admin-key SQL query rather than passed as a parameterized value.
src/commands/verify/truth.ts	New backend-truth cross-check; all input validation (isReadOnlyQuery, mutual-exclusion, non-negative integer) now runs before runRawSql; acknowledged limitation that function-call mutations bypass the keyword guard.
src/commands/verify/finding.ts	New finding recorder; endpoint/message are correctly excluded from PostHog payload; kind is a free-text required flag that reaches PostHog without format validation.
src/lib/analytics.ts	Adds trackVerifyFinding; db_actual/ui_claimed are correctly stripped from PostHog; endpoint/message are excluded at the captureEvent call site; kind and table are transmitted as structured identifiers.
src/lib/verify-probe.ts	Pure classification helpers are well-tested; isReadOnlyQuery explicitly documents that SELECT setval(...)-style function mutations bypass the keyword guard; fetch wiring correctly treats 401/403 as 0 rows and throws on other non-2xx.
src/lib/browser-mcp.ts	Configures Playwright MCP for Claude Code, Cursor, and Codex; Windows now uses where instead of command -v; all shell arguments are hardcoded constants so no injection risk; best-effort per-agent error swallowing is appropriate.
src/lib/skills.ts	Adds optional withBrowserMcp parameter (default true) and calls configureBrowserMcp after the main skills installation block; errors are caught and surfaced as warnings in non-JSON mode.
src/commands/verify/index.ts	Thin orchestration file that registers verify subcommands under a hidden parent command; straightforward and correct.
src/index.ts	Registers the new hidden verify command group alongside existing commands; no issues.
src/commands/deployments/deploy.ts	Adds test-results, playwright-report, and .playwright-mcp to the deployment exclusion list; straightforward and correct.
src/lib/verify-probe.test.ts	Comprehensive unit tests for all pure helpers; covers rls_leak, rls_overrestrict, false_pass, normalisation, isReadOnlyQuery edge cases, and identifier/email validation.
src/lib/browser-mcp.test.ts	Tests mergeJsonMcp and ensureCodexToml with idempotency, malformed JSON, non-object section values, and existing content preservation.
src/commands/verify/rls.test.ts	Tests identifier injection rejection, non-email rejection, and verifies the anonymous control probe uses an owner-scoped filter.
src/commands/verify/truth.test.ts	Tests non-read query rejection, mutual-exclusion of flags, non-integer expect-count rejection (all before DB), and pass/fail exit codes.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent
    participant CLI as insforge CLI
    participant VProbe as verify-probe.ts
    participant OSS as PostgREST/Auth API
    participant PostHog

    Note over Agent,PostHog: verify rls
    Agent->>CLI: verify rls --table orders --owner user_id
    CLI->>CLI: validate identifiers and emails
    CLI->>OSS: login(userA) and login(userB)
    OSS-->>CLI: aToken, bToken
    CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
    OSS-->>CLI: aId UUID
    CLI->>VProbe: recordsCount with bToken, aToken, undefined
    VProbe->>OSS: GET records with owner filter x3
    OSS-->>VProbe: row counts
    CLI->>VProbe: classifyRls
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding row counts only
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify truth
    Agent->>CLI: verify truth --query SELECT n --expect 3
    CLI->>CLI: isReadOnlyQuery and flag validation
    CLI->>OSS: runRawSql with admin key
    OSS-->>CLI: rows
    CLI->>VProbe: classifyTruth
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify finding
    Agent->>CLI: verify finding --kind http_500 --status 500
    CLI->>PostHog: trackVerifyFinding kind and status only
    CLI-->>Agent: exit 0

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent
    participant CLI as insforge CLI
    participant VProbe as verify-probe.ts
    participant OSS as PostgREST/Auth API
    participant PostHog

    Note over Agent,PostHog: verify rls
    Agent->>CLI: verify rls --table orders --owner user_id
    CLI->>CLI: validate identifiers and emails
    CLI->>OSS: login(userA) and login(userB)
    OSS-->>CLI: aToken, bToken
    CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
    OSS-->>CLI: aId UUID
    CLI->>VProbe: recordsCount with bToken, aToken, undefined
    VProbe->>OSS: GET records with owner filter x3
    OSS-->>VProbe: row counts
    CLI->>VProbe: classifyRls
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding row counts only
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify truth
    Agent->>CLI: verify truth --query SELECT n --expect 3
    CLI->>CLI: isReadOnlyQuery and flag validation
    CLI->>OSS: runRawSql with admin key
    OSS-->>CLI: rows
    CLI->>VProbe: classifyTruth
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify finding
    Agent->>CLI: verify finding --kind http_500 --status 500
    CLI->>PostHog: trackVerifyFinding kind and status only
    CLI-->>Agent: exit 0

_{Reviews (6): Last reviewed commit: "fix(verify): validate --expect-count bef..." | Re-trigger Greptile}

…expect flags, Windows command lookup, drop PII from telemetry

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (1)

src/lib/verify-probe.test.ts (1)

52-78: ⚡ Quick win

Add a regression for SELECT ... INTO mutation.

The SQL safety tests cover DML-in-CTE and statement chaining, but not PostgreSQL’s table-creating SELECT ... INTO form. Add coverage alongside the read-only rejection cases so the admin-key guard cannot regress.

🧪 Suggested test additions

   it('rejects DML hidden inside a CTE (WITH … DELETE/UPDATE/INSERT/MERGE … SELECT)', () => {
     expect(isReadOnlyQuery('with x as (delete from users returning id) select id from x')).toBe(false);
     expect(isReadOnlyQuery('WITH x AS (UPDATE t SET c = 1 RETURNING id) SELECT * FROM x')).toBe(false);
     expect(isReadOnlyQuery('with x as (insert into t values (1) returning id) select id from x')).toBe(false);
     expect(isReadOnlyQuery('with x as (merge into t using s on t.id = s.id returning t.id) select id from x')).toBe(false);
   });
+
+  it('rejects SELECT INTO because it creates a table', () => {
+    expect(isReadOnlyQuery('select * into temp verify_probe_copy from users')).toBe(false);
+    expect(isReadOnlyQuery('with x as (select 1) select * into temp verify_probe_copy from x')).toBe(false);
+  });
 });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/verify-probe.test.ts` around lines 52 - 78, The isReadOnlyQuery test
suite does not have coverage for PostgreSQL's SELECT ... INTO statement form
which creates or modifies tables and should be rejected. Add a new test case
within the describe('isReadOnlyQuery') block, after the existing rejection
tests, that uses it() to test that isReadOnlyQuery rejects various forms of
SELECT ... INTO statements such as SELECT ... INTO new_table, SELECT ... INTO
TEMP table, and similar variations. Ensure each expectation calls
isReadOnlyQuery with these statements and expects false to be returned.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/verify/finding.ts`:
- Around line 30-35: The endpoint and message values passed to
trackVerifyFinding are raw user input that may contain sensitive data like
emails, tokens, or URL query parameters, creating a PII/secrets leakage risk in
analytics. Create two helper functions in this file (sanitizeEndpoint and
sanitizeMessage) that remove query strings from endpoints and redact email
addresses and bearer tokens from messages using regex patterns, then apply these
functions to opts.endpoint and opts.message before passing them to the
trackVerifyFinding call.

In `@src/commands/verify/rls.ts`:
- Around line 47-67: The helper functions login, getAnonKey, rawsqlRows, and
recordsCount make live API calls without timeout or abort handling, which can
cause indefinite blocking if network calls hang and stall CI. For each of these
functions, implement AbortController with a reasonable timeout (e.g., 30
seconds), pass the abort signal to the fetch or HTTP calls, and add proper error
handling to catch and handle AbortError and timeout scenarios by throwing or
returning meaningful errors instead of hanging indefinitely.

In `@src/commands/verify/truth.ts`:
- Around line 30-42: The rawsqlRows() database call is executed before
validating that the required expectation flags are provided, causing unnecessary
database calls for invalid input. Move the validation check that throws CLIError
(checking that either opts.expectCount or opts.expect is defined) to execute
before the rawsqlRows() call. This ensures the function fails fast with
validation errors before attempting to query the database, avoiding wasted
resources on invalid requests.

In `@src/lib/analytics.ts`:
- Around line 149-167: The trackVerifyFinding function currently sends
potentially sensitive data in the endpoint and message fields, which can contain
emails, tokens, or query parameters. Instead of the current approach of
filtering safeEvidence and then sending all other fields, implement a whitelist
approach where you only include known-safe fields in the captureEvent call. Keep
only the safe structured fields like finding_type, passed, table, kind, and
status, and for the evidence data, explicitly whitelist only the safe evidence
keys (such as row count metrics) rather than filtering out unsafe ones. Remove
or transform the raw endpoint and message fields before sending them to avoid
leaking PII.

In `@src/lib/browser-mcp.ts`:
- Around line 90-92: The execAsync call in the MCP presence probe (checking for
MCP_SERVER_NAME using the "claude mcp get" command) lacks a timeout, which could
cause the install/link flow to hang indefinitely if the command becomes
unresponsive. Add a timeout option to the execAsync call to ensure it fails
after a reasonable duration (such as a few seconds) rather than waiting
indefinitely. This will allow the promise chain to proceed to the catch block if
the command times out.

In `@src/lib/skills.ts`:
- Around line 175-177: The log message in the clack.log.info call is too
assertive and inaccurate. The condition configured.length === 0 can result from
either no supported agent being found OR agent setup failure (with errors being
swallowed). Replace the message "No supported agent found to auto-configure the
browser MCP" with a non-assertive message that acknowledges both possibilities,
such as something that indicates the browser MCP could not be auto-configured
rather than definitively stating no agent was found. Keep the helpful Claude
configuration instruction at the end of the message.

In `@src/lib/verify-probe.ts`:
- Around line 99-145: Add timeout protection to prevent indefinite hangs in the
four fetch functions: login, getAnonKey, rawsqlRows, and recordsCount. For each
function, create an AbortController and set a timeout that aborts the fetch
operation if it exceeds a reasonable duration (e.g., 30 seconds). Pass the abort
signal to the fetch request via the signal option in the fetch options object,
and wrap each fetch call in a try-catch block to handle the AbortError that
occurs when the timeout fires, converting it to a meaningful error message that
includes the operation name and timeout details.
- Around line 61-67: The isReadOnlyQuery function allows dangerous SELECT ...
INTO statements that create or populate tables in PostgreSQL despite passing the
read-only check. Add a regex pattern to the guard that detects and rejects
queries containing the INTO keyword used in conjunction with SELECT statements,
similar to how the function currently rejects INSERT, UPDATE, DELETE, and other
write operations. This check should be added alongside the existing keyword
rejection pattern on line 66 to prevent database mutation through CREATE TABLE
AS SELECT or SELECT ... INTO constructs.

---

Nitpick comments:
In `@src/lib/verify-probe.test.ts`:
- Around line 52-78: The isReadOnlyQuery test suite does not have coverage for
PostgreSQL's SELECT ... INTO statement form which creates or modifies tables and
should be rejected. Add a new test case within the describe('isReadOnlyQuery')
block, after the existing rejection tests, that uses it() to test that
isReadOnlyQuery rejects various forms of SELECT ... INTO statements such as
SELECT ... INTO new_table, SELECT ... INTO TEMP table, and similar variations.
Ensure each expectation calls isReadOnlyQuery with these statements and expects
false to be returned.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1679be85-08fd-4ee4-a79c-3c4c21824d71

📥 Commits

Reviewing files that changed from the base of the PR and between 35d8c13 and ee76be5.

📒 Files selected for processing (12)

src/commands/deployments/deploy.ts
src/commands/verify/finding.ts
src/commands/verify/index.ts
src/commands/verify/rls.ts
src/commands/verify/truth.ts
src/index.ts
src/lib/analytics.ts
src/lib/browser-mcp.test.ts
src/lib/browser-mcp.ts
src/lib/skills.ts
src/lib/verify-probe.test.ts
src/lib/verify-probe.ts

coderabbitai · 2026-06-18T20:19:27Z

+          endpoint: opts.endpoint as string | undefined,
+          message: opts.message as string | undefined,
+          table: opts.table as string | undefined,
+        };
+        trackVerifyFinding(finding, config);
+        await shutdownAnalytics(); // flush the PostHog event before exit


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize --message and --endpoint before tracking analytics.

Line 30 through Line 35 forwards raw free-form text to trackVerifyFinding(...). These values can include emails, tokens, or URL query params from runtime errors, which risks telemetry PII/secrets leakage. Redact sensitive patterns and strip query strings before emitting.

Suggested direction

- endpoint: opts.endpoint as string | undefined, - message: opts.message as string | undefined, + endpoint: sanitizeEndpoint(opts.endpoint as string | undefined), + message: sanitizeMessage(opts.message as string | undefined),

// Add small helpers in this file: function sanitizeEndpoint(v?: string): string | undefined { if (!v) return undefined; return v.split('?')[0]; // drop query params } function sanitizeMessage(v?: string): string | undefined { if (!v) return undefined; return v .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, '[redacted-email]') .replace(/\b(?:Bearer\s+)?[A-Za-z0-9._-]{20,}\b/g, '[redacted-token]') .slice(0, 500); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/verify/finding.ts` around lines 30 - 35, The endpoint and message values passed to trackVerifyFinding are raw user input that may contain sensitive data like emails, tokens, or URL query parameters, creating a PII/secrets leakage risk in analytics. Create two helper functions in this file (sanitizeEndpoint and sanitizeMessage) that remove query strings from endpoints and redact email addresses and bearer tokens from messages using regex patterns, then apply these functions to opts.endpoint and opts.message before passing them to the trackVerifyFinding call.

coderabbitai · 2026-06-18T20:19:27Z

+      const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)
+        .then(() => true)
+        .catch(() => false);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout to the Claude MCP presence probe.

Line 90 runs claude mcp get without a timeout. If that command hangs, the install/link flow can hang indefinitely.

Suggested fix

- const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`) + const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, { + timeout: MCP_CONFIG_TIMEOUT_MS, + }) .then(() => true) .catch(() => false);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)

.then(() => true)

.catch(() => false);

const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, {

timeout: MCP_CONFIG_TIMEOUT_MS,

})

.then(() => true)

.catch(() => false);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/browser-mcp.ts` around lines 90 - 92, The execAsync call in the MCP presence probe (checking for MCP_SERVER_NAME using the "claude mcp get" command) lacks a timeout, which could cause the install/link flow to hang indefinitely if the command becomes unresponsive. Add a timeout option to the execAsync call to ensure it fails after a reasonable duration (such as a few seconds) rather than waiting indefinitely. This will allow the promise chain to proceed to the catch block if the command times out.

coderabbitai · 2026-06-18T20:19:27Z

+          clack.log.info(
+            'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.',
+          );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a non-assertive message when configured is empty.

Line 175 currently states “No supported agent found,” but configured.length === 0 can also mean agent setup failed (errors are swallowed in src/lib/browser-mcp.ts Line 135).

Suggested fix

- clack.log.info( - 'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.', - ); + clack.log.info( + 'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.', + );

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

clack.log.info(

'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.',

);

clack.log.info(

'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',

);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/skills.ts` around lines 175 - 177, The log message in the clack.log.info call is too assertive and inaccurate. The condition configured.length === 0 can result from either no supported agent being found OR agent setup failure (with errors being swallowed). Replace the message "No supported agent found to auto-configure the browser MCP" with a non-assertive message that acknowledges both possibilities, such as something that indicates the browser MCP could not be auto-configured rather than definitively stating no agent was found. Keep the helpful Claude configuration instruction at the end of the message.

…, share oss runRawSql/getAnonKey

jwfing

Code review — `feat/agent-e2e-verify` (PoC)

Summary: A well-structured, defensively-coded PoC that adds hidden verify rls/truth/finding probes (pure, unit-tested verdict logic + thin fetch wiring), PII-stripping analytics, browser-MCP auto-setup, and Playwright deploy-exclude patterns — no blocking issues, a few convention/scope items worth addressing before this graduates from PoC.

Requirements context

No spec/plan under docs/specs/ matches this PR — the documents there cover diagnose and db-migrations, not agent-E2E/verify. Assessed against the PR description (cubic/Macroscope summaries) and the repo conventions in DEVELOPMENT.md. No invented requirements.

Critical

(none) — no correctness, security, data-loss, or behavior-breaking issues found.

Suggestion

Software engineering / telemetry — event name breaks the cli_<feature>_<action> convention. src/lib/analytics.ts:171 emits 'verify_finding', but every other helper in this file uses the cli_*_invoked shape (cli_diagnose_invoked, cli_config_invoked, …) per DEVELOPMENT.md §2 ("Use the cli_<feature>_<action> convention. Keep event names stable."). Recommend cli_verify_finding (or similar) for dashboard consistency. It's a new event so nothing breaks today, but the name should land right the first time.

Functionality / scope — installSkills(withBrowserMcp = true) is "opt-in" in the comment only. src/lib/skills.ts:88 defaults the new param to true and none of the six call sites (create.ts:443, link.ts:127/246/308/492/534) ever pass false, so the comment at skills.ts:156 ("Opt-in") doesn't match reality — every insforge link/create now mutates global, outside-the-project agent config (claude mcp add -s user, ~/.cursor/mcp.json, ~/.codex/config.toml). This also runs under --json (automation/CI) where output is suppressed but the writes still happen. The merges are careful (idempotent, malformed-JSON-safe, preserve other servers — nicely tested), so this is UX/scope rather than data-loss, but I'd wire a real opt-out (CLI flag or prompt) or make it genuinely opt-in before merge.

Security / testing — the redaction path has no direct tests. sanitizeEndpoint/sanitizeMessage (src/lib/analytics.ts:152-165) are the guard that keeps emails/tokens/PII out of PostHog, yet they have no unit tests, while the pure verdict helpers (classifyRls/classifyTruth/isReadOnlyQuery/isSafeIdentifier/isLikelyEmail) are thoroughly covered. The redaction regexes are exactly the kind of code a test should pin (e.g. token boundary, multiple emails, the 500-char truncation). Add cases for these.

Security — isReadOnlyQuery is keyword-based and won't catch side-effecting function calls. src/lib/verify-probe.ts (isReadOnlyQuery) blocks DML/DDL/chaining/INTO/CTE-DML well, but a query like SELECT some_mutating_fn() passes the guard and runs through runRawSql with the admin key (oss.ts:22, "superadmin access"). Acceptable for a hidden experimental probe, but worth a code comment acknowledging the gap (or routing reads through a lower-privilege path) before this is exposed more widely.

Information

Convention — direct fetch in verify-probe.ts (login, recordsCount) deviates from DEVELOPMENT.md §1 ("Do not call fetch directly … add a typed wrapper to oss.ts"). Here it's justified — RLS probing needs per-user/anon tokens, and ossFetch hard-codes the admin api_key Bearer (oss.ts:113-125), so it genuinely can't be reused. Consider a small typed wrapper or a note so the deviation is intentional-on-the-record.
Telemetry shape — trackVerifyFinding (analytics.ts:171) omits the standard project_name/org_id/region/oss_mode properties the other track* helpers all include; adding them keeps verify_finding groupable like the rest.
Convention — flush in finally. verify/{rls,truth,finding}.ts call await shutdownAnalytics() inline after the capture rather than in a finally block (DEVELOPMENT.md §2). Functionally fine (it immediately follows the only capture), just not the documented pattern.
SQL interpolation in rls.ts — the auth.users email lookup (rls.ts, select id from auth.users where email='...') interpolates after isLikelyEmail validation + single-quote escaping; safe given runRawSql has no parameter binding. Defense-in-depth is present; noted only because raw interpolation always deserves a second look.
agent-skills sync (DEVELOPMENT.md §3) — the new hidden verify commands and the browser-MCP behavior should be reflected in the insforge-verify skill in InsForge/agent-skills; the PR doesn't mention a companion update.
--expect-count is compared as a trimmed string (truth.ts → classifyTruth(rows.length, String(opts.expectCount))), so --expect-count 03 mismatches 3 rows. Minor.

Strengths worth calling out

The empty-token guard (rls.ts) and assertOk (verify-probe.ts) both close genuinely subtle false-pass traps — an empty token or a non-2xx being silently read as "0 rows / isolation holds." That's the right instinct for a verification tool.
Strong, well-tested input validation (isSafeIdentifier/isLikelyEmail/isReadOnlyQuery) and config-merge robustness (malformed/array/primitive JSON, idempotency).
Deploy exclude patterns (deploy.ts:53) are correct and scoped.

Verdict

approved (informational — no Critical findings; posted as a COMMENT, a human still gives the GitHub green-check via the approve flow). The Suggestions, especially the default-on global-config mutation and the untested redaction path, are worth resolving before this PoC graduates to a shipped feature.

…est redaction, numeric expect-count

jwfing

Summary
This PR adds useful verify/MCP plumbing, but the current implementation has one security blocker and one core correctness blocker in the new verify commands.

Requirements context
I could not find repo docs that define verify beyond this PR, so I assessed intent primarily from the PR description and the inline command descriptions in src/commands/verify/*.ts. For project conventions, I also used DEVELOPMENT.md, especially the command/error/output and analytics guidance. I was not able to execute the Vitest suite in this workspace because vitest is not installed (sh: vitest: not found).

Findings

Critical

src/lib/verify-probe.ts:55-69, src/lib/api/oss.ts:13-29: verify truth is presented as a single read-only query, but the guard is only a regex over the SQL text before passing it to runRawSql, which executes with the project API key's superadmin privileges. That still allows side-effecting statements hidden behind SELECT/WITH via mutating SQL functions, so the command does not actually provide the safety guarantee it advertises.
src/commands/verify/rls.ts:18-20, src/commands/verify/rls.ts:60-65: the anonymous control is checking the entire table (recordsCount(..., undefined, undefined, anon)) instead of checking the same owner-scoped filter used for user A. Any table that intentionally exposes some public rows will be reported as rls_leak even when B and anonymous users cannot read A's rows, which breaks the command's stated behavior of verifying "B cannot read A, A can read own".

Suggestion

DEVELOPMENT.md:48-53, src/lib/analytics.ts:154-180, src/commands/verify/finding.ts:18-34: this change starts sending sanitized-but-still-free-form message and endpoint text to PostHog. That conflicts with the repo's telemetry guidance to never send user-entered free text, and the current redaction only covers query strings, emails, and long token-like substrings. I would strongly prefer reducing this event to structured fields (finding_type, kind, status, table, booleans/counts) and dropping the free-text properties entirely.
src/commands/verify/finding.ts:11-47, src/commands/verify/rls.ts:15-84, src/commands/verify/truth.ts:9-68, src/lib/skills.ts:160-188: the added tests cover the pure helpers well, but there is still no command-level coverage for the new hidden CLI surface or for the installSkills browser-MCP side effects. That leaves argument parsing, exit-code behavior, and telemetry/flush behavior unverified.

Information

(none)

Verdict
request_changes — the PR needs fixes for the read-only SQL guarantee and the anonymous RLS probe scope before it is safe to merge.

…nest read-only guard wording, command tests

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/verify/rls.test.ts`:
- Around line 51-56: The test assertions at lines 51-56 and 59-64 are too loose
- they only verify that an exit occurs with `/exit:/` but don't confirm the
rejection is specifically due to the `--owner` or `--user-a` validation guards.
Replace the generic `/exit:/` pattern matching in the toThrow() calls with more
specific validation error messages that confirm the actual validation failure
for each guard (the `--owner` parameter validation in the first test and
`--user-a` parameter validation in the second test) to ensure these tests are
hitting the intended rejection paths and not passing on unrelated early-exit
scenarios.
- Around line 70-75: The test currently validates only the 3rd recordsCount call
with a partial filter match using stringContaining, but the comment indicates
all three probes must use the same owner-scoped filter. Capture the filter
argument from the 1st call to recordsCount, then add assertions to verify that
both the 2nd and 3rd calls use the exact same filter string with strict equality
(not partial matching). This ensures consistency across all three probes as
intended.

In `@src/commands/verify/truth.test.ts`:
- Around line 75-80: The test case `flags false_pass (exit 1) when DB differs
from the claim` currently only validates the exit code but should also verify
analytics behavior on the failure path. Add expect assertions after the
parseAsync call to verify that trackVerifyFinding was called with the
appropriate finding details and that shutdownAnalytics was called to ensure
findings are properly emitted and analytics are properly flushed when
verification fails.
- Around line 59-63: The test for rejecting a non-integer --expect-count only
asserts that an error is thrown but does not verify that the database call to
runRawSql is prevented. Add a spy or mock for the runRawSql function before
calling makeProgram().parseAsync() and include an assertion that verifies
runRawSql was never called. This ensures the command validates the
--expect-count argument before attempting any database operations, preventing
unnecessary live-call leakage when invalid input is provided.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 438e9671-8b94-49dc-99c7-2e7080d1d9eb

📥 Commits

Reviewing files that changed from the base of the PR and between ee76be5 and 63b3147.

📒 Files selected for processing (7)

src/commands/verify/rls.test.ts
src/commands/verify/rls.ts
src/commands/verify/truth.test.ts
src/commands/verify/truth.ts
src/lib/analytics.ts
src/lib/verify-probe.test.ts
src/lib/verify-probe.ts

🚧 Files skipped from review as they are similar to previous changes (3)

src/commands/verify/rls.ts
src/commands/verify/truth.ts
src/lib/verify-probe.test.ts

coderabbitai · 2026-06-19T06:10:42Z

+  it('rejects an --owner that smuggles PostgREST params, before any login', async () => {
+    const { login } = await import('../../lib/verify-probe.js');
+    await expect(
+      makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }),
+    ).rejects.toThrow(/exit:/);
+    expect(login).not.toHaveBeenCalled();


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten assertions to verify the intended rejection cause.

On Lines 53-56 and Lines 61-64, asserting only /exit:/ can pass on unrelated early-exit paths. Add an assertion on the emitted validation message so these tests prove they’re hitting the --owner and --user-a guards specifically.

Suggested test hardening

describe('verify rls (command)', () => { let exitSpy: ReturnType<typeof vi.spyOn>; + let errorSpy: ReturnType<typeof vi.spyOn>; beforeEach(() => { vi.clearAllMocks(); process.exitCode = undefined; exitSpy = vi.spyOn(process, 'exit').mockImplementation(((code?: number) => { throw new Error(`exit:${code}`); }) as never); - vi.spyOn(console, 'error').mockImplementation(() => {}); + errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {}); vi.spyOn(console, 'log').mockImplementation(() => {}); }); @@ it('rejects an --owner that smuggles PostgREST params, before any login', async () => { const { login } = await import('../../lib/verify-probe.js'); await expect( makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }), ).rejects.toThrow(/exit:/); + expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--owner must be a bare column name/); expect(login).not.toHaveBeenCalled(); }); @@ it('rejects a non-email --user-a, before any login', async () => { const { login } = await import('../../lib/verify-probe.js'); await expect( makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--user-a', 'not-an-email', '--json'], { from: 'user' }), ).rejects.toThrow(/exit:/); + expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--user-a and --user-b must be valid email addresses/); expect(login).not.toHaveBeenCalled(); });

Also applies to: 59-64

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/verify/rls.test.ts` around lines 51 - 56, The test assertions at lines 51-56 and 59-64 are too loose - they only verify that an exit occurs with `/exit:/` but don't confirm the rejection is specifically due to the `--owner` or `--user-a` validation guards. Replace the generic `/exit:/` pattern matching in the toThrow() calls with more specific validation error messages that confirm the actual validation failure for each guard (the `--owner` parameter validation in the first test and `--user-a` parameter validation in the second test) to ensure these tests are hitting the intended rejection paths and not passing on unrelated early-exit scenarios.

coderabbitai · 2026-06-19T06:10:42Z

+    // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
+    expect(recordsCount).toHaveBeenCalledTimes(3);
+    // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
+    expect(recordsCount).toHaveBeenNthCalledWith(
+      3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
+    );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate filter equality across all three recordsCount probes.

Line 70 says all probes must use the same owner-scoped filter, but the test currently checks only the 3rd call and only with stringContaining. Compare the 3rd argument from all three calls for exact equality.

Suggested assertion improvement

it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => { const { recordsCount } = await import('../../lib/verify-probe.js'); await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' }); // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter. expect(recordsCount).toHaveBeenCalledTimes(3); + const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls; + const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]]; + expect(typeof filter1).toBe('string'); + expect(filter1).toMatch(/^user_id=eq\./); + expect(filter2).toBe(filter1); + expect(filter3).toBe(filter1); // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter. expect(recordsCount).toHaveBeenNthCalledWith( 3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon', ); });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.

expect(recordsCount).toHaveBeenCalledTimes(3);

// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.

expect(recordsCount).toHaveBeenNthCalledWith(

3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',

);

it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => {

const { recordsCount } = await import('../../lib/verify-probe.js');

await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' });

// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.

expect(recordsCount).toHaveBeenCalledTimes(3);

const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls;

const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]];

expect(typeof filter1).toBe('string');

expect(filter1).toMatch(/^user_id=eq\./);

expect(filter2).toBe(filter1);

expect(filter3).toBe(filter1);

// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.

expect(recordsCount).toHaveBeenNthCalledWith(

3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',

);

});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/verify/rls.test.ts` around lines 70 - 75, The test currently validates only the 3rd recordsCount call with a partial filter match using stringContaining, but the comment indicates all three probes must use the same owner-scoped filter. Capture the filter argument from the 1st call to recordsCount, then add assertions to verify that both the 2nd and 3rd calls use the exact same filter string with strict equality (not partial matching). This ensures consistency across all three probes as intended.

coderabbitai · 2026-06-19T06:10:42Z

+  it('flags false_pass (exit 1) when DB differs from the claim', async () => {
+    const oss = await import('../../lib/api/oss.js');
+    (oss.runRawSql as ReturnType<typeof vi.fn>).mockResolvedValue({ rows: [{ n: 1 }] });
+    await makeProgram().parseAsync(['verify', 'truth', '--query', 'select n', '--expect', '3', '--json'], { from: 'user' });
+    expect(process.exitCode).toBe(1);
+  });


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify analytics behavior on the false_pass path, not just exit code.

Line 75 covers process.exitCode, but the PR contract also says findings are emitted and flushed on failures. Add trackVerifyFinding and shutdownAnalytics assertions here to lock that behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/verify/truth.test.ts` around lines 75 - 80, The test case `flags false_pass (exit 1) when DB differs from the claim` currently only validates the exit code but should also verify analytics behavior on the failure path. Add expect assertions after the parseAsync call to verify that trackVerifyFinding was called with the appropriate finding details and that shutdownAnalytics was called to ensure findings are properly emitted and analytics are properly flushed when verification fails.

cubic-dev-ai

1 issue found across 7 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

… no DB call on invalid input

jwfing

Summary
This PR mostly implements the stated hidden verification probes and browser-MCP setup as described, with good input validation and telemetry scrubbing, and I found no blocking issues.

Requirements context
I derived intent primarily from the PR description because I did not find repo documentation for the new hidden verify surface itself; for repo conventions and telemetry expectations, I used DEVELOPMENT.md, especially the command-layout and PostHog guidance.

Findings
Critical
(none)

Suggestion

src/lib/browser-mcp.ts:104-116 — The Cursor/Codex auto-setup path is gated on ~/.cursor / ~/.codex already existing, even though mergeJsonMcp() and ensureCodexToml() already create parent directories themselves. In practice this means a fresh agent install can hit the “No supported agent found” path and never get the promised auto-configuration until the user has manually launched that agent once.
src/lib/browser-mcp.ts:17-18, src/lib/skills.ts:175-184 — The generated MCP config hardcodes npx @playwright/mcp@latest --headless. Using @latest makes every future agent session depend on whatever was most recently published upstream, which is both a supply-chain risk and a reproducibility risk for the verification flow this PR is trying to stabilize.
src/lib/browser-mcp.test.ts:1-109, src/lib/skills.ts:160-188 — The new tests cover the JSON/TOML helper functions, but not the actual configureBrowserMcp() / installSkills() integration path. That leaves the agent-detection logic and user-facing fallback behavior unverified, which is exactly where the .cursor / .codex existence-gate bug above lives.

Information

src/commands/verify/rls.ts:31-66, src/commands/verify/truth.ts:20-58, src/lib/analytics.ts:143-172 — The new verify commands do a solid job on the main security-sensitive areas: identifier/email validation before SQL or PostgREST usage, non-2xx handling for probe requests, and dropping free-text / sensitive evidence fields before PostHog capture.
src/commands/deployments/deploy.ts:27-52 — The deployment exclusion additions are low-risk and should reduce unnecessary upload size; I did not see a separate performance regression in the changed paths.

Verdict
approved — no Critical findings. I could not run the new test suites in this checkout because the workspace does not have dev dependencies installed (vitest: not found), so the review is based on static inspection.

CarmenDou added 13 commits June 12, 2026 15:23

feat(preview): experimental preview commands for isolated full-stack …

5108b22

…e2e environments

fix(preview): address review — literal env values, early manifest, sa…

16f06f1

…fe names, resilient teardown

fix(preview): cleanup branch on poll failure, overwrite all duplicate…

f64f000

… keys, assert hidden via help

fix(preview): validate name before provisioning, guard duplicate crea…

aa43a5b

…tes, tolerate 404 on teardown, drop name from analytics

fix(preview): gate teardown output behind --json, wrap teardown name …

3e1323f

…validation, type create opts

feat(link): --with-test-agents installs Playwright Test Agents for in…

21c8689

…sforge-verify at link time

Merge remote-tracking branch 'origin/main' into worktree-agent-e2e-pr…

48d0d38

…eview-poc # Conflicts: # src/commands/projects/link.ts

docs(preview): add README covering verify-loop context, flow, design …

3dd19ac

…decisions, gotchas

fix(deployments): exclude Playwright test artifacts (test-results, pl…

0c743ab

…aywright-report, .playwright-mcp) from deploy uploads

feat(verify): light-mode verify probes, default browser MCP, drop pre…

517a2a9

…view commands

chore(verify): remove dead overwriteEnvFile, simplify skills-only ins…

26d56bf

…tallSkills call

fix(verify): guard read-only SQL, assert fetch ok, harden MCP merge, …

29acf6c

…fix finding props

fix(verify): block destructive DML hidden inside a CTE in isReadOnlyQ…

1452c86

…uery

cubic-dev-ai Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread src/lib/verify-probe.test.ts

Comment thread src/commands/verify/truth.ts

Comment thread src/lib/browser-mcp.ts Outdated

Comment thread src/lib/browser-mcp.ts Outdated

greptile-apps Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread src/commands/verify/rls.ts Outdated

Comment thread src/commands/verify/rls.ts Outdated

Comment thread src/lib/analytics.ts Outdated

Comment thread src/lib/browser-mcp.ts

fix(verify): validate identifiers/emails, block MERGE CTE, exclusive …

ee76be5

…expect flags, Windows command lookup, drop PII from telemetry

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

fix(verify): block SELECT INTO, sanitize finding endpoint/message PII…

008fdd5

…, share oss runRawSql/getAnonKey

jwfing reviewed Jun 18, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread src/lib/verify-probe.ts

fix(verify): rename event to cli_verify_finding with standard dims, t…

a309bd5

…est redaction, numeric expect-count

This comment was marked as duplicate.

Sign in to view

jwfing requested changes Jun 18, 2026

View reviewed changes

fix(verify): owner-scope anon RLS probe, drop free-text telemetry, ho…

63b3147

…nest read-only guard wording, command tests

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/commands/verify/truth.ts Outdated

cubic-dev-ai Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/commands/verify/truth.test.ts Outdated

fix(verify): validate --expect-count before running the query, assert…

0bdb097

… no DB call on invalid input

jwfing reviewed Jun 19, 2026

View reviewed changes

Conversation

CarmenDou commented Jun 18, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by cubic

Add hidden verify CLI commands for RLS isolation and backend truth probes

Changes since #170 opened

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jwfing left a comment

Choose a reason for hiding this comment

Code review — feat/agent-e2e-verify (PoC)

Requirements context

Critical

Suggestion

Information

Strengths worth calling out

Verdict

Uh oh!

Uh oh!

This comment was marked as duplicate.

Uh oh!

jwfing left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

CarmenDou commented Jun 18, 2026 •

edited by macroscopeapp Bot

Loading

Add hidden `verify` CLI commands for RLS isolation and backend truth probes

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading

greptile-apps Bot commented Jun 18, 2026 •

edited

Loading

Code review — `feat/agent-e2e-verify` (PoC)

cubic-dev-ai Bot left a comment •

edited

Loading