Skip to content

PoC: feat/agent-e2e-verify#170

Open
CarmenDou wants to merge 18 commits into
mainfrom
worktree-agent-e2e-preview-poc
Open

PoC: feat/agent-e2e-verify#170
CarmenDou wants to merge 18 commits into
mainfrom
worktree-agent-e2e-preview-poc

Conversation

@CarmenDou

@CarmenDou CarmenDou commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary by cubic

Adds hidden verify probes and auto-setup for the @playwright/mcp browser server to power light‑mode agent E2E verification with analytics-backed findings. Also excludes Playwright artifacts from deployment uploads.

  • New Features

    • Hidden verify CLI with rls, truth, and finding; each emits cli_verify_finding and sets pass/fail exit codes.
    • Auto-configures the @playwright/mcp server (npx @playwright/mcp@latest --headless) for Claude Code, Cursor, and Codex.
    • Deterministic probe helpers with unit tests, plus command tests for rls/truth.
  • Bug Fixes

    • Exclude Playwright artifacts from deploy uploads: test-results, playwright-report, .playwright-mcp.
    • Harden verify truth: block destructive DML/DDL (incl. CTE/MERGE and SELECT INTO), prevent statement chaining, treat non‑2xx as errors; enforce exclusive --expect vs --expect-count; validate --expect-count before executing to avoid DB calls on invalid input.
    • Improve safety and accuracy: drop all free‑text from telemetry (no endpoint/message; also strip db_actual/ui_claimed), scope the anonymous RLS control to the owner filter, validate identifiers/emails, and improve Windows agent detection for MCP setup.

Written for commit 0bdb097. Summary will update on new commits.

Review in cubic

Note

Add hidden verify CLI commands for RLS isolation and backend truth probes

  • Adds a hidden verify command group with three subcommands: verify rls, verify truth, and verify finding, registered in src/index.ts.
  • verify rls probes row-level security by performing cross-user and anonymous PostgREST reads, classifying results as rls_leak, rls_overrestrict, or pass, and setting a non-zero exit code on failure.
  • verify truth runs a read-only SQL query and compares the result to a UI-claimed value or row count, recording a false_pass finding on mismatch.
  • verify finding records an arbitrary loud error (4xx/5xx, console error) as an analytics finding.
  • All findings are emitted via trackVerifyFinding in src/lib/analytics.ts, which sanitizes endpoints and messages and strips sensitive evidence fields before sending.
  • During installSkills, the Playwright browser MCP is now configured for supported agents (Claude Code, Cursor, Codex) and users see a restart notice.
  • Behavioral Change: installSkills now accepts an optional third parameter withBrowserMcp (default true) and may write to agent config files on disk during skill installation.

Changes since #170 opened

  • Enhanced telemetry tracking in trackVerifyFinding to include project and organization context [a309bd5]
  • Added validation for --expect-count CLI flag to enforce non-negative integer values [a309bd5]
  • Exported analytics sanitization utilities and added comprehensive unit tests [a309bd5]
  • Fixed anonymous control probe in verify.registerVerifyRlsCommand to use the same owner-scoped filter string when calling recordsCount instead of passing undefined [63b3147]
  • Removed endpoint and message properties from analytics.trackVerifyFinding event payload and deleted sanitizeEndpoint and sanitizeMessage functions [63b3147]
  • Added Vitest test suites for verify rls and verify truth commands with comprehensive test coverage [63b3147]
  • Updated error message in verify.registerVerifyTruthCommand to clarify that the guard blocks common destructive query forms but does not provide hard read-only guarantees [63b3147]
  • Expanded documentation comments for verify-probe.isReadOnlyQuery helper to clarify limitations of the read-only guard [63b3147]
  • Added pre-I/O validation to the registerVerifyTruthCommand action handler requiring either --expect or --expect-count flags and validating that --expect-count is a non-negative integer before executing database queries [0bdb097]
  • Updated the test for rejecting non-integer --expect-count values to assert that runRawSql is not invoked when validation fails [0bdb097]

Macroscope summarized 008fdd5.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added experimental (hidden) “verify” probe commands for RLS isolation and backend truth checks, plus recording of verification findings with analytics.
    • Added deterministic browser MCP configuration for supported assistants, enabled during skill installation by default.
  • Bug Fixes
    • Improved deployment packaging to exclude additional test/report directories from uploads.
  • Tests
    • Added coverage for browser MCP configuration and verify-probe/classification logic and CLI behaviors.
  • Chores
    • Expanded deployment exclusion patterns to reduce unnecessary artifacts in uploaded bundles.

CarmenDou added 13 commits June 12, 2026 15:23
…tes, tolerate 404 on teardown, drop name from analytics
…eview-poc

# Conflicts:
#	src/commands/projects/link.ts
…aywright-report, .playwright-mcp) from deploy uploads
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

Adds an experimental verify CLI command group with three subcommands (rls, truth, finding) backed by new probe classification logic, fetch helpers, and PostHog analytics tracking. Separately adds browser MCP auto-configuration for Claude Code, Cursor, and Codex agents integrated into installSkills, with Playwright-generated directories excluded from deployments.

Changes

Verify Probe Commands

Layer / File(s) Summary
Probe classification logic and input validators
src/lib/verify-probe.ts
Defines RlsFindingType and TruthFindingType unions, implements classifyRls (leak/overrestrict/none based on row counts with evidence) and classifyTruth (normalized scalar comparison), and adds isReadOnlyQuery, isSafeIdentifier, and isLikelyEmail heuristics.
Fetch helpers and API calls
src/lib/verify-probe.ts
Implements token/row extraction, assertOk for non-2xx errors, login, and recordsCount (treating 401/403 as zero).
VerifyFinding analytics interface and tracker
src/lib/analytics.ts
Exports VerifyFinding interface and trackVerifyFinding, which filters sensitive db_actual/ui_claimed evidence keys before emitting a cli_verify_finding PostHog event.
Probe unit tests
src/lib/verify-probe.test.ts
Vitest suite for classifyRls priority/verdict, classifyTruth normalization and null/undefined, isReadOnlyQuery CTE/chaining rejection, isSafeIdentifier, and isLikelyEmail SQL-injection rejection.
RLS probe command
src/commands/verify/rls.ts
Registers verify rls: validates table/owner/user options, authenticates as user A and B, fetches user A's auth.users.id via raw SQL, runs three recordsCount probes (B→A, A→self, anon), classifies result, reports finding, and sets process.exitCode.
RLS command tests
src/commands/verify/rls.test.ts
Vitest tests validating input rejection (injection, non-email), probe scoping assertion for anonymous record counts with owner filter, and command wiring via mocked config/API/analytics.
Truth probe command
src/commands/verify/truth.ts
Registers verify truth: enforces read-only SQL, executes via runRawSql, classifies scalar or count expectations, tracks analytics, and sets process.exitCode based on match/mismatch.
Truth command tests
src/commands/verify/truth.test.ts
Vitest tests validating SQL query rejection, flag combination rejection, non-integer count rejection, match/mismatch exit codes, and analytics tracking on success.
Finding probe command
src/commands/verify/finding.ts
Registers verify finding: accepts kind/type/status/endpoint/message options, builds finding payload, tracks via analytics, flushes, and outputs JSON or formatted confirmation.
CLI wiring
src/commands/verify/index.ts, src/index.ts
Exports registerVerifyCommands, attaches rls/truth/finding to a hidden experimental verify commander subcommand, and registers it on the main CLI program.

Browser MCP Auto-configuration

Layer / File(s) Summary
browser-mcp helpers and per-agent logic
src/lib/browser-mcp.ts
Implements mergeJsonMcp (idempotent JSON merge with malformed-JSON fallback), ensureCodexToml (TOML block append), commandExists (platform-aware PATH check), and per-agent apply targets for Claude Code (via claude CLI), Cursor (~/.cursor/mcp.json), and Codex (~/.codex/config.toml). configureBrowserMcp runs all agents best-effort and returns configured labels.
browser-mcp tests
src/lib/browser-mcp.test.ts
Tests mergeJsonMcp for file creation, merge, idempotency, malformed JSON recovery, servers key, and non-object normalization; tests ensureCodexToml for block appending, idempotency, and content preservation.
installSkills integration and deploy exclusions
src/lib/skills.ts, src/commands/deployments/deploy.ts
Adds optional withBrowserMcp parameter to installSkills that conditionally invokes configureBrowserMcp with log feedback; extends EXCLUDE_PATTERNS to exclude test-results, playwright-report, and .playwright-mcp from deployments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • InsForge/CLI#119: Both PRs modify src/lib/skills.ts by changing the installSkills function signature/parameters (this PR adds withBrowserMcp, PR#119 adds authProvider) and thus affect the same skill-installation code path.

Suggested reviewers

  • tonychang04
  • jwfing

Poem

🐇 Hopping through the code at dawn,
New verify commands are drawn!
RLS probes sniff for leaks,
Truth checks what the database speaks.
Browser MCP auto-wires with flair —
This bunny finds no bugs in there! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'PoC: feat/agent-e2e-verify' is vague and uses non-descriptive technical jargon; it does not clearly convey what the changeset accomplishes to someone scanning commit history. Use a clear, descriptive title like 'Add verify CLI command group for agent E2E testing with RLS and truth probes' or 'Introduce verify subcommands for backend verification and MCP browser tool setup'.
Docstring Coverage ⚠️ Warning Docstring coverage is 27.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-agent-e2e-preview-poc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 12 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread src/lib/verify-probe.test.ts
Comment thread src/commands/verify/truth.ts
Comment thread src/lib/browser-mcp.ts Outdated
Comment thread src/lib/browser-mcp.ts Outdated
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Greptile Summary

This PR introduces a set of hidden experimental verify CLI commands (rls, truth, finding) for agent-driven E2E backend probing, auto-configures the Playwright MCP during skill installation, and excludes Playwright artifacts from deployment uploads. It also addresses several security concerns raised in earlier review rounds (identifier sanitisation, anon-scoped filter, PII evidence stripping, Windows command -v fix, pre-I/O flag validation).

  • verify rls performs a three-probe RLS isolation check (B reads A's rows, A reads own rows, anonymous reads A's rows), all scoped to the same owner filter; identifiers are validated with isSafeIdentifier and emails with isLikelyEmail; row counts (not row data) reach PostHog.
  • verify truth runs an admin-key SQL read-only check; the isReadOnlyQuery guard blocks common DML/DDL statement forms (documented caveat: function-call mutations like SELECT setval(...) still pass the keyword scan); all flag validation runs before runRawSql.
  • verify finding records agent-observed loud errors (4xx/5xx, console errors) to PostHog; endpoint and message are correctly dropped from the telemetry payload, but --kind is a required free-text field with no format enforcement that does reach PostHog.

Confidence Score: 4/5

Safe to merge as an experimental hidden feature; the most sensitive paths (PII in telemetry, identifier injection, pre-I/O flag validation) have been addressed, with one residual gap in the finding recorder's free-text kind field.

The verify commands are hidden and experimental, and this round has closed several earlier gaps. The remaining open items are the documented isReadOnlyQuery limitation (function-call mutations bypass the keyword guard, acknowledged in code comments) and the unvalidated --kind field in verify finding that reaches PostHog as free text. Neither blocks the PoC intent, but the telemetry gap is inconsistent with the stated no-free-text-to-PostHog contract established elsewhere in the same function.

src/commands/verify/finding.ts (--kind free-text reaches PostHog); src/commands/verify/rls.ts (email still string-interpolated into admin-key SQL, mitigated by isLikelyEmail but not fully parameterised)

Important Files Changed

Filename Overview
src/commands/verify/rls.ts New RLS isolation probe; validates identifiers and emails before use, but email is still string-interpolated into a raw admin-key SQL query rather than passed as a parameterized value.
src/commands/verify/truth.ts New backend-truth cross-check; all input validation (isReadOnlyQuery, mutual-exclusion, non-negative integer) now runs before runRawSql; acknowledged limitation that function-call mutations bypass the keyword guard.
src/commands/verify/finding.ts New finding recorder; endpoint/message are correctly excluded from PostHog payload; kind is a free-text required flag that reaches PostHog without format validation.
src/lib/analytics.ts Adds trackVerifyFinding; db_actual/ui_claimed are correctly stripped from PostHog; endpoint/message are excluded at the captureEvent call site; kind and table are transmitted as structured identifiers.
src/lib/verify-probe.ts Pure classification helpers are well-tested; isReadOnlyQuery explicitly documents that SELECT setval(...)-style function mutations bypass the keyword guard; fetch wiring correctly treats 401/403 as 0 rows and throws on other non-2xx.
src/lib/browser-mcp.ts Configures Playwright MCP for Claude Code, Cursor, and Codex; Windows now uses where instead of command -v; all shell arguments are hardcoded constants so no injection risk; best-effort per-agent error swallowing is appropriate.
src/lib/skills.ts Adds optional withBrowserMcp parameter (default true) and calls configureBrowserMcp after the main skills installation block; errors are caught and surfaced as warnings in non-JSON mode.
src/commands/verify/index.ts Thin orchestration file that registers verify subcommands under a hidden parent command; straightforward and correct.
src/index.ts Registers the new hidden verify command group alongside existing commands; no issues.
src/commands/deployments/deploy.ts Adds test-results, playwright-report, and .playwright-mcp to the deployment exclusion list; straightforward and correct.
src/lib/verify-probe.test.ts Comprehensive unit tests for all pure helpers; covers rls_leak, rls_overrestrict, false_pass, normalisation, isReadOnlyQuery edge cases, and identifier/email validation.
src/lib/browser-mcp.test.ts Tests mergeJsonMcp and ensureCodexToml with idempotency, malformed JSON, non-object section values, and existing content preservation.
src/commands/verify/rls.test.ts Tests identifier injection rejection, non-email rejection, and verifies the anonymous control probe uses an owner-scoped filter.
src/commands/verify/truth.test.ts Tests non-read query rejection, mutual-exclusion of flags, non-integer expect-count rejection (all before DB), and pass/fail exit codes.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent
    participant CLI as insforge CLI
    participant VProbe as verify-probe.ts
    participant OSS as PostgREST/Auth API
    participant PostHog

    Note over Agent,PostHog: verify rls
    Agent->>CLI: verify rls --table orders --owner user_id
    CLI->>CLI: validate identifiers and emails
    CLI->>OSS: login(userA) and login(userB)
    OSS-->>CLI: aToken, bToken
    CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
    OSS-->>CLI: aId UUID
    CLI->>VProbe: recordsCount with bToken, aToken, undefined
    VProbe->>OSS: GET records with owner filter x3
    OSS-->>VProbe: row counts
    CLI->>VProbe: classifyRls
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding row counts only
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify truth
    Agent->>CLI: verify truth --query SELECT n --expect 3
    CLI->>CLI: isReadOnlyQuery and flag validation
    CLI->>OSS: runRawSql with admin key
    OSS-->>CLI: rows
    CLI->>VProbe: classifyTruth
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify finding
    Agent->>CLI: verify finding --kind http_500 --status 500
    CLI->>PostHog: trackVerifyFinding kind and status only
    CLI-->>Agent: exit 0
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent
    participant CLI as insforge CLI
    participant VProbe as verify-probe.ts
    participant OSS as PostgREST/Auth API
    participant PostHog

    Note over Agent,PostHog: verify rls
    Agent->>CLI: verify rls --table orders --owner user_id
    CLI->>CLI: validate identifiers and emails
    CLI->>OSS: login(userA) and login(userB)
    OSS-->>CLI: aToken, bToken
    CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
    OSS-->>CLI: aId UUID
    CLI->>VProbe: recordsCount with bToken, aToken, undefined
    VProbe->>OSS: GET records with owner filter x3
    OSS-->>VProbe: row counts
    CLI->>VProbe: classifyRls
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding row counts only
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify truth
    Agent->>CLI: verify truth --query SELECT n --expect 3
    CLI->>CLI: isReadOnlyQuery and flag validation
    CLI->>OSS: runRawSql with admin key
    OSS-->>CLI: rows
    CLI->>VProbe: classifyTruth
    VProbe-->>CLI: type and evidence
    CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
    CLI-->>Agent: exit 0 or 1

    Note over Agent,PostHog: verify finding
    Agent->>CLI: verify finding --kind http_500 --status 500
    CLI->>PostHog: trackVerifyFinding kind and status only
    CLI-->>Agent: exit 0
Loading

Reviews (6): Last reviewed commit: "fix(verify): validate --expect-count bef..." | Re-trigger Greptile

Comment thread src/commands/verify/rls.ts Outdated
Comment thread src/commands/verify/rls.ts Outdated
Comment thread src/lib/analytics.ts Outdated
Comment thread src/lib/browser-mcp.ts
…expect flags, Windows command lookup, drop PII from telemetry

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
src/lib/verify-probe.test.ts (1)

52-78: ⚡ Quick win

Add a regression for SELECT ... INTO mutation.

The SQL safety tests cover DML-in-CTE and statement chaining, but not PostgreSQL’s table-creating SELECT ... INTO form. Add coverage alongside the read-only rejection cases so the admin-key guard cannot regress.

🧪 Suggested test additions
   it('rejects DML hidden inside a CTE (WITH … DELETE/UPDATE/INSERT/MERGE … SELECT)', () => {
     expect(isReadOnlyQuery('with x as (delete from users returning id) select id from x')).toBe(false);
     expect(isReadOnlyQuery('WITH x AS (UPDATE t SET c = 1 RETURNING id) SELECT * FROM x')).toBe(false);
     expect(isReadOnlyQuery('with x as (insert into t values (1) returning id) select id from x')).toBe(false);
     expect(isReadOnlyQuery('with x as (merge into t using s on t.id = s.id returning t.id) select id from x')).toBe(false);
   });
+
+  it('rejects SELECT INTO because it creates a table', () => {
+    expect(isReadOnlyQuery('select * into temp verify_probe_copy from users')).toBe(false);
+    expect(isReadOnlyQuery('with x as (select 1) select * into temp verify_probe_copy from x')).toBe(false);
+  });
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/verify-probe.test.ts` around lines 52 - 78, The isReadOnlyQuery test
suite does not have coverage for PostgreSQL's SELECT ... INTO statement form
which creates or modifies tables and should be rejected. Add a new test case
within the describe('isReadOnlyQuery') block, after the existing rejection
tests, that uses it() to test that isReadOnlyQuery rejects various forms of
SELECT ... INTO statements such as SELECT ... INTO new_table, SELECT ... INTO
TEMP table, and similar variations. Ensure each expectation calls
isReadOnlyQuery with these statements and expects false to be returned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/verify/finding.ts`:
- Around line 30-35: The endpoint and message values passed to
trackVerifyFinding are raw user input that may contain sensitive data like
emails, tokens, or URL query parameters, creating a PII/secrets leakage risk in
analytics. Create two helper functions in this file (sanitizeEndpoint and
sanitizeMessage) that remove query strings from endpoints and redact email
addresses and bearer tokens from messages using regex patterns, then apply these
functions to opts.endpoint and opts.message before passing them to the
trackVerifyFinding call.

In `@src/commands/verify/rls.ts`:
- Around line 47-67: The helper functions login, getAnonKey, rawsqlRows, and
recordsCount make live API calls without timeout or abort handling, which can
cause indefinite blocking if network calls hang and stall CI. For each of these
functions, implement AbortController with a reasonable timeout (e.g., 30
seconds), pass the abort signal to the fetch or HTTP calls, and add proper error
handling to catch and handle AbortError and timeout scenarios by throwing or
returning meaningful errors instead of hanging indefinitely.

In `@src/commands/verify/truth.ts`:
- Around line 30-42: The rawsqlRows() database call is executed before
validating that the required expectation flags are provided, causing unnecessary
database calls for invalid input. Move the validation check that throws CLIError
(checking that either opts.expectCount or opts.expect is defined) to execute
before the rawsqlRows() call. This ensures the function fails fast with
validation errors before attempting to query the database, avoiding wasted
resources on invalid requests.

In `@src/lib/analytics.ts`:
- Around line 149-167: The trackVerifyFinding function currently sends
potentially sensitive data in the endpoint and message fields, which can contain
emails, tokens, or query parameters. Instead of the current approach of
filtering safeEvidence and then sending all other fields, implement a whitelist
approach where you only include known-safe fields in the captureEvent call. Keep
only the safe structured fields like finding_type, passed, table, kind, and
status, and for the evidence data, explicitly whitelist only the safe evidence
keys (such as row count metrics) rather than filtering out unsafe ones. Remove
or transform the raw endpoint and message fields before sending them to avoid
leaking PII.

In `@src/lib/browser-mcp.ts`:
- Around line 90-92: The execAsync call in the MCP presence probe (checking for
MCP_SERVER_NAME using the "claude mcp get" command) lacks a timeout, which could
cause the install/link flow to hang indefinitely if the command becomes
unresponsive. Add a timeout option to the execAsync call to ensure it fails
after a reasonable duration (such as a few seconds) rather than waiting
indefinitely. This will allow the promise chain to proceed to the catch block if
the command times out.

In `@src/lib/skills.ts`:
- Around line 175-177: The log message in the clack.log.info call is too
assertive and inaccurate. The condition configured.length === 0 can result from
either no supported agent being found OR agent setup failure (with errors being
swallowed). Replace the message "No supported agent found to auto-configure the
browser MCP" with a non-assertive message that acknowledges both possibilities,
such as something that indicates the browser MCP could not be auto-configured
rather than definitively stating no agent was found. Keep the helpful Claude
configuration instruction at the end of the message.

In `@src/lib/verify-probe.ts`:
- Around line 99-145: Add timeout protection to prevent indefinite hangs in the
four fetch functions: login, getAnonKey, rawsqlRows, and recordsCount. For each
function, create an AbortController and set a timeout that aborts the fetch
operation if it exceeds a reasonable duration (e.g., 30 seconds). Pass the abort
signal to the fetch request via the signal option in the fetch options object,
and wrap each fetch call in a try-catch block to handle the AbortError that
occurs when the timeout fires, converting it to a meaningful error message that
includes the operation name and timeout details.
- Around line 61-67: The isReadOnlyQuery function allows dangerous SELECT ...
INTO statements that create or populate tables in PostgreSQL despite passing the
read-only check. Add a regex pattern to the guard that detects and rejects
queries containing the INTO keyword used in conjunction with SELECT statements,
similar to how the function currently rejects INSERT, UPDATE, DELETE, and other
write operations. This check should be added alongside the existing keyword
rejection pattern on line 66 to prevent database mutation through CREATE TABLE
AS SELECT or SELECT ... INTO constructs.

---

Nitpick comments:
In `@src/lib/verify-probe.test.ts`:
- Around line 52-78: The isReadOnlyQuery test suite does not have coverage for
PostgreSQL's SELECT ... INTO statement form which creates or modifies tables and
should be rejected. Add a new test case within the describe('isReadOnlyQuery')
block, after the existing rejection tests, that uses it() to test that
isReadOnlyQuery rejects various forms of SELECT ... INTO statements such as
SELECT ... INTO new_table, SELECT ... INTO TEMP table, and similar variations.
Ensure each expectation calls isReadOnlyQuery with these statements and expects
false to be returned.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1679be85-08fd-4ee4-a79c-3c4c21824d71

📥 Commits

Reviewing files that changed from the base of the PR and between 35d8c13 and ee76be5.

📒 Files selected for processing (12)
  • src/commands/deployments/deploy.ts
  • src/commands/verify/finding.ts
  • src/commands/verify/index.ts
  • src/commands/verify/rls.ts
  • src/commands/verify/truth.ts
  • src/index.ts
  • src/lib/analytics.ts
  • src/lib/browser-mcp.test.ts
  • src/lib/browser-mcp.ts
  • src/lib/skills.ts
  • src/lib/verify-probe.test.ts
  • src/lib/verify-probe.ts

Comment on lines +30 to +35
endpoint: opts.endpoint as string | undefined,
message: opts.message as string | undefined,
table: opts.table as string | undefined,
};
trackVerifyFinding(finding, config);
await shutdownAnalytics(); // flush the PostHog event before exit

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize --message and --endpoint before tracking analytics.

Line 30 through Line 35 forwards raw free-form text to trackVerifyFinding(...). These values can include emails, tokens, or URL query params from runtime errors, which risks telemetry PII/secrets leakage. Redact sensitive patterns and strip query strings before emitting.

Suggested direction
-          endpoint: opts.endpoint as string | undefined,
-          message: opts.message as string | undefined,
+          endpoint: sanitizeEndpoint(opts.endpoint as string | undefined),
+          message: sanitizeMessage(opts.message as string | undefined),
// Add small helpers in this file:
function sanitizeEndpoint(v?: string): string | undefined {
  if (!v) return undefined;
  return v.split('?')[0]; // drop query params
}

function sanitizeMessage(v?: string): string | undefined {
  if (!v) return undefined;
  return v
    .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, '[redacted-email]')
    .replace(/\b(?:Bearer\s+)?[A-Za-z0-9._-]{20,}\b/g, '[redacted-token]')
    .slice(0, 500);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/verify/finding.ts` around lines 30 - 35, The endpoint and
message values passed to trackVerifyFinding are raw user input that may contain
sensitive data like emails, tokens, or URL query parameters, creating a
PII/secrets leakage risk in analytics. Create two helper functions in this file
(sanitizeEndpoint and sanitizeMessage) that remove query strings from endpoints
and redact email addresses and bearer tokens from messages using regex patterns,
then apply these functions to opts.endpoint and opts.message before passing them
to the trackVerifyFinding call.

Comment thread src/commands/verify/rls.ts Outdated
Comment thread src/commands/verify/truth.ts Outdated
Comment thread src/lib/analytics.ts Outdated
Comment thread src/lib/browser-mcp.ts
Comment on lines +90 to +92
const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)
.then(() => true)
.catch(() => false);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout to the Claude MCP presence probe.

Line 90 runs claude mcp get without a timeout. If that command hangs, the install/link flow can hang indefinitely.

Suggested fix
-      const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)
+      const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, {
+        timeout: MCP_CONFIG_TIMEOUT_MS,
+      })
         .then(() => true)
         .catch(() => false);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)
.then(() => true)
.catch(() => false);
const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, {
timeout: MCP_CONFIG_TIMEOUT_MS,
})
.then(() => true)
.catch(() => false);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/browser-mcp.ts` around lines 90 - 92, The execAsync call in the MCP
presence probe (checking for MCP_SERVER_NAME using the "claude mcp get" command)
lacks a timeout, which could cause the install/link flow to hang indefinitely if
the command becomes unresponsive. Add a timeout option to the execAsync call to
ensure it fails after a reasonable duration (such as a few seconds) rather than
waiting indefinitely. This will allow the promise chain to proceed to the catch
block if the command times out.

Comment thread src/lib/skills.ts
Comment on lines +175 to +177
clack.log.info(
'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.',
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a non-assertive message when configured is empty.

Line 175 currently states “No supported agent found,” but configured.length === 0 can also mean agent setup failed (errors are swallowed in src/lib/browser-mcp.ts Line 135).

Suggested fix
-          clack.log.info(
-            'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',
-          );
+          clack.log.info(
+            'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',
+          );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clack.log.info(
'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.',
);
clack.log.info(
'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',
);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/skills.ts` around lines 175 - 177, The log message in the
clack.log.info call is too assertive and inaccurate. The condition
configured.length === 0 can result from either no supported agent being found OR
agent setup failure (with errors being swallowed). Replace the message "No
supported agent found to auto-configure the browser MCP" with a non-assertive
message that acknowledges both possibilities, such as something that indicates
the browser MCP could not be auto-configured rather than definitively stating no
agent was found. Keep the helpful Claude configuration instruction at the end of
the message.

Comment thread src/lib/verify-probe.ts
Comment thread src/lib/verify-probe.ts

@jwfing jwfing left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review — feat/agent-e2e-verify (PoC)

Summary: A well-structured, defensively-coded PoC that adds hidden verify rls/truth/finding probes (pure, unit-tested verdict logic + thin fetch wiring), PII-stripping analytics, browser-MCP auto-setup, and Playwright deploy-exclude patterns — no blocking issues, a few convention/scope items worth addressing before this graduates from PoC.

Requirements context

No spec/plan under docs/specs/ matches this PR — the documents there cover diagnose and db-migrations, not agent-E2E/verify. Assessed against the PR description (cubic/Macroscope summaries) and the repo conventions in DEVELOPMENT.md. No invented requirements.


Critical

(none) — no correctness, security, data-loss, or behavior-breaking issues found.


Suggestion

Software engineering / telemetry — event name breaks the cli_<feature>_<action> convention. src/lib/analytics.ts:171 emits 'verify_finding', but every other helper in this file uses the cli_*_invoked shape (cli_diagnose_invoked, cli_config_invoked, …) per DEVELOPMENT.md §2 ("Use the cli_<feature>_<action> convention. Keep event names stable."). Recommend cli_verify_finding (or similar) for dashboard consistency. It's a new event so nothing breaks today, but the name should land right the first time.

Functionality / scope — installSkills(withBrowserMcp = true) is "opt-in" in the comment only. src/lib/skills.ts:88 defaults the new param to true and none of the six call sites (create.ts:443, link.ts:127/246/308/492/534) ever pass false, so the comment at skills.ts:156 ("Opt-in") doesn't match reality — every insforge link/create now mutates global, outside-the-project agent config (claude mcp add -s user, ~/.cursor/mcp.json, ~/.codex/config.toml). This also runs under --json (automation/CI) where output is suppressed but the writes still happen. The merges are careful (idempotent, malformed-JSON-safe, preserve other servers — nicely tested), so this is UX/scope rather than data-loss, but I'd wire a real opt-out (CLI flag or prompt) or make it genuinely opt-in before merge.

Security / testing — the redaction path has no direct tests. sanitizeEndpoint/sanitizeMessage (src/lib/analytics.ts:152-165) are the guard that keeps emails/tokens/PII out of PostHog, yet they have no unit tests, while the pure verdict helpers (classifyRls/classifyTruth/isReadOnlyQuery/isSafeIdentifier/isLikelyEmail) are thoroughly covered. The redaction regexes are exactly the kind of code a test should pin (e.g. token boundary, multiple emails, the 500-char truncation). Add cases for these.

Security — isReadOnlyQuery is keyword-based and won't catch side-effecting function calls. src/lib/verify-probe.ts (isReadOnlyQuery) blocks DML/DDL/chaining/INTO/CTE-DML well, but a query like SELECT some_mutating_fn() passes the guard and runs through runRawSql with the admin key (oss.ts:22, "superadmin access"). Acceptable for a hidden experimental probe, but worth a code comment acknowledging the gap (or routing reads through a lower-privilege path) before this is exposed more widely.


Information

  • Convention — direct fetch in verify-probe.ts (login, recordsCount) deviates from DEVELOPMENT.md §1 ("Do not call fetch directly … add a typed wrapper to oss.ts"). Here it's justified — RLS probing needs per-user/anon tokens, and ossFetch hard-codes the admin api_key Bearer (oss.ts:113-125), so it genuinely can't be reused. Consider a small typed wrapper or a note so the deviation is intentional-on-the-record.
  • Telemetry shapetrackVerifyFinding (analytics.ts:171) omits the standard project_name/org_id/region/oss_mode properties the other track* helpers all include; adding them keeps verify_finding groupable like the rest.
  • Convention — flush in finally. verify/{rls,truth,finding}.ts call await shutdownAnalytics() inline after the capture rather than in a finally block (DEVELOPMENT.md §2). Functionally fine (it immediately follows the only capture), just not the documented pattern.
  • SQL interpolation in rls.ts — the auth.users email lookup (rls.ts, select id from auth.users where email='...') interpolates after isLikelyEmail validation + single-quote escaping; safe given runRawSql has no parameter binding. Defense-in-depth is present; noted only because raw interpolation always deserves a second look.
  • agent-skills sync (DEVELOPMENT.md §3) — the new hidden verify commands and the browser-MCP behavior should be reflected in the insforge-verify skill in InsForge/agent-skills; the PR doesn't mention a companion update.
  • --expect-count is compared as a trimmed string (truth.tsclassifyTruth(rows.length, String(opts.expectCount))), so --expect-count 03 mismatches 3 rows. Minor.

Strengths worth calling out

  • The empty-token guard (rls.ts) and assertOk (verify-probe.ts) both close genuinely subtle false-pass traps — an empty token or a non-2xx being silently read as "0 rows / isolation holds." That's the right instinct for a verification tool.
  • Strong, well-tested input validation (isSafeIdentifier/isLikelyEmail/isReadOnlyQuery) and config-merge robustness (malformed/array/primitive JSON, idempotency).
  • Deploy exclude patterns (deploy.ts:53) are correct and scoped.

Verdict

approved (informational — no Critical findings; posted as a COMMENT, a human still gives the GitHub green-check via the approve flow). The Suggestions, especially the default-on global-config mutation and the untested redaction path, are worth resolving before this PoC graduates to a shipped feature.

Comment thread src/lib/verify-probe.ts
jwfing

This comment was marked as duplicate.

@jwfing jwfing left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary
This PR adds useful verify/MCP plumbing, but the current implementation has one security blocker and one core correctness blocker in the new verify commands.

Requirements context
I could not find repo docs that define verify beyond this PR, so I assessed intent primarily from the PR description and the inline command descriptions in src/commands/verify/*.ts. For project conventions, I also used DEVELOPMENT.md, especially the command/error/output and analytics guidance. I was not able to execute the Vitest suite in this workspace because vitest is not installed (sh: vitest: not found).

Findings

Critical

  • src/lib/verify-probe.ts:55-69, src/lib/api/oss.ts:13-29: verify truth is presented as a single read-only query, but the guard is only a regex over the SQL text before passing it to runRawSql, which executes with the project API key's superadmin privileges. That still allows side-effecting statements hidden behind SELECT/WITH via mutating SQL functions, so the command does not actually provide the safety guarantee it advertises.
  • src/commands/verify/rls.ts:18-20, src/commands/verify/rls.ts:60-65: the anonymous control is checking the entire table (recordsCount(..., undefined, undefined, anon)) instead of checking the same owner-scoped filter used for user A. Any table that intentionally exposes some public rows will be reported as rls_leak even when B and anonymous users cannot read A's rows, which breaks the command's stated behavior of verifying "B cannot read A, A can read own".

Suggestion

  • DEVELOPMENT.md:48-53, src/lib/analytics.ts:154-180, src/commands/verify/finding.ts:18-34: this change starts sending sanitized-but-still-free-form message and endpoint text to PostHog. That conflicts with the repo's telemetry guidance to never send user-entered free text, and the current redaction only covers query strings, emails, and long token-like substrings. I would strongly prefer reducing this event to structured fields (finding_type, kind, status, table, booleans/counts) and dropping the free-text properties entirely.
  • src/commands/verify/finding.ts:11-47, src/commands/verify/rls.ts:15-84, src/commands/verify/truth.ts:9-68, src/lib/skills.ts:160-188: the added tests cover the pure helpers well, but there is still no command-level coverage for the new hidden CLI surface or for the installSkills browser-MCP side effects. That leaves argument parsing, exit-code behavior, and telemetry/flush behavior unverified.

Information

  • (none)

Verdict
request_changes — the PR needs fixes for the read-only SQL guarantee and the anonymous RLS probe scope before it is safe to merge.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/verify/rls.test.ts`:
- Around line 51-56: The test assertions at lines 51-56 and 59-64 are too loose
- they only verify that an exit occurs with `/exit:/` but don't confirm the
rejection is specifically due to the `--owner` or `--user-a` validation guards.
Replace the generic `/exit:/` pattern matching in the toThrow() calls with more
specific validation error messages that confirm the actual validation failure
for each guard (the `--owner` parameter validation in the first test and
`--user-a` parameter validation in the second test) to ensure these tests are
hitting the intended rejection paths and not passing on unrelated early-exit
scenarios.
- Around line 70-75: The test currently validates only the 3rd recordsCount call
with a partial filter match using stringContaining, but the comment indicates
all three probes must use the same owner-scoped filter. Capture the filter
argument from the 1st call to recordsCount, then add assertions to verify that
both the 2nd and 3rd calls use the exact same filter string with strict equality
(not partial matching). This ensures consistency across all three probes as
intended.

In `@src/commands/verify/truth.test.ts`:
- Around line 75-80: The test case `flags false_pass (exit 1) when DB differs
from the claim` currently only validates the exit code but should also verify
analytics behavior on the failure path. Add expect assertions after the
parseAsync call to verify that trackVerifyFinding was called with the
appropriate finding details and that shutdownAnalytics was called to ensure
findings are properly emitted and analytics are properly flushed when
verification fails.
- Around line 59-63: The test for rejecting a non-integer --expect-count only
asserts that an error is thrown but does not verify that the database call to
runRawSql is prevented. Add a spy or mock for the runRawSql function before
calling makeProgram().parseAsync() and include an assertion that verifies
runRawSql was never called. This ensures the command validates the
--expect-count argument before attempting any database operations, preventing
unnecessary live-call leakage when invalid input is provided.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 438e9671-8b94-49dc-99c7-2e7080d1d9eb

📥 Commits

Reviewing files that changed from the base of the PR and between ee76be5 and 63b3147.

📒 Files selected for processing (7)
  • src/commands/verify/rls.test.ts
  • src/commands/verify/rls.ts
  • src/commands/verify/truth.test.ts
  • src/commands/verify/truth.ts
  • src/lib/analytics.ts
  • src/lib/verify-probe.test.ts
  • src/lib/verify-probe.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/commands/verify/rls.ts
  • src/commands/verify/truth.ts
  • src/lib/verify-probe.test.ts

Comment on lines +51 to +56
it('rejects an --owner that smuggles PostgREST params, before any login', async () => {
const { login } = await import('../../lib/verify-probe.js');
await expect(
makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }),
).rejects.toThrow(/exit:/);
expect(login).not.toHaveBeenCalled();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten assertions to verify the intended rejection cause.

On Lines 53-56 and Lines 61-64, asserting only /exit:/ can pass on unrelated early-exit paths. Add an assertion on the emitted validation message so these tests prove they’re hitting the --owner and --user-a guards specifically.

Suggested test hardening
 describe('verify rls (command)', () => {
   let exitSpy: ReturnType<typeof vi.spyOn>;
+  let errorSpy: ReturnType<typeof vi.spyOn>;
   beforeEach(() => {
     vi.clearAllMocks();
     process.exitCode = undefined;
     exitSpy = vi.spyOn(process, 'exit').mockImplementation(((code?: number) => {
       throw new Error(`exit:${code}`);
     }) as never);
-    vi.spyOn(console, 'error').mockImplementation(() => {});
+    errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
     vi.spyOn(console, 'log').mockImplementation(() => {});
   });
@@
   it('rejects an --owner that smuggles PostgREST params, before any login', async () => {
     const { login } = await import('../../lib/verify-probe.js');
     await expect(
       makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }),
     ).rejects.toThrow(/exit:/);
+    expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--owner must be a bare column name/);
     expect(login).not.toHaveBeenCalled();
   });
@@
   it('rejects a non-email --user-a, before any login', async () => {
     const { login } = await import('../../lib/verify-probe.js');
     await expect(
       makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--user-a', 'not-an-email', '--json'], { from: 'user' }),
     ).rejects.toThrow(/exit:/);
+    expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--user-a and --user-b must be valid email addresses/);
     expect(login).not.toHaveBeenCalled();
   });

Also applies to: 59-64

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/verify/rls.test.ts` around lines 51 - 56, The test assertions at
lines 51-56 and 59-64 are too loose - they only verify that an exit occurs with
`/exit:/` but don't confirm the rejection is specifically due to the `--owner`
or `--user-a` validation guards. Replace the generic `/exit:/` pattern matching
in the toThrow() calls with more specific validation error messages that confirm
the actual validation failure for each guard (the `--owner` parameter validation
in the first test and `--user-a` parameter validation in the second test) to
ensure these tests are hitting the intended rejection paths and not passing on
unrelated early-exit scenarios.

Comment on lines +70 to +75
// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
expect(recordsCount).toHaveBeenCalledTimes(3);
// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
expect(recordsCount).toHaveBeenNthCalledWith(
3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate filter equality across all three recordsCount probes.

Line 70 says all probes must use the same owner-scoped filter, but the test currently checks only the 3rd call and only with stringContaining. Compare the 3rd argument from all three calls for exact equality.

Suggested assertion improvement
   it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => {
     const { recordsCount } = await import('../../lib/verify-probe.js');
     await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' });
     // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
     expect(recordsCount).toHaveBeenCalledTimes(3);
+    const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls;
+    const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]];
+    expect(typeof filter1).toBe('string');
+    expect(filter1).toMatch(/^user_id=eq\./);
+    expect(filter2).toBe(filter1);
+    expect(filter3).toBe(filter1);
     // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
     expect(recordsCount).toHaveBeenNthCalledWith(
       3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
     );
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
expect(recordsCount).toHaveBeenCalledTimes(3);
// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
expect(recordsCount).toHaveBeenNthCalledWith(
3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
);
it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => {
const { recordsCount } = await import('../../lib/verify-probe.js');
await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' });
// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
expect(recordsCount).toHaveBeenCalledTimes(3);
const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls;
const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]];
expect(typeof filter1).toBe('string');
expect(filter1).toMatch(/^user_id=eq\./);
expect(filter2).toBe(filter1);
expect(filter3).toBe(filter1);
// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
expect(recordsCount).toHaveBeenNthCalledWith(
3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
);
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/verify/rls.test.ts` around lines 70 - 75, The test currently
validates only the 3rd recordsCount call with a partial filter match using
stringContaining, but the comment indicates all three probes must use the same
owner-scoped filter. Capture the filter argument from the 1st call to
recordsCount, then add assertions to verify that both the 2nd and 3rd calls use
the exact same filter string with strict equality (not partial matching). This
ensures consistency across all three probes as intended.

Comment thread src/commands/verify/truth.test.ts Outdated
Comment on lines +75 to +80
it('flags false_pass (exit 1) when DB differs from the claim', async () => {
const oss = await import('../../lib/api/oss.js');
(oss.runRawSql as ReturnType<typeof vi.fn>).mockResolvedValue({ rows: [{ n: 1 }] });
await makeProgram().parseAsync(['verify', 'truth', '--query', 'select n', '--expect', '3', '--json'], { from: 'user' });
expect(process.exitCode).toBe(1);
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify analytics behavior on the false_pass path, not just exit code.

Line 75 covers process.exitCode, but the PR contract also says findings are emitted and flushed on failures. Add trackVerifyFinding and shutdownAnalytics assertions here to lock that behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/verify/truth.test.ts` around lines 75 - 80, The test case `flags
false_pass (exit 1) when DB differs from the claim` currently only validates the
exit code but should also verify analytics behavior on the failure path. Add
expect assertions after the parseAsync call to verify that trackVerifyFinding
was called with the appropriate finding details and that shutdownAnalytics was
called to ensure findings are properly emitted and analytics are properly
flushed when verification fails.

Comment thread src/commands/verify/truth.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 7 files (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread src/commands/verify/truth.test.ts Outdated

@jwfing jwfing left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary
This PR mostly implements the stated hidden verification probes and browser-MCP setup as described, with good input validation and telemetry scrubbing, and I found no blocking issues.

Requirements context
I derived intent primarily from the PR description because I did not find repo documentation for the new hidden verify surface itself; for repo conventions and telemetry expectations, I used DEVELOPMENT.md, especially the command-layout and PostHog guidance.

Findings
Critical
(none)

Suggestion

  • src/lib/browser-mcp.ts:104-116 — The Cursor/Codex auto-setup path is gated on ~/.cursor / ~/.codex already existing, even though mergeJsonMcp() and ensureCodexToml() already create parent directories themselves. In practice this means a fresh agent install can hit the “No supported agent found” path and never get the promised auto-configuration until the user has manually launched that agent once.
  • src/lib/browser-mcp.ts:17-18, src/lib/skills.ts:175-184 — The generated MCP config hardcodes npx @playwright/mcp@latest --headless. Using @latest makes every future agent session depend on whatever was most recently published upstream, which is both a supply-chain risk and a reproducibility risk for the verification flow this PR is trying to stabilize.
  • src/lib/browser-mcp.test.ts:1-109, src/lib/skills.ts:160-188 — The new tests cover the JSON/TOML helper functions, but not the actual configureBrowserMcp() / installSkills() integration path. That leaves the agent-detection logic and user-facing fallback behavior unverified, which is exactly where the .cursor / .codex existence-gate bug above lives.

Information

  • src/commands/verify/rls.ts:31-66, src/commands/verify/truth.ts:20-58, src/lib/analytics.ts:143-172 — The new verify commands do a solid job on the main security-sensitive areas: identifier/email validation before SQL or PostgREST usage, non-2xx handling for probe requests, and dropping free-text / sensitive evidence fields before PostHog capture.
  • src/commands/deployments/deploy.ts:27-52 — The deployment exclusion additions are low-risk and should reduce unnecessary upload size; I did not see a separate performance regression in the changed paths.

Verdict
approved — no Critical findings. I could not run the new test suites in this checkout because the workspace does not have dev dependencies installed (vitest: not found), so the review is based on static inspection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants