PoC: feat/agent-e2e-verify#170
Conversation
…fe names, resilient teardown
… keys, assert hidden via help
…tes, tolerate 404 on teardown, drop name from analytics
…validation, type create opts
…sforge-verify at link time
…eview-poc # Conflicts: # src/commands/projects/link.ts
…decisions, gotchas
…aywright-report, .playwright-mcp) from deploy uploads
…fix finding props
WalkthroughAdds an experimental ChangesVerify Probe Commands
Browser MCP Auto-configuration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
4 issues found across 12 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
Greptile SummaryThis PR introduces a set of hidden experimental
Confidence Score: 4/5Safe to merge as an experimental hidden feature; the most sensitive paths (PII in telemetry, identifier injection, pre-I/O flag validation) have been addressed, with one residual gap in the finding recorder's free-text kind field. The verify commands are hidden and experimental, and this round has closed several earlier gaps. The remaining open items are the documented isReadOnlyQuery limitation (function-call mutations bypass the keyword guard, acknowledged in code comments) and the unvalidated --kind field in verify finding that reaches PostHog as free text. Neither blocks the PoC intent, but the telemetry gap is inconsistent with the stated no-free-text-to-PostHog contract established elsewhere in the same function. src/commands/verify/finding.ts (--kind free-text reaches PostHog); src/commands/verify/rls.ts (email still string-interpolated into admin-key SQL, mitigated by isLikelyEmail but not fully parameterised) Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Agent
participant CLI as insforge CLI
participant VProbe as verify-probe.ts
participant OSS as PostgREST/Auth API
participant PostHog
Note over Agent,PostHog: verify rls
Agent->>CLI: verify rls --table orders --owner user_id
CLI->>CLI: validate identifiers and emails
CLI->>OSS: login(userA) and login(userB)
OSS-->>CLI: aToken, bToken
CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
OSS-->>CLI: aId UUID
CLI->>VProbe: recordsCount with bToken, aToken, undefined
VProbe->>OSS: GET records with owner filter x3
OSS-->>VProbe: row counts
CLI->>VProbe: classifyRls
VProbe-->>CLI: type and evidence
CLI->>PostHog: trackVerifyFinding row counts only
CLI-->>Agent: exit 0 or 1
Note over Agent,PostHog: verify truth
Agent->>CLI: verify truth --query SELECT n --expect 3
CLI->>CLI: isReadOnlyQuery and flag validation
CLI->>OSS: runRawSql with admin key
OSS-->>CLI: rows
CLI->>VProbe: classifyTruth
VProbe-->>CLI: type and evidence
CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
CLI-->>Agent: exit 0 or 1
Note over Agent,PostHog: verify finding
Agent->>CLI: verify finding --kind http_500 --status 500
CLI->>PostHog: trackVerifyFinding kind and status only
CLI-->>Agent: exit 0
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Agent
participant CLI as insforge CLI
participant VProbe as verify-probe.ts
participant OSS as PostgREST/Auth API
participant PostHog
Note over Agent,PostHog: verify rls
Agent->>CLI: verify rls --table orders --owner user_id
CLI->>CLI: validate identifiers and emails
CLI->>OSS: login(userA) and login(userB)
OSS-->>CLI: aToken, bToken
CLI->>OSS: "runRawSql SELECT id FROM auth.users WHERE email=..."
OSS-->>CLI: aId UUID
CLI->>VProbe: recordsCount with bToken, aToken, undefined
VProbe->>OSS: GET records with owner filter x3
OSS-->>VProbe: row counts
CLI->>VProbe: classifyRls
VProbe-->>CLI: type and evidence
CLI->>PostHog: trackVerifyFinding row counts only
CLI-->>Agent: exit 0 or 1
Note over Agent,PostHog: verify truth
Agent->>CLI: verify truth --query SELECT n --expect 3
CLI->>CLI: isReadOnlyQuery and flag validation
CLI->>OSS: runRawSql with admin key
OSS-->>CLI: rows
CLI->>VProbe: classifyTruth
VProbe-->>CLI: type and evidence
CLI->>PostHog: trackVerifyFinding without db_actual or ui_claimed
CLI-->>Agent: exit 0 or 1
Note over Agent,PostHog: verify finding
Agent->>CLI: verify finding --kind http_500 --status 500
CLI->>PostHog: trackVerifyFinding kind and status only
CLI-->>Agent: exit 0
Reviews (6): Last reviewed commit: "fix(verify): validate --expect-count bef..." | Re-trigger Greptile |
…expect flags, Windows command lookup, drop PII from telemetry
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
src/lib/verify-probe.test.ts (1)
52-78: ⚡ Quick winAdd a regression for
SELECT ... INTOmutation.The SQL safety tests cover DML-in-CTE and statement chaining, but not PostgreSQL’s table-creating
SELECT ... INTOform. Add coverage alongside the read-only rejection cases so the admin-key guard cannot regress.🧪 Suggested test additions
it('rejects DML hidden inside a CTE (WITH … DELETE/UPDATE/INSERT/MERGE … SELECT)', () => { expect(isReadOnlyQuery('with x as (delete from users returning id) select id from x')).toBe(false); expect(isReadOnlyQuery('WITH x AS (UPDATE t SET c = 1 RETURNING id) SELECT * FROM x')).toBe(false); expect(isReadOnlyQuery('with x as (insert into t values (1) returning id) select id from x')).toBe(false); expect(isReadOnlyQuery('with x as (merge into t using s on t.id = s.id returning t.id) select id from x')).toBe(false); }); + + it('rejects SELECT INTO because it creates a table', () => { + expect(isReadOnlyQuery('select * into temp verify_probe_copy from users')).toBe(false); + expect(isReadOnlyQuery('with x as (select 1) select * into temp verify_probe_copy from x')).toBe(false); + }); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/verify-probe.test.ts` around lines 52 - 78, The isReadOnlyQuery test suite does not have coverage for PostgreSQL's SELECT ... INTO statement form which creates or modifies tables and should be rejected. Add a new test case within the describe('isReadOnlyQuery') block, after the existing rejection tests, that uses it() to test that isReadOnlyQuery rejects various forms of SELECT ... INTO statements such as SELECT ... INTO new_table, SELECT ... INTO TEMP table, and similar variations. Ensure each expectation calls isReadOnlyQuery with these statements and expects false to be returned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/commands/verify/finding.ts`:
- Around line 30-35: The endpoint and message values passed to
trackVerifyFinding are raw user input that may contain sensitive data like
emails, tokens, or URL query parameters, creating a PII/secrets leakage risk in
analytics. Create two helper functions in this file (sanitizeEndpoint and
sanitizeMessage) that remove query strings from endpoints and redact email
addresses and bearer tokens from messages using regex patterns, then apply these
functions to opts.endpoint and opts.message before passing them to the
trackVerifyFinding call.
In `@src/commands/verify/rls.ts`:
- Around line 47-67: The helper functions login, getAnonKey, rawsqlRows, and
recordsCount make live API calls without timeout or abort handling, which can
cause indefinite blocking if network calls hang and stall CI. For each of these
functions, implement AbortController with a reasonable timeout (e.g., 30
seconds), pass the abort signal to the fetch or HTTP calls, and add proper error
handling to catch and handle AbortError and timeout scenarios by throwing or
returning meaningful errors instead of hanging indefinitely.
In `@src/commands/verify/truth.ts`:
- Around line 30-42: The rawsqlRows() database call is executed before
validating that the required expectation flags are provided, causing unnecessary
database calls for invalid input. Move the validation check that throws CLIError
(checking that either opts.expectCount or opts.expect is defined) to execute
before the rawsqlRows() call. This ensures the function fails fast with
validation errors before attempting to query the database, avoiding wasted
resources on invalid requests.
In `@src/lib/analytics.ts`:
- Around line 149-167: The trackVerifyFinding function currently sends
potentially sensitive data in the endpoint and message fields, which can contain
emails, tokens, or query parameters. Instead of the current approach of
filtering safeEvidence and then sending all other fields, implement a whitelist
approach where you only include known-safe fields in the captureEvent call. Keep
only the safe structured fields like finding_type, passed, table, kind, and
status, and for the evidence data, explicitly whitelist only the safe evidence
keys (such as row count metrics) rather than filtering out unsafe ones. Remove
or transform the raw endpoint and message fields before sending them to avoid
leaking PII.
In `@src/lib/browser-mcp.ts`:
- Around line 90-92: The execAsync call in the MCP presence probe (checking for
MCP_SERVER_NAME using the "claude mcp get" command) lacks a timeout, which could
cause the install/link flow to hang indefinitely if the command becomes
unresponsive. Add a timeout option to the execAsync call to ensure it fails
after a reasonable duration (such as a few seconds) rather than waiting
indefinitely. This will allow the promise chain to proceed to the catch block if
the command times out.
In `@src/lib/skills.ts`:
- Around line 175-177: The log message in the clack.log.info call is too
assertive and inaccurate. The condition configured.length === 0 can result from
either no supported agent being found OR agent setup failure (with errors being
swallowed). Replace the message "No supported agent found to auto-configure the
browser MCP" with a non-assertive message that acknowledges both possibilities,
such as something that indicates the browser MCP could not be auto-configured
rather than definitively stating no agent was found. Keep the helpful Claude
configuration instruction at the end of the message.
In `@src/lib/verify-probe.ts`:
- Around line 99-145: Add timeout protection to prevent indefinite hangs in the
four fetch functions: login, getAnonKey, rawsqlRows, and recordsCount. For each
function, create an AbortController and set a timeout that aborts the fetch
operation if it exceeds a reasonable duration (e.g., 30 seconds). Pass the abort
signal to the fetch request via the signal option in the fetch options object,
and wrap each fetch call in a try-catch block to handle the AbortError that
occurs when the timeout fires, converting it to a meaningful error message that
includes the operation name and timeout details.
- Around line 61-67: The isReadOnlyQuery function allows dangerous SELECT ...
INTO statements that create or populate tables in PostgreSQL despite passing the
read-only check. Add a regex pattern to the guard that detects and rejects
queries containing the INTO keyword used in conjunction with SELECT statements,
similar to how the function currently rejects INSERT, UPDATE, DELETE, and other
write operations. This check should be added alongside the existing keyword
rejection pattern on line 66 to prevent database mutation through CREATE TABLE
AS SELECT or SELECT ... INTO constructs.
---
Nitpick comments:
In `@src/lib/verify-probe.test.ts`:
- Around line 52-78: The isReadOnlyQuery test suite does not have coverage for
PostgreSQL's SELECT ... INTO statement form which creates or modifies tables and
should be rejected. Add a new test case within the describe('isReadOnlyQuery')
block, after the existing rejection tests, that uses it() to test that
isReadOnlyQuery rejects various forms of SELECT ... INTO statements such as
SELECT ... INTO new_table, SELECT ... INTO TEMP table, and similar variations.
Ensure each expectation calls isReadOnlyQuery with these statements and expects
false to be returned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1679be85-08fd-4ee4-a79c-3c4c21824d71
📒 Files selected for processing (12)
src/commands/deployments/deploy.tssrc/commands/verify/finding.tssrc/commands/verify/index.tssrc/commands/verify/rls.tssrc/commands/verify/truth.tssrc/index.tssrc/lib/analytics.tssrc/lib/browser-mcp.test.tssrc/lib/browser-mcp.tssrc/lib/skills.tssrc/lib/verify-probe.test.tssrc/lib/verify-probe.ts
| endpoint: opts.endpoint as string | undefined, | ||
| message: opts.message as string | undefined, | ||
| table: opts.table as string | undefined, | ||
| }; | ||
| trackVerifyFinding(finding, config); | ||
| await shutdownAnalytics(); // flush the PostHog event before exit |
There was a problem hiding this comment.
Sanitize --message and --endpoint before tracking analytics.
Line 30 through Line 35 forwards raw free-form text to trackVerifyFinding(...). These values can include emails, tokens, or URL query params from runtime errors, which risks telemetry PII/secrets leakage. Redact sensitive patterns and strip query strings before emitting.
Suggested direction
- endpoint: opts.endpoint as string | undefined,
- message: opts.message as string | undefined,
+ endpoint: sanitizeEndpoint(opts.endpoint as string | undefined),
+ message: sanitizeMessage(opts.message as string | undefined),// Add small helpers in this file:
function sanitizeEndpoint(v?: string): string | undefined {
if (!v) return undefined;
return v.split('?')[0]; // drop query params
}
function sanitizeMessage(v?: string): string | undefined {
if (!v) return undefined;
return v
.replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, '[redacted-email]')
.replace(/\b(?:Bearer\s+)?[A-Za-z0-9._-]{20,}\b/g, '[redacted-token]')
.slice(0, 500);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/commands/verify/finding.ts` around lines 30 - 35, The endpoint and
message values passed to trackVerifyFinding are raw user input that may contain
sensitive data like emails, tokens, or URL query parameters, creating a
PII/secrets leakage risk in analytics. Create two helper functions in this file
(sanitizeEndpoint and sanitizeMessage) that remove query strings from endpoints
and redact email addresses and bearer tokens from messages using regex patterns,
then apply these functions to opts.endpoint and opts.message before passing them
to the trackVerifyFinding call.
| const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`) | ||
| .then(() => true) | ||
| .catch(() => false); |
There was a problem hiding this comment.
Add a timeout to the Claude MCP presence probe.
Line 90 runs claude mcp get without a timeout. If that command hangs, the install/link flow can hang indefinitely.
Suggested fix
- const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`)
+ const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, {
+ timeout: MCP_CONFIG_TIMEOUT_MS,
+ })
.then(() => true)
.catch(() => false);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`) | |
| .then(() => true) | |
| .catch(() => false); | |
| const present = await execAsync(`claude mcp get ${MCP_SERVER_NAME}`, { | |
| timeout: MCP_CONFIG_TIMEOUT_MS, | |
| }) | |
| .then(() => true) | |
| .catch(() => false); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/browser-mcp.ts` around lines 90 - 92, The execAsync call in the MCP
presence probe (checking for MCP_SERVER_NAME using the "claude mcp get" command)
lacks a timeout, which could cause the install/link flow to hang indefinitely if
the command becomes unresponsive. Add a timeout option to the execAsync call to
ensure it fails after a reasonable duration (such as a few seconds) rather than
waiting indefinitely. This will allow the promise chain to proceed to the catch
block if the command times out.
| clack.log.info( | ||
| 'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.', | ||
| ); |
There was a problem hiding this comment.
Use a non-assertive message when configured is empty.
Line 175 currently states “No supported agent found,” but configured.length === 0 can also mean agent setup failed (errors are swallowed in src/lib/browser-mcp.ts Line 135).
Suggested fix
- clack.log.info(
- 'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',
- );
+ clack.log.info(
+ 'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.',
+ );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| clack.log.info( | |
| 'No supported agent found to auto-configure the browser MCP. Add it manually — Claude: `claude mcp add playwright -s user -- npx @playwright/mcp@latest --headless`.', | |
| ); | |
| clack.log.info( | |
| 'Could not auto-configure the browser MCP (no supported agent detected or configuration failed). Add it manually — Claude: `claude mcp add playwright -s user -- npx `@playwright/mcp`@latest --headless`.', | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/skills.ts` around lines 175 - 177, The log message in the
clack.log.info call is too assertive and inaccurate. The condition
configured.length === 0 can result from either no supported agent being found OR
agent setup failure (with errors being swallowed). Replace the message "No
supported agent found to auto-configure the browser MCP" with a non-assertive
message that acknowledges both possibilities, such as something that indicates
the browser MCP could not be auto-configured rather than definitively stating no
agent was found. Keep the helpful Claude configuration instruction at the end of
the message.
…, share oss runRawSql/getAnonKey
jwfing
left a comment
There was a problem hiding this comment.
Code review — feat/agent-e2e-verify (PoC)
Summary: A well-structured, defensively-coded PoC that adds hidden verify rls/truth/finding probes (pure, unit-tested verdict logic + thin fetch wiring), PII-stripping analytics, browser-MCP auto-setup, and Playwright deploy-exclude patterns — no blocking issues, a few convention/scope items worth addressing before this graduates from PoC.
Requirements context
No spec/plan under docs/specs/ matches this PR — the documents there cover diagnose and db-migrations, not agent-E2E/verify. Assessed against the PR description (cubic/Macroscope summaries) and the repo conventions in DEVELOPMENT.md. No invented requirements.
Critical
(none) — no correctness, security, data-loss, or behavior-breaking issues found.
Suggestion
Software engineering / telemetry — event name breaks the cli_<feature>_<action> convention. src/lib/analytics.ts:171 emits 'verify_finding', but every other helper in this file uses the cli_*_invoked shape (cli_diagnose_invoked, cli_config_invoked, …) per DEVELOPMENT.md §2 ("Use the cli_<feature>_<action> convention. Keep event names stable."). Recommend cli_verify_finding (or similar) for dashboard consistency. It's a new event so nothing breaks today, but the name should land right the first time.
Functionality / scope — installSkills(withBrowserMcp = true) is "opt-in" in the comment only. src/lib/skills.ts:88 defaults the new param to true and none of the six call sites (create.ts:443, link.ts:127/246/308/492/534) ever pass false, so the comment at skills.ts:156 ("Opt-in") doesn't match reality — every insforge link/create now mutates global, outside-the-project agent config (claude mcp add -s user, ~/.cursor/mcp.json, ~/.codex/config.toml). This also runs under --json (automation/CI) where output is suppressed but the writes still happen. The merges are careful (idempotent, malformed-JSON-safe, preserve other servers — nicely tested), so this is UX/scope rather than data-loss, but I'd wire a real opt-out (CLI flag or prompt) or make it genuinely opt-in before merge.
Security / testing — the redaction path has no direct tests. sanitizeEndpoint/sanitizeMessage (src/lib/analytics.ts:152-165) are the guard that keeps emails/tokens/PII out of PostHog, yet they have no unit tests, while the pure verdict helpers (classifyRls/classifyTruth/isReadOnlyQuery/isSafeIdentifier/isLikelyEmail) are thoroughly covered. The redaction regexes are exactly the kind of code a test should pin (e.g. token boundary, multiple emails, the 500-char truncation). Add cases for these.
Security — isReadOnlyQuery is keyword-based and won't catch side-effecting function calls. src/lib/verify-probe.ts (isReadOnlyQuery) blocks DML/DDL/chaining/INTO/CTE-DML well, but a query like SELECT some_mutating_fn() passes the guard and runs through runRawSql with the admin key (oss.ts:22, "superadmin access"). Acceptable for a hidden experimental probe, but worth a code comment acknowledging the gap (or routing reads through a lower-privilege path) before this is exposed more widely.
Information
- Convention — direct
fetchinverify-probe.ts(login,recordsCount) deviates fromDEVELOPMENT.md §1("Do not callfetchdirectly … add a typed wrapper tooss.ts"). Here it's justified — RLS probing needs per-user/anon tokens, andossFetchhard-codes the adminapi_keyBearer (oss.ts:113-125), so it genuinely can't be reused. Consider a small typed wrapper or a note so the deviation is intentional-on-the-record. - Telemetry shape —
trackVerifyFinding(analytics.ts:171) omits the standardproject_name/org_id/region/oss_modeproperties the othertrack*helpers all include; adding them keepsverify_findinggroupable like the rest. - Convention — flush in
finally.verify/{rls,truth,finding}.tscallawait shutdownAnalytics()inline after the capture rather than in afinallyblock (DEVELOPMENT.md §2). Functionally fine (it immediately follows the only capture), just not the documented pattern. - SQL interpolation in
rls.ts— theauth.usersemail lookup (rls.ts,select id from auth.users where email='...') interpolates afterisLikelyEmailvalidation + single-quote escaping; safe givenrunRawSqlhas no parameter binding. Defense-in-depth is present; noted only because raw interpolation always deserves a second look. - agent-skills sync (
DEVELOPMENT.md §3) — the new hiddenverifycommands and the browser-MCP behavior should be reflected in theinsforge-verifyskill inInsForge/agent-skills; the PR doesn't mention a companion update. --expect-countis compared as a trimmed string (truth.ts→classifyTruth(rows.length, String(opts.expectCount))), so--expect-count 03mismatches 3 rows. Minor.
Strengths worth calling out
- The empty-token guard (
rls.ts) andassertOk(verify-probe.ts) both close genuinely subtle false-pass traps — an empty token or a non-2xx being silently read as "0 rows / isolation holds." That's the right instinct for a verification tool. - Strong, well-tested input validation (
isSafeIdentifier/isLikelyEmail/isReadOnlyQuery) and config-merge robustness (malformed/array/primitive JSON, idempotency). - Deploy exclude patterns (
deploy.ts:53) are correct and scoped.
Verdict
approved (informational — no Critical findings; posted as a COMMENT, a human still gives the GitHub green-check via the approve flow). The Suggestions, especially the default-on global-config mutation and the untested redaction path, are worth resolving before this PoC graduates to a shipped feature.
…est redaction, numeric expect-count
jwfing
left a comment
There was a problem hiding this comment.
Summary
This PR adds useful verify/MCP plumbing, but the current implementation has one security blocker and one core correctness blocker in the new verify commands.
Requirements context
I could not find repo docs that define verify beyond this PR, so I assessed intent primarily from the PR description and the inline command descriptions in src/commands/verify/*.ts. For project conventions, I also used DEVELOPMENT.md, especially the command/error/output and analytics guidance. I was not able to execute the Vitest suite in this workspace because vitest is not installed (sh: vitest: not found).
Findings
Critical
src/lib/verify-probe.ts:55-69,src/lib/api/oss.ts:13-29:verify truthis presented as a single read-only query, but the guard is only a regex over the SQL text before passing it torunRawSql, which executes with the project API key's superadmin privileges. That still allows side-effecting statements hidden behindSELECT/WITHvia mutating SQL functions, so the command does not actually provide the safety guarantee it advertises.src/commands/verify/rls.ts:18-20,src/commands/verify/rls.ts:60-65: the anonymous control is checking the entire table (recordsCount(..., undefined, undefined, anon)) instead of checking the same owner-scoped filter used for user A. Any table that intentionally exposes some public rows will be reported asrls_leakeven when B and anonymous users cannot read A's rows, which breaks the command's stated behavior of verifying "B cannot read A, A can read own".
Suggestion
DEVELOPMENT.md:48-53,src/lib/analytics.ts:154-180,src/commands/verify/finding.ts:18-34: this change starts sending sanitized-but-still-free-formmessageandendpointtext to PostHog. That conflicts with the repo's telemetry guidance to never send user-entered free text, and the current redaction only covers query strings, emails, and long token-like substrings. I would strongly prefer reducing this event to structured fields (finding_type,kind,status,table, booleans/counts) and dropping the free-text properties entirely.src/commands/verify/finding.ts:11-47,src/commands/verify/rls.ts:15-84,src/commands/verify/truth.ts:9-68,src/lib/skills.ts:160-188: the added tests cover the pure helpers well, but there is still no command-level coverage for the new hidden CLI surface or for theinstallSkillsbrowser-MCP side effects. That leaves argument parsing, exit-code behavior, and telemetry/flush behavior unverified.
Information
- (none)
Verdict
request_changes — the PR needs fixes for the read-only SQL guarantee and the anonymous RLS probe scope before it is safe to merge.
…nest read-only guard wording, command tests
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/commands/verify/rls.test.ts`:
- Around line 51-56: The test assertions at lines 51-56 and 59-64 are too loose
- they only verify that an exit occurs with `/exit:/` but don't confirm the
rejection is specifically due to the `--owner` or `--user-a` validation guards.
Replace the generic `/exit:/` pattern matching in the toThrow() calls with more
specific validation error messages that confirm the actual validation failure
for each guard (the `--owner` parameter validation in the first test and
`--user-a` parameter validation in the second test) to ensure these tests are
hitting the intended rejection paths and not passing on unrelated early-exit
scenarios.
- Around line 70-75: The test currently validates only the 3rd recordsCount call
with a partial filter match using stringContaining, but the comment indicates
all three probes must use the same owner-scoped filter. Capture the filter
argument from the 1st call to recordsCount, then add assertions to verify that
both the 2nd and 3rd calls use the exact same filter string with strict equality
(not partial matching). This ensures consistency across all three probes as
intended.
In `@src/commands/verify/truth.test.ts`:
- Around line 75-80: The test case `flags false_pass (exit 1) when DB differs
from the claim` currently only validates the exit code but should also verify
analytics behavior on the failure path. Add expect assertions after the
parseAsync call to verify that trackVerifyFinding was called with the
appropriate finding details and that shutdownAnalytics was called to ensure
findings are properly emitted and analytics are properly flushed when
verification fails.
- Around line 59-63: The test for rejecting a non-integer --expect-count only
asserts that an error is thrown but does not verify that the database call to
runRawSql is prevented. Add a spy or mock for the runRawSql function before
calling makeProgram().parseAsync() and include an assertion that verifies
runRawSql was never called. This ensures the command validates the
--expect-count argument before attempting any database operations, preventing
unnecessary live-call leakage when invalid input is provided.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 438e9671-8b94-49dc-99c7-2e7080d1d9eb
📒 Files selected for processing (7)
src/commands/verify/rls.test.tssrc/commands/verify/rls.tssrc/commands/verify/truth.test.tssrc/commands/verify/truth.tssrc/lib/analytics.tssrc/lib/verify-probe.test.tssrc/lib/verify-probe.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- src/commands/verify/rls.ts
- src/commands/verify/truth.ts
- src/lib/verify-probe.test.ts
| it('rejects an --owner that smuggles PostgREST params, before any login', async () => { | ||
| const { login } = await import('../../lib/verify-probe.js'); | ||
| await expect( | ||
| makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }), | ||
| ).rejects.toThrow(/exit:/); | ||
| expect(login).not.toHaveBeenCalled(); |
There was a problem hiding this comment.
Tighten assertions to verify the intended rejection cause.
On Lines 53-56 and Lines 61-64, asserting only /exit:/ can pass on unrelated early-exit paths. Add an assertion on the emitted validation message so these tests prove they’re hitting the --owner and --user-a guards specifically.
Suggested test hardening
describe('verify rls (command)', () => {
let exitSpy: ReturnType<typeof vi.spyOn>;
+ let errorSpy: ReturnType<typeof vi.spyOn>;
beforeEach(() => {
vi.clearAllMocks();
process.exitCode = undefined;
exitSpy = vi.spyOn(process, 'exit').mockImplementation(((code?: number) => {
throw new Error(`exit:${code}`);
}) as never);
- vi.spyOn(console, 'error').mockImplementation(() => {});
+ errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
vi.spyOn(console, 'log').mockImplementation(() => {});
});
@@
it('rejects an --owner that smuggles PostgREST params, before any login', async () => {
const { login } = await import('../../lib/verify-probe.js');
await expect(
makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id&select=secret', '--json'], { from: 'user' }),
).rejects.toThrow(/exit:/);
+ expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--owner must be a bare column name/);
expect(login).not.toHaveBeenCalled();
});
@@
it('rejects a non-email --user-a, before any login', async () => {
const { login } = await import('../../lib/verify-probe.js');
await expect(
makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--user-a', 'not-an-email', '--json'], { from: 'user' }),
).rejects.toThrow(/exit:/);
+ expect(errorSpy.mock.calls.flat().join(' ')).toMatch(/--user-a and --user-b must be valid email addresses/);
expect(login).not.toHaveBeenCalled();
});Also applies to: 59-64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/commands/verify/rls.test.ts` around lines 51 - 56, The test assertions at
lines 51-56 and 59-64 are too loose - they only verify that an exit occurs with
`/exit:/` but don't confirm the rejection is specifically due to the `--owner`
or `--user-a` validation guards. Replace the generic `/exit:/` pattern matching
in the toThrow() calls with more specific validation error messages that confirm
the actual validation failure for each guard (the `--owner` parameter validation
in the first test and `--user-a` parameter validation in the second test) to
ensure these tests are hitting the intended rejection paths and not passing on
unrelated early-exit scenarios.
| // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter. | ||
| expect(recordsCount).toHaveBeenCalledTimes(3); | ||
| // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter. | ||
| expect(recordsCount).toHaveBeenNthCalledWith( | ||
| 3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon', | ||
| ); |
There was a problem hiding this comment.
Validate filter equality across all three recordsCount probes.
Line 70 says all probes must use the same owner-scoped filter, but the test currently checks only the 3rd call and only with stringContaining. Compare the 3rd argument from all three calls for exact equality.
Suggested assertion improvement
it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => {
const { recordsCount } = await import('../../lib/verify-probe.js');
await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' });
// 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter.
expect(recordsCount).toHaveBeenCalledTimes(3);
+ const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls;
+ const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]];
+ expect(typeof filter1).toBe('string');
+ expect(filter1).toMatch(/^user_id=eq\./);
+ expect(filter2).toBe(filter1);
+ expect(filter3).toBe(filter1);
// The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter.
expect(recordsCount).toHaveBeenNthCalledWith(
3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon',
);
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter. | |
| expect(recordsCount).toHaveBeenCalledTimes(3); | |
| // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter. | |
| expect(recordsCount).toHaveBeenNthCalledWith( | |
| 3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon', | |
| ); | |
| it('scopes the anonymous control to A\'s owner filter (not the whole table)', async () => { | |
| const { recordsCount } = await import('../../lib/verify-probe.js'); | |
| await makeProgram().parseAsync(['verify', 'rls', '--table', 'orders', '--owner', 'user_id', '--json'], { from: 'user' }); | |
| // 3 probes: B-of-A, A-own, anon — all must use the same owner-scoped filter. | |
| expect(recordsCount).toHaveBeenCalledTimes(3); | |
| const calls = (recordsCount as ReturnType<typeof vi.fn>).mock.calls; | |
| const [filter1, filter2, filter3] = [calls[0][2], calls[1][2], calls[2][2]]; | |
| expect(typeof filter1).toBe('string'); | |
| expect(filter1).toMatch(/^user_id=eq\./); | |
| expect(filter2).toBe(filter1); | |
| expect(filter3).toBe(filter1); | |
| // The anon probe (3rd call) must pass the filter + no token, NOT undefined for the filter. | |
| expect(recordsCount).toHaveBeenNthCalledWith( | |
| 3, 'https://h', 'orders', expect.stringContaining('user_id=eq.'), undefined, 'anon', | |
| ); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/commands/verify/rls.test.ts` around lines 70 - 75, The test currently
validates only the 3rd recordsCount call with a partial filter match using
stringContaining, but the comment indicates all three probes must use the same
owner-scoped filter. Capture the filter argument from the 1st call to
recordsCount, then add assertions to verify that both the 2nd and 3rd calls use
the exact same filter string with strict equality (not partial matching). This
ensures consistency across all three probes as intended.
| it('flags false_pass (exit 1) when DB differs from the claim', async () => { | ||
| const oss = await import('../../lib/api/oss.js'); | ||
| (oss.runRawSql as ReturnType<typeof vi.fn>).mockResolvedValue({ rows: [{ n: 1 }] }); | ||
| await makeProgram().parseAsync(['verify', 'truth', '--query', 'select n', '--expect', '3', '--json'], { from: 'user' }); | ||
| expect(process.exitCode).toBe(1); | ||
| }); |
There was a problem hiding this comment.
Verify analytics behavior on the false_pass path, not just exit code.
Line 75 covers process.exitCode, but the PR contract also says findings are emitted and flushed on failures. Add trackVerifyFinding and shutdownAnalytics assertions here to lock that behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/commands/verify/truth.test.ts` around lines 75 - 80, The test case `flags
false_pass (exit 1) when DB differs from the claim` currently only validates the
exit code but should also verify analytics behavior on the failure path. Add
expect assertions after the parseAsync call to verify that trackVerifyFinding
was called with the appropriate finding details and that shutdownAnalytics was
called to ensure findings are properly emitted and analytics are properly
flushed when verification fails.
There was a problem hiding this comment.
1 issue found across 7 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
… no DB call on invalid input
jwfing
left a comment
There was a problem hiding this comment.
Summary
This PR mostly implements the stated hidden verification probes and browser-MCP setup as described, with good input validation and telemetry scrubbing, and I found no blocking issues.
Requirements context
I derived intent primarily from the PR description because I did not find repo documentation for the new hidden verify surface itself; for repo conventions and telemetry expectations, I used DEVELOPMENT.md, especially the command-layout and PostHog guidance.
Findings
Critical
(none)
Suggestion
src/lib/browser-mcp.ts:104-116— The Cursor/Codex auto-setup path is gated on~/.cursor/~/.codexalready existing, even thoughmergeJsonMcp()andensureCodexToml()already create parent directories themselves. In practice this means a fresh agent install can hit the “No supported agent found” path and never get the promised auto-configuration until the user has manually launched that agent once.src/lib/browser-mcp.ts:17-18,src/lib/skills.ts:175-184— The generated MCP config hardcodesnpx @playwright/mcp@latest --headless. Using@latestmakes every future agent session depend on whatever was most recently published upstream, which is both a supply-chain risk and a reproducibility risk for the verification flow this PR is trying to stabilize.src/lib/browser-mcp.test.ts:1-109,src/lib/skills.ts:160-188— The new tests cover the JSON/TOML helper functions, but not the actualconfigureBrowserMcp()/installSkills()integration path. That leaves the agent-detection logic and user-facing fallback behavior unverified, which is exactly where the.cursor/.codexexistence-gate bug above lives.
Information
src/commands/verify/rls.ts:31-66,src/commands/verify/truth.ts:20-58,src/lib/analytics.ts:143-172— The new verify commands do a solid job on the main security-sensitive areas: identifier/email validation before SQL or PostgREST usage, non-2xx handling for probe requests, and dropping free-text / sensitive evidence fields before PostHog capture.src/commands/deployments/deploy.ts:27-52— The deployment exclusion additions are low-risk and should reduce unnecessary upload size; I did not see a separate performance regression in the changed paths.
Verdict
approved — no Critical findings. I could not run the new test suites in this checkout because the workspace does not have dev dependencies installed (vitest: not found), so the review is based on static inspection.
Summary by cubic
Adds hidden verify probes and auto-setup for the
@playwright/mcpbrowser server to power light‑mode agent E2E verification with analytics-backed findings. Also excludes Playwright artifacts from deployment uploads.New Features
verifyCLI withrls,truth, andfinding; each emitscli_verify_findingand sets pass/fail exit codes.@playwright/mcpserver (npx @playwright/mcp@latest --headless) for Claude Code, Cursor, and Codex.rls/truth.Bug Fixes
test-results,playwright-report,.playwright-mcp.verify truth: block destructive DML/DDL (incl. CTE/MERGEandSELECT INTO), prevent statement chaining, treat non‑2xx as errors; enforce exclusive--expectvs--expect-count; validate--expect-countbefore executing to avoid DB calls on invalid input.endpoint/message; also stripdb_actual/ui_claimed), scope the anonymous RLS control to the owner filter, validate identifiers/emails, and improve Windows agent detection for MCP setup.Written for commit 0bdb097. Summary will update on new commits.
Note
Add hidden
verifyCLI commands for RLS isolation and backend truth probesverifycommand group with three subcommands:verify rls,verify truth, andverify finding, registered in src/index.ts.verify rlsprobes row-level security by performing cross-user and anonymous PostgREST reads, classifying results asrls_leak,rls_overrestrict, or pass, and setting a non-zero exit code on failure.verify truthruns a read-only SQL query and compares the result to a UI-claimed value or row count, recording afalse_passfinding on mismatch.verify findingrecords an arbitrary loud error (4xx/5xx, console error) as an analytics finding.trackVerifyFindingin src/lib/analytics.ts, which sanitizes endpoints and messages and strips sensitive evidence fields before sending.installSkills, the Playwright browser MCP is now configured for supported agents (Claude Code, Cursor, Codex) and users see a restart notice.installSkillsnow accepts an optional third parameterwithBrowserMcp(defaulttrue) and may write to agent config files on disk during skill installation.Changes since #170 opened
trackVerifyFindingto include project and organization context [a309bd5]--expect-countCLI flag to enforce non-negative integer values [a309bd5]verify.registerVerifyRlsCommandto use the same owner-scoped filter string when callingrecordsCountinstead of passing undefined [63b3147]endpointandmessageproperties fromanalytics.trackVerifyFindingevent payload and deletedsanitizeEndpointandsanitizeMessagefunctions [63b3147]verify rlsandverify truthcommands with comprehensive test coverage [63b3147]verify.registerVerifyTruthCommandto clarify that the guard blocks common destructive query forms but does not provide hard read-only guarantees [63b3147]verify-probe.isReadOnlyQueryhelper to clarify limitations of the read-only guard [63b3147]registerVerifyTruthCommandaction handler requiring either--expector--expect-countflags and validating that--expect-countis a non-negative integer before executing database queries [0bdb097]--expect-countvalues to assert thatrunRawSqlis not invoked when validation fails [0bdb097]Macroscope summarized 008fdd5.
Summary by CodeRabbit
Release Notes