fix(SESF-44): tighten Tier-2 auth keyword + reject non-literal assignment values#42
Conversation
Cover the author*/authority/co-author FP family, $(...) command-sub and
${...} env-interpolation FP values, and genuine-key true positives
(incl. bare oauth) through both redact() and scan_spans(). 10 RED, 8 green.
D-1: drop bare `auth` (substring-matched author*/authority/co-author git
metadata); add left-anchored `(?<![A-Za-z])(?:oauth|authorization|auth[_-]?token)`
so `oauth` no longer matches inside `coauthor`/`coAuthoredBy`.
D-2: new pure `_is_nonliteral_assignment(line, match)` drops command-sub
`$(...)` and env-interpolation `${...}` matches (incl. the keyword-inside-braces
case where the captured value is the default, not `${...}`).
Both fixes sit upstream of the shared aggregation, so redact() (SESF-41 guard)
and scan_spans() (SESF-42 sanitizer) both benefit. 378 passed, ruff clean.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughTightens Tier-2 ASSIGNMENT detection in ChangesSESF-44 Tier-2 Assignment Precision
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | ✅ 18 (≤ 500 complexity) |
| Duplication | ✅ 0 (≤ 5 duplication) |
AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Pull Request Overview
The pull request effectively refines the Tier-2 secret detector to reduce false positives by tightening the 'auth' keyword regex and introducing logic to ignore non-literal assignments (command substitutions and environment interpolations). Codacy analysis indicates the changes are up to standards.
However, a logic issue in secret_redaction.py was identified. An 'if' statement intended to catch non-literal values is unreachable because the regex used to capture the assignment value explicitly excludes curly braces. This should be corrected to ensure command substitutions are correctly ignored as per the acceptance criteria.
Test suggestions
- Author family keys (author, authoredDate, coauthor, etc.) with literal values are not flagged.
- Genuine keys (oauth, authorization, auth_token, authToken) with literal values are still flagged.
- Command substitution values like $(security ...) are ignored.
- Environment variable interpolation like ${FOO_PASSWORD:-default} is ignored even when the key matches.
- Simple environment variables like ${VALUE} are ignored.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
There was a problem hiding this comment.
Pull Request Overview
This PR aims to refine secret detection by narrowing the 'auth' keyword scope and ignoring non-literal assignments. However, two critical issues prevent approval: the production code changes (secret_redaction.py) are missing from the provided diff, and the logic to exclude the 'author' family currently risks a regression by dropping the bare 'auth' keyword entirely.
While the Codacy analysis is positive, the test suite requires expansion. Specifically, the implementation should be verified against more shell-style variations (quotes, backticks) and ensure parity between JSON and shell-assignment formats. The regex implementation should also be checked for case-sensitivity once the source code is provided.
About this PR
- The actual implementation changes in
secret_redaction.py(including the regex updates and the_is_nonliteral_assignmentfunction) are missing from the provided diff. Only the test file is visible. Please ensure all modified files are properly staged and pushed so the logic can be verified.
Test suggestions
- Verify 'author' family keys with literal values do not trigger Tier-2 hits.
- Verify command substitution values (e.g., $(security ...)) are not flagged.
- Verify environment variable interpolation values (e.g., ${VAR:-default}) are not flagged.
- Verify genuine keys like 'oauth' and 'authorization' with literal values are still flagged.
- Verify that placeholder values (e.g., 'none') for genuine keys remain unflagged.
- Verify quoted command substitutions (e.g., KEY="$(command)") are not flagged.
- Verify shell backticks (e.g., KEY=
command) are not flagged. - Verify additional interpolation operators (e.g., ${VAR:?error}) are not flagged.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Verify quoted command substitutions (e.g., KEY="$(command)") are not flagged.
2. Verify shell backticks (e.g., KEY=`command`) are not flagged.
3. Verify additional interpolation operators (e.g., ${VAR:?error}) are not flagged.
Low confidence findings
- Once the production code is available, please verify that the regex
(?<![A-Za-z])(?:oauth|authorization|auth[_-]?token)handles case-insensitivity correctly (e.g., for 'authToken') to ensure it meets the requirements for diverse coding styles.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@secret_redaction.py`:
- Around line 103-112: The _SECRET_KEYWORD regex pattern definition uses a
Bandit suppression comment (# nosec B105) on line 103, but Ruff is still
flagging this code with the S105 error on the subsequent lines containing the
keyword patterns. Replace the Bandit-specific # nosec B105 comment with Ruff's
compatible suppression format (# noqa: S105) so that the security linting check
is properly suppressed for this intentional detector regex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bcf6144a-0cc0-457a-8fd6-db77bf4aa7ae
📒 Files selected for processing (2)
secret_redaction.pytests/test_secret_redaction.py
- Remove unreachable `value.startswith("${")` branch in _is_nonliteral_assignment
(the value capture class excludes `{`, so ${ can never start a captured value;
interpolation is handled by the span-overlap check). [codacy 3410428483]
- Add `# noqa: S105` on the keyword string so CodeRabbit's S-enabled ruff passes
(project ruff is D1-only; this is keyword NAMES, not a credential). [coderabbit 3410431432]
- Tests: add shell-form parity to the AC-1 author-family cases, a quoted
command-substitution case, and a ${VAR:?error} interpolation case. [codacy 3410428669]
378 passed, ruff clean (incl. S105).
Summary
Precision tune of the Tier-2 contextual-assignment detector in
secret_redaction.py. Eliminates the two false-positive classes that dominated the SESF-42 live baseline (~47% of 882 Tier-2 hits), with no loss of recall on genuine secrets.auth(it substring-matched theauthor*/authority/ co-author git-metadata family) and add a left-anchored auth family(?<![A-Za-z])(?:oauth|authorization|auth[_-]?token). The lookbehind stopsoauthfrom re-matching insidecoauthor/coAuthoredBy;auth_tokenstill matches via the existing baretokenkeyword._is_nonliteral_assignment(line, match)drops command-substitution$(…)and env-interpolation${…}matches — including the case where the secret keyword is matched inside the braces (${FOO_PASSWORD:-default}→ keyFOO_PASSWORD, value-default), which a value-shape check alone misses.Both fixes sit upstream of the shared
_aggregate_maskable_candidates, so they benefit bothredact()(SESF-41 ingestion guard) andscan_spans()(SESF-42 sanitizer) with zero call-site changes.Test plan
tests/test_secret_redaction.py, each asserted through bothredact()andscan_spans():author/authored/authoredDate/authorAssociation/authority/gradingAuthority/coauthor/coAuthoredBy→ not flagged$(security …)and${FOO_PASSWORD:-default}→ not flaggedauthorization/bareoauth/oauth_token/auth_token/auth-token/authToken+ literalFOO_API_KEY→ still flaggedLinks
DocVault/Projects/SessionFlow/sketches/SESF-44-tier2-assignment-precision/(requirements → discovery → approach → tasks, all Codex-reviewed + reconciled)🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
${...}interpolation patterns (including default/error forms).Tests