Skip to content

[ci-scanner] Improve workflow and fix detection auth#128125

Open
kotlarmilos wants to merge 7 commits into
mainfrom
kotlarmilos/ci-scanner-feedback
Open

[ci-scanner] Improve workflow and fix detection auth#128125
kotlarmilos wants to merge 7 commits into
mainfrom
kotlarmilos/ci-scanner-feedback

Conversation

@kotlarmilos
Copy link
Copy Markdown
Member

@kotlarmilos kotlarmilos commented May 13, 2026

Description

This PR improves the workflow based on the past runs and fixes the security scanning requires review caution banner that has been appearing on every issue/PR the scanner produces.

Changes

  • Restructured body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section
  • KBE template now includes the ## Error Details section that carries the build-analysis indicator block
  • Area-path assignment is delegated to the runtime labeler bot so each issue ends up with a single area path
  • Project linkage is left to the existing net-helix[bot] automation (which watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project within ~2s); the workflow only has to apply the label

Validation artifacts

Issues produced by workflow_dispatch runs of this branch:

- Restructure ci-failure-scan.md body into a deterministic step-by-step
  flow with hard rules, branch decisions, literal inline templates, and
  an output discipline section.
- Wire same-run project linkage for Known Build Error issues via
  update-project + temporary_id so Build Analysis picks them up.
- Add the Error Details template section so KBE issues carry the
  build-analysis indicator block.
- Delegate area path assignment to the runtime labeler bot (single area
  path per issue).
- Add an outage circuit breaker: when failures exceed the threshold,
  open one tracking issue and stop filing per-failure issues.
- Regenerate lock with gh-aw v0.71.5 and patch pat_pool into the
  detection job's needs list so the threat-detection PAT resolves
  correctly and the security-review caution banner stops appearing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@kotlarmilos kotlarmilos changed the title [ci-scanner] address review feedback and fix detection auth [ci-scanner] Improve workflow and fix detection auth May 13, 2026
@kotlarmilos kotlarmilos marked this pull request as ready for review May 13, 2026 08:31
@kotlarmilos kotlarmilos requested review from a team and jeffhandley as code owners May 13, 2026 08:31
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CI Outer-Loop Failure Scanner’s gh-aw prompt/workflow to make triage output more deterministic, add same-run linkage of KBEs to a GitHub Project via a new update_project safe-output tool, and patch the generated lock workflow to fix threat-detection authentication.

Changes:

  • Rewrites .github/workflows/ci-failure-scan.md into an explicit step-by-step decision flow with stricter “hard rules”, updated templates (including ## Error Details), and an outage circuit breaker.
  • Adds update_project as an allowed safe-output operation and documents a temporary_id-based “create issue then attach to project” payload.
  • Updates the compiled .lock.yml workflow to include the new tool and to ensure the threat-detection job depends on pat_pool so its COPILOT token expression resolves.
Show a summary per file
File Description
.github/workflows/ci-failure-scan.md Major restructuring of the scanner prompt plus new templates and update_project safe-output integration guidance.
.github/workflows/ci-failure-scan.lock.yml Regenerated workflow + manual patch to threat-detection needs, plus wiring for update_project in safe-outputs configuration.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml
Same shape of bug as the detection patch: gh-aw v0.71.5 does not
auto-wire pat_pool into safe_outputs.needs even though the job's env
references needs.pat_pool.outputs.pat_number indirectly via the
update-project github-token. Without this, safe_outputs.update-project
fails with 'Input required and not supplied: github-token' because
secrets.COPILOT_GITHUB_TOKEN is empty/stale in this repo.

- Add pat_pool to safe_outputs.needs.
- Replace secrets.COPILOT_GITHUB_TOKEN with the same case() expression
  used in engine.env in three places inside the Process Safe Outputs
  step: handler config JSON, GH_AW_PROJECT_GITHUB_TOKEN env, and the
  github-script with: github-token input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

…r project

The previous patch over-broadly switched the safe_outputs github-script
'with: github-token' to the pat_pool case() expression, which broke
create_issue and create_pull_request: those use the same octokit, and
the COPILOT_PAT_# tokens don't have repo issue/pr write scope for
dotnet/runtime.

- Restore with: github-token to secrets.GH_AW_GITHUB_TOKEN ||
  secrets.GITHUB_TOKEN (the standard pattern). The job-level
  permissions block grants issues:write / pull-requests:write so the
  workflow GITHUB_TOKEN can create issues and PRs.
- Keep GH_AW_PROJECT_GITHUB_TOKEN env on the case() expression so
  the update_project handler still has a PAT with project:write scope.
- Fix safe_outputs.permissions.issues regression from read to write
  (gh-aw dropped it to read when update-project was added; restoring to
  match upstream's pattern for create-issue safe-outputs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 09:05
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 4

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml Outdated
Remove the outage circuit breaker step and its associated template.
The per-run trip thresholds were too aggressive in practice — any
significant outage immediately tripped them and produced one
consolidated tracking issue instead of per-failure KBEs, which is
not actionable. Renumber steps 5/6/7 to 4/5/6 accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

The org "Known Build Errors" project (dotnet/projects/111) has an
auto-add rule that pulls in any dotnet/runtime issue labeled
"Known Build Error". The workflow's update_project handler was
redundant with that rule and required a Projects v2 PAT scope the
agent's token pool intentionally does not have, so update_project
always failed at the GraphQL call.

Removing it:
- Drops the safe-outputs.update-project block from the workflow.
- Drops the temporary_id / same-run project linkage instructions from
  the agent prompt; project membership is achieved purely by the
  Known Build Error label.
- Cleans up the lock: no more GH_AW_PROJECT_GITHUB_TOKEN env, no more
  update_project handler config, and gh-aw now derives
  safe_outputs.permissions.issues: write and the correct needs graph
  on its own. Only one manual patch remains (detection.needs += pat_pool)
  with an inline explanatory comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 09:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md
@github-actions

This comment has been minimized.

- Clarify Step 5 branching: A/B/D/E/F are mutually exclusive; Branch C is an additive refinement of Branch B (resolves the contradiction between 'Exactly one branch fires' and 'Branch C emits B outputs PLUS...').

- KBE title template now only allows test-failure / hang forms, removing the inconsistent non-test form that contradicted Hard rule #7.

- Document that net-helix[bot] is the agent that watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project; the workflow does not need to do anything beyond applying the label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

Drop the build-break / infra tracking-issue branches and route every
actionable failure (test failure, hang, build break) through the same
KBE template. Build Analysis matches both shapes via the JSON body, so
a separate tracking-issue path added no value and produced issues that
were not picked up by the project board.

- Hard rule rewritten: every actionable failure becomes a Known Build
  Error issue; infra-only failures with no stable signature skip
  emission entirely.
- Step 3 reframed as log-extraction guidance only; deadletter and
  infra-shaped no-helix failures record 'skipped: infra noise — no
  stable signature' in the tally.
- Step 5 collapsed from A/B/C/D/E/F to A/B/C. Branch A now covers test
  failures and build breaks (stable = >= 2 occurrences in window OR a
  build break failing all legs of the current build). Branch B carves
  out build breaks (no muting path for compile errors). Branch C
  extended to mechanical build-break fixes.
- KBE title template adds a third form for build breaks.
- Weak signature now skips emission instead of falling through to a
  tracking issue.
- Tracking issue templates (generic + JIT pipeline) removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 13:41
@github-actions
Copy link
Copy Markdown
Contributor

Caution

Security scanning requires review for Code Review

Details

The threat detection results could not be parsed. The workflow output should be reviewed before merging.

Review the workflow run logs for details.

🤖 Copilot Code Review — PR #128125

Note

This review was generated by Copilot.

Holistic Assessment

Motivation: The PR fixes a real auth issue (the "Security scanning requires review" caution banner appearing on every issue/PR the scanner produces) and restructures the workflow prompt for determinism. The linked validation issues (#128126, #128128, #128129) demonstrate the fix works. Justified.

Approach: The .lock.yml change is a targeted fix (adding pat_pool to needs: so the detection job can resolve PAT numbers). The .md restructuring into a step-by-step flow with hard rules, explicit templates, and an output discipline section is a sound improvement for agent determinism.

Summary: ✅ LGTM. The lock file fix is well-documented with a thorough comment explaining why it's needed and when it must be re-applied. The markdown restructuring preserves all prior logic while making it more prescriptive and less ambiguous. Previous review concerns have all been resolved.


Detailed Findings

✅ Lock file fix — Correct and well-documented

The pat_pool addition to needs: in the detection job (lines 14–36 of the lock diff) is necessary for needs.pat_pool.outputs.pat_number to resolve. The comment block thoroughly explains the failure chain (empty PAT → malformed auth header → 400s → no THREAT_DETECTION_RESULT → caution banner) and notes the re-application requirement after gh aw compile. This is the kind of documentation that prevents future confusion.

✅ Cron schedule change — Trivial

Moving from minute 31 to minute 34 is cosmetic (scatter adjustment). No impact.

✅ Markdown restructuring — Improved clarity

Key improvements:

  • Hard rules section at the top (caps, label restrictions, one-area-per-issue) replaces scattered guidance
  • Step-by-step flow (Steps 1–6) replaces prose that left ordering ambiguous
  • KBE template now includes ## Error Details section (human-readable excerpt alongside the machine-parsed JSON)
  • Branch A/B/C decision tree is clearer than the previous narrative
  • Area-path delegation to labeler bot and project linkage to net-helix[bot] are explicitly called out
  • Pipeline name fix: gcstress0x3-gcstress0xcgcstress-gcstress (matches actual ADO definition name)

✅ Previous review concerns addressed

All 9 prior review threads (from copilot-pull-request-reviewer) are resolved. The contradictions noted (KBE title guidance vs hard rules, Branch C wording) have been reconciled in the current revision — Branch C is now explicitly described as a "refinement of Branch B" and the KBE title template includes build-break form with clarification that the JSON body still carries the signature for Build Analysis.

💡 Minor observation — ## Error Details section placement

The new ## Error Details section in the KBE template is placed between ## Build Information and ## Error Message. This is fine for humans, but worth noting that if Build Analysis ever becomes sensitive to heading order or extra headings between Build Information and Error Message, this could interfere. Given that Build Analysis only parses the JSON fence under ## Error Message (not positionally), this is safe today. No action needed.

Generated by Code Review for issue #128125 · ● 2.6M ·

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.lock.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants