[ci-scanner] Improve workflow and fix detection auth by kotlarmilos · Pull Request #128125 · dotnet/runtime

kotlarmilos · 2026-05-13T08:28:13Z

Description

This PR improves the workflow based on the past runs and fixes the security scanning requires review caution banner that has been appearing on every issue/PR the scanner produces.

Changes

Restructured body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section
KBE template now includes the ## Error Details section that carries the build-analysis indicator block
Area-path assignment is delegated to the runtime labeler bot so each issue ends up with a single area path
Project linkage is left to the existing net-helix[bot] automation (which watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project within ~2s); the workflow only has to apply the label

Validation artifacts

Issues produced by workflow_dispatch runs of this branch:

- Restructure ci-failure-scan.md body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section. - Wire same-run project linkage for Known Build Error issues via update-project + temporary_id so Build Analysis picks them up. - Add the Error Details template section so KBE issues carry the build-analysis indicator block. - Delegate area path assignment to the runtime labeler bot (single area path per issue). - Add an outage circuit breaker: when failures exceed the threshold, open one tracking issue and stop filing per-failure issues. - Regenerate lock with gh-aw v0.71.5 and patch pat_pool into the detection job's needs list so the threat-detection PAT resolves correctly and the security-review caution banner stops appearing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-05-13T08:29:48Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR updates the CI Outer-Loop Failure Scanner’s gh-aw prompt/workflow to make triage output more deterministic, add same-run linkage of KBEs to a GitHub Project via a new update_project safe-output tool, and patch the generated lock workflow to fix threat-detection authentication.

Changes:

Rewrites .github/workflows/ci-failure-scan.md into an explicit step-by-step decision flow with stricter “hard rules”, updated templates (including ## Error Details), and an outage circuit breaker.
Adds update_project as an allowed safe-output operation and documents a temporary_id-based “create issue then attach to project” payload.
Updates the compiled .lock.yml workflow to include the new tool and to ensure the threat-detection job depends on pat_pool so its COPILOT token expression resolves.

Show a summary per file

File	Description
.github/workflows/ci-failure-scan.md	Major restructuring of the scanner prompt plus new templates and `update_project` safe-output integration guidance.
.github/workflows/ci-failure-scan.lock.yml	Regenerated workflow + manual patch to threat-detection `needs`, plus wiring for `update_project` in safe-outputs configuration.

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 3

Same shape of bug as the detection patch: gh-aw v0.71.5 does not auto-wire pat_pool into safe_outputs.needs even though the job's env references needs.pat_pool.outputs.pat_number indirectly via the update-project github-token. Without this, safe_outputs.update-project fails with 'Input required and not supplied: github-token' because secrets.COPILOT_GITHUB_TOKEN is empty/stale in this repo. - Add pat_pool to safe_outputs.needs. - Replace secrets.COPILOT_GITHUB_TOKEN with the same case() expression used in engine.env in three places inside the Process Safe Outputs step: handler config JSON, GH_AW_PROJECT_GITHUB_TOKEN env, and the github-script with: github-token input. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r project The previous patch over-broadly switched the safe_outputs github-script 'with: github-token' to the pat_pool case() expression, which broke create_issue and create_pull_request: those use the same octokit, and the COPILOT_PAT_# tokens don't have repo issue/pr write scope for dotnet/runtime. - Restore with: github-token to secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN (the standard pattern). The job-level permissions block grants issues:write / pull-requests:write so the workflow GITHUB_TOKEN can create issues and PRs. - Keep GH_AW_PROJECT_GITHUB_TOKEN env on the case() expression so the update_project handler still has a PAT with project:write scope. - Fix safe_outputs.permissions.issues regression from read to write (gh-aw dropped it to read when update-project was added; restoring to match upstream's pattern for create-issue safe-outputs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 4

Remove the outage circuit breaker step and its associated template. The per-run trip thresholds were too aggressive in practice — any significant outage immediately tripped them and produced one consolidated tracking issue instead of per-failure KBEs, which is not actionable. Renumber steps 5/6/7 to 4/5/6 accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The org "Known Build Errors" project (dotnet/projects/111) has an auto-add rule that pulls in any dotnet/runtime issue labeled "Known Build Error". The workflow's update_project handler was redundant with that rule and required a Projects v2 PAT scope the agent's token pool intentionally does not have, so update_project always failed at the GraphQL call. Removing it: - Drops the safe-outputs.update-project block from the workflow. - Drops the temporary_id / same-run project linkage instructions from the agent prompt; project membership is achieved purely by the Known Build Error label. - Cleans up the lock: no more GH_AW_PROJECT_GITHUB_TOKEN env, no more update_project handler config, and gh-aw now derives safe_outputs.permissions.issues: write and the correct needs graph on its own. Only one manual patch remains (detection.needs += pat_pool) with an inline explanatory comment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

- Clarify Step 5 branching: A/B/D/E/F are mutually exclusive; Branch C is an additive refinement of Branch B (resolves the contradiction between 'Exactly one branch fires' and 'Branch C emits B outputs PLUS...'). - KBE title template now only allows test-failure / hang forms, removing the inconsistent non-test form that contradicted Hard rule #7. - Document that net-helix[bot] is the agent that watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project; the workflow does not need to do anything beyond applying the label. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drop the build-break / infra tracking-issue branches and route every actionable failure (test failure, hang, build break) through the same KBE template. Build Analysis matches both shapes via the JSON body, so a separate tracking-issue path added no value and produced issues that were not picked up by the project board. - Hard rule rewritten: every actionable failure becomes a Known Build Error issue; infra-only failures with no stable signature skip emission entirely. - Step 3 reframed as log-extraction guidance only; deadletter and infra-shaped no-helix failures record 'skipped: infra noise — no stable signature' in the tally. - Step 5 collapsed from A/B/C/D/E/F to A/B/C. Branch A now covers test failures and build breaks (stable = >= 2 occurrences in window OR a build break failing all legs of the current build). Branch B carves out build breaks (no muting path for compile errors). Branch C extended to mechanical build-break fixes. - KBE title template adds a third form for build breaks. - Weak signature now skips emission instead of falling through to a tracking issue. - Tracking issue templates (generic + JIT pipeline) removed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-13T13:46:58Z

Caution

Security scanning requires review for Code Review

Details

The threat detection results could not be parsed. The workflow output should be reviewed before merging.

Review the workflow run logs for details.

🤖 Copilot Code Review — PR #128125

Note

This review was generated by Copilot.

Holistic Assessment

Motivation: The PR fixes a real auth issue (the "Security scanning requires review" caution banner appearing on every issue/PR the scanner produces) and restructures the workflow prompt for determinism. The linked validation issues (#128126, #128128, #128129) demonstrate the fix works. Justified.

Approach: The .lock.yml change is a targeted fix (adding pat_pool to needs: so the detection job can resolve PAT numbers). The .md restructuring into a step-by-step flow with hard rules, explicit templates, and an output discipline section is a sound improvement for agent determinism.

Summary: ✅ LGTM. The lock file fix is well-documented with a thorough comment explaining why it's needed and when it must be re-applied. The markdown restructuring preserves all prior logic while making it more prescriptive and less ambiguous. Previous review concerns have all been resolved.

Detailed Findings

✅ Lock file fix — Correct and well-documented

The pat_pool addition to needs: in the detection job (lines 14–36 of the lock diff) is necessary for needs.pat_pool.outputs.pat_number to resolve. The comment block thoroughly explains the failure chain (empty PAT → malformed auth header → 400s → no THREAT_DETECTION_RESULT → caution banner) and notes the re-application requirement after gh aw compile. This is the kind of documentation that prevents future confusion.

✅ Cron schedule change — Trivial

Moving from minute 31 to minute 34 is cosmetic (scatter adjustment). No impact.

✅ Markdown restructuring — Improved clarity

Key improvements:

Hard rules section at the top (caps, label restrictions, one-area-per-issue) replaces scattered guidance
Step-by-step flow (Steps 1–6) replaces prose that left ordering ambiguous
KBE template now includes ## Error Details section (human-readable excerpt alongside the machine-parsed JSON)
Branch A/B/C decision tree is clearer than the previous narrative
Area-path delegation to labeler bot and project linkage to net-helix[bot] are explicitly called out
Pipeline name fix: gcstress0x3-gcstress0xc → gcstress-gcstress (matches actual ADO definition name)

✅ Previous review concerns addressed

All 9 prior review threads (from copilot-pull-request-reviewer) are resolved. The contradictions noted (KBE title guidance vs hard rules, Branch C wording) have been reconciled in the current revision — Branch C is now explicitly described as a "refinement of Branch B" and the KBE title template includes build-break form with clarification that the JSON body still carries the signature for Build Analysis.

💡 Minor observation — `## Error Details` section placement

The new ## Error Details section in the KBE template is placed between ## Build Information and ## Error Message. This is fine for humans, but worth noting that if Build Analysis ever becomes sensitive to heading order or extra headings between Build Information and Error Message, this could interfere. Given that Build Analysis only parses the JSON fence under ## Error Message (not positionally), this is safe today. No action needed.

Generated by Code Review for issue #128125 · ● 2.6M · ◷

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 3

Copilot AI review requested due to automatic review settings May 13, 2026 08:28

Copilot started reviewing on behalf of kotlarmilos May 13, 2026 08:28 View session

github-actions Bot added the area-Infrastructure label May 13, 2026

dotnet-policy-service Bot assigned kotlarmilos May 13, 2026

github-project-automation Bot added this to Runtime Infra May 13, 2026

kotlarmilos requested review from JanKrivanek, JulieLeeMSFT, PureWeen, jeffschwMSFT, matouskozak and vitek-karas May 13, 2026 08:30

kotlarmilos changed the title ~~[ci-scanner] address review feedback and fix detection auth~~ [ci-scanner] Improve workflow and fix detection auth May 13, 2026

kotlarmilos marked this pull request as ready for review May 13, 2026 08:31

kotlarmilos requested review from a team and jeffhandley as code owners May 13, 2026 08:31