[ci-scanner] Improve workflow and fix detection auth#128125
kotlarmilos wants to merge 7 commits into
- Restructure ci-failure-scan.md body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section.
- Wire same-run project linkage for Known Build Error issues via update-project + temporary_id so Build Analysis picks them up.
- Add the Error Details template section so KBE issues carry the build-analysis indicator block.
- Delegate area path assignment to the runtime labeler bot (single area path per issue).
- Add an outage circuit breaker: when failures exceed the threshold, open one tracking issue and stop filing per-failure issues.
- Regenerate lock with gh-aw v0.71.5 and patch pat_pool into the detection job's needs list so the threat-detection PAT resolves correctly and the security-review caution banner stops appearing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @dotnet/runtime-infrastructure
Pull request overview
This PR updates the CI Outer-Loop Failure Scanner’s gh-aw prompt/workflow to make triage output more deterministic, add same-run linkage of KBEs to a GitHub Project via a new update_project safe-output tool, and patch the generated lock workflow to fix threat-detection authentication.
Changes:
- Rewrites `.github/workflows/ci-failure-scan.md` into an explicit step-by-step decision flow with stricter “hard rules”, updated templates (including `## Error Details`), and an outage circuit breaker.
- Adds `update_project` as an allowed safe-output operation and documents a `temporary_id`-based “create issue then attach to project” payload.
- Updates the compiled `.lock.yml` workflow to include the new tool and to ensure the threat-detection job depends on `pat_pool` so its COPILOT token expression resolves.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/ci-failure-scan.md | Major restructuring of the scanner prompt plus new templates and update_project safe-output integration guidance. |
| .github/workflows/ci-failure-scan.lock.yml | Regenerated workflow + manual patch to threat-detection needs, plus wiring for update_project in safe-outputs configuration. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 3
Same shape of bug as the detection patch: gh-aw v0.71.5 does not auto-wire pat_pool into safe_outputs.needs even though the job's env references needs.pat_pool.outputs.pat_number indirectly via the update-project github-token. Without this, safe_outputs.update-project fails with 'Input required and not supplied: github-token' because secrets.COPILOT_GITHUB_TOKEN is empty/stale in this repo.

- Add pat_pool to safe_outputs.needs.
- Replace secrets.COPILOT_GITHUB_TOKEN with the same case() expression used in engine.env in three places inside the Process Safe Outputs step: handler config JSON, GH_AW_PROJECT_GITHUB_TOKEN env, and the github-script with: github-token input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r project

The previous patch over-broadly switched the safe_outputs github-script 'with: github-token' to the pat_pool case() expression, which broke create_issue and create_pull_request: those use the same octokit, and the COPILOT_PAT_# tokens don't have repo issue/pr write scope for dotnet/runtime.

- Restore with: github-token to secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN (the standard pattern). The job-level permissions block grants issues:write / pull-requests:write so the workflow GITHUB_TOKEN can create issues and PRs.
- Keep GH_AW_PROJECT_GITHUB_TOKEN env on the case() expression so the update_project handler still has a PAT with project:write scope.
- Fix safe_outputs.permissions.issues regression from read to write (gh-aw dropped it to read when update-project was added; restoring to match upstream's pattern for create-issue safe-outputs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the outage circuit breaker step and its associated template. The per-run trip thresholds were too aggressive in practice — any significant outage immediately tripped them and produced one consolidated tracking issue instead of per-failure KBEs, which is not actionable. Renumber steps 5/6/7 to 4/5/6 accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The org "Known Build Errors" project (dotnet/projects/111) has an auto-add rule that pulls in any dotnet/runtime issue labeled "Known Build Error". The workflow's update_project handler was redundant with that rule and required a Projects v2 PAT scope the agent's token pool intentionally does not have, so update_project always failed at the GraphQL call.

Removing it:

- Drops the safe-outputs.update-project block from the workflow.
- Drops the temporary_id / same-run project linkage instructions from the agent prompt; project membership is achieved purely by the Known Build Error label.
- Cleans up the lock: no more GH_AW_PROJECT_GITHUB_TOKEN env, no more update_project handler config, and gh-aw now derives safe_outputs.permissions.issues: write and the correct needs graph on its own. Only one manual patch remains (detection.needs += pat_pool) with an inline explanatory comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
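For reviewers who want to reason about the one remaining manual patch (detection.needs += pat_pool), here is a minimal Python sketch of it applied to an already-parsed lock file. It assumes the standard GitHub Actions layout (`jobs.<job_id>.needs` as a string or a list); the `add_needs` helper and the sample `agent` job are hypothetical and not part of gh-aw or this PR.

```python
from copy import deepcopy


def add_needs(workflow: dict, job: str = "detection", dep: str = "pat_pool") -> dict:
    """Return a copy of `workflow` whose jobs[job].needs includes `dep`."""
    patched = deepcopy(workflow)  # leave the caller's structure untouched
    job_def = patched["jobs"][job]
    needs = job_def.get("needs", [])
    # GitHub Actions accepts either a single job id or a list of job ids.
    needs = [needs] if isinstance(needs, str) else list(needs)
    if dep not in needs:  # idempotent: re-applying after a regenerate is safe
        needs.append(dep)
    job_def["needs"] = needs
    return patched


# Hypothetical, heavily trimmed stand-in for the parsed lock file.
lock = {
    "jobs": {
        "pat_pool": {},
        "agent": {"needs": ["pat_pool"]},
        "detection": {"needs": "agent"},
    }
}
patched = add_needs(lock)
print(patched["jobs"]["detection"]["needs"])
```

Because the helper only appends when the dependency is missing, it could be run after every lock regeneration without duplicating the entry.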
- Clarify Step 5 branching: A/B/D/E/F are mutually exclusive; Branch C is an additive refinement of Branch B (resolves the contradiction between 'Exactly one branch fires' and 'Branch C emits B outputs PLUS...').
- KBE title template now only allows test-failure / hang forms, removing the inconsistent non-test form that contradicted Hard rule #7.
- Document that net-helix[bot] is the agent that watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project; the workflow does not need to do anything beyond applying the label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop the build-break / infra tracking-issue branches and route every actionable failure (test failure, hang, build break) through the same KBE template. Build Analysis matches both shapes via the JSON body, so a separate tracking-issue path added no value and produced issues that were not picked up by the project board.

- Hard rule rewritten: every actionable failure becomes a Known Build Error issue; infra-only failures with no stable signature skip emission entirely.
- Step 3 reframed as log-extraction guidance only; deadletter and infra-shaped no-helix failures record 'skipped: infra noise — no stable signature' in the tally.
- Step 5 collapsed from A/B/C/D/E/F to A/B/C. Branch A now covers test failures and build breaks (stable = >= 2 occurrences in window OR a build break failing all legs of the current build). Branch B carves out build breaks (no muting path for compile errors). Branch C extended to mechanical build-break fixes.
- KBE title template adds a third form for build breaks.
- Weak signature now skips emission instead of falling through to a tracking issue.
- Tracking issue templates (generic + JIT pipeline) removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
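The stability rule in the Step 5 bullet can be written down as a small predicate. This is only a sketch of one reading of the commit message: the `Failure` type, its field names, and the `muting_applies` helper are hypothetical, and the real logic lives in the agent prompt as prose, not in code. What the prompt does with a failure that is not stable is not stated here, so the sketch does not model it.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    kind: str                     # "test", "hang", or "build_break" (hypothetical labels)
    occurrences_in_window: int    # how often the signature appeared in the scan window
    fails_all_legs: bool = False  # only meaningful for build breaks


def is_stable(f: Failure) -> bool:
    """Stable = >= 2 occurrences in window OR a build break failing all legs."""
    if f.occurrences_in_window >= 2:
        return True
    return f.kind == "build_break" and f.fails_all_legs


def muting_applies(f: Failure) -> bool:
    """Branch B carves out build breaks: there is no muting path for compile errors."""
    return f.kind != "build_break"


flaky_test = Failure("test", occurrences_in_window=1)
repeat_test = Failure("test", occurrences_in_window=2)
full_break = Failure("build_break", occurrences_in_window=1, fails_all_legs=True)

for f in (flaky_test, repeat_test, full_break):
    print(f.kind, is_stable(f), muting_applies(f))
```

Note that a build break seen once still counts as stable when it fails every leg of the current build, which is the case the second half of the rule exists for.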
Caution
Security scanning requires review for Code Review

Details
The threat detection results could not be parsed. The workflow output should be reviewed before merging. Review the workflow run logs for details.

🤖 Copilot Code Review: PR #128125

Note
This review was generated by Copilot.

Holistic Assessment

Motivation: The PR fixes a real auth issue (the "Security scanning requires review" caution banner appearing on every issue/PR the scanner produces) and restructures the workflow prompt for determinism. The linked validation issues (#128126, #128128, #128129) demonstrate the fix works. Justified.

Approach: The

Summary: ✅ LGTM. The lock file fix is well-documented with a thorough comment explaining why it's needed and when it must be re-applied. The markdown restructuring preserves all prior logic while making it more prescriptive and less ambiguous. Previous review concerns have all been resolved.

Detailed Findings

✅ Lock file fix: Correct and well-documented
The

✅ Cron schedule change: Trivial
Moving from minute 31 to minute 34 is cosmetic (scatter adjustment). No impact.

✅ Markdown restructuring: Improved clarity
Key improvements:

✅ Previous review concerns addressed
All 9 prior review threads (from

💡 Minor observation:
Description
This PR improves the workflow based on past runs and fixes the "Security scanning requires review" caution banner that has been appearing on every issue/PR the scanner produces.
Changes
- `## Error Details` section that carries the build-analysis indicator block
- `net-helix[bot]` automation (which watches the `Known Build Error` label and adds matching `dotnet/runtime` issues to the Known Build Errors org project within ~2s); the workflow only has to apply the label

Validation artifacts
Issues produced by `workflow_dispatch` runs of this branch: