[CRCR] Implement CRCR upstream check run management for L3/L4 jobs #8119
can-gaa-hou wants to merge 4 commits into
@can-gaa-hou is attempting to deploy a commit to the Meta Open Source Team on Vercel. A member of the Team first needs to authorize it.
8fee6c5 to 30e0f25
8f62972 to d1cb42c
```python
logger.info(
    "l3_labeled: no job info for repo=%s; check run marked wanted for callback",
    downstream_repo,
)
```
`upstream_token` is minted for `config.upstream_repo` (always `pytorch/pytorch`), but it is minted inside the `for downstream_repo in l3_repos` loop. If a device maps to multiple repos, this creates a redundant installation token for the same upstream repo on every iteration.
Consider hoisting it above the loop:

```python
upstream_token = gh_helper.get_repo_access_token(
    config.github_app_id,
    config.github_app_private_key,
    config.upstream_repo,
)
created: list[str] = []
for downstream_repo in l3_repos:
    ...
```

This also gives a clean fail-fast: if the token mint fails, it would fail identically on every iteration anyway, so there is no benefit to retrying it per repo inside the loop.
Good catch! Fixed by minting the token lazily: when `job_info` is `None` (which mostly happens when the label is added before the workflow starts), we don't need to call the GitHub API at all.
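The lazy-mint fix can be illustrated with a small sketch. It assumes a hypothetical `LazyToken` wrapper and a hypothetical `handle_l3_labeled` function that receive the mint and check-run-creation steps as callables; in the PR the mint would be something like `gh_helper.get_repo_access_token(config.github_app_id, config.github_app_private_key, config.upstream_repo)`, but the wiring below is illustrative, not the PR's actual code. The token is minted at most once per handler invocation, and never when no downstream repo has job info yet.

```python
from typing import Callable, Optional


class LazyToken:
    """Mint a token on first use, then reuse it (hypothetical helper)."""

    def __init__(self, mint: Callable[[], str]) -> None:
        self._mint = mint
        self._token: Optional[str] = None

    def get(self) -> str:
        if self._token is None:
            self._token = self._mint()  # first real use: one API call
        return self._token


def handle_l3_labeled(
    l3_repos: list[str],
    job_infos: dict[str, dict],
    mint_upstream_token: Callable[[], str],
    create_check_run: Callable[[str, str, dict], None],
) -> list[str]:
    """Create check runs only for repos with job info (sketch, hypothetical API)."""
    token = LazyToken(mint_upstream_token)
    created: list[str] = []
    for downstream_repo in l3_repos:
        job_info = job_infos.get(downstream_repo)
        if job_info is None:
            # Label added before the workflow started. Per the log message above,
            # the real handler marks the check run wanted here; no token needed.
            continue
        create_check_run(token.get(), downstream_repo, job_info)
        created.append(downstream_repo)
    return created
```

Compared with hoisting the mint above the loop, this keeps the single-mint property while skipping the API call entirely in the common "label before workflow" case.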
2a7dbf8 to
6b40dc7
Compare
…ent handling for check_run and check_suite events
…PR labeled handler
Summary
Architecture
- `webhook` function:
  - … `mark_check_run_wanted` and let the `callback` lambda create this check run.
  - … `callback` lambda will cache workflow information beforehand so it can immediately create the `in_progress` check run.
  - … `OOT_STATUS_TTL`), it will create a `completed` check run immediately. Otherwise, it will not create a check run.
  - … `callback` to check whether a check run is needed.
- `callback` function:
  - … `is_check_run_wanted` to see if this PR needs a check run. If so, immediately create one.
  - … `webhook`.

Changes
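The webhook/callback handshake from the Architecture section can be sketched as a small state machine. Only `mark_check_run_wanted`, `is_check_run_wanted`, and `OOT_STATUS_TTL` come from the PR; the in-memory `CheckRunState` store, the `JobInfo` record, the `(PR number, repo)` key, the TTL value, and the `webhook_on_label` / `callback_on_status` functions are hypothetical, and the decision rules are reconstructed from the partially garbled description above. Treat it as an illustration under those assumptions, not the PR's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# OOT_STATUS_TTL is named in the PR; this value (seconds) is an assumption.
OOT_STATUS_TTL = 3600.0

Key = Tuple[int, str]  # (PR number, downstream repo): hypothetical key shape


@dataclass
class JobInfo:
    status: str        # "in_progress" or "completed"
    updated_at: float  # time of the last callback report, in seconds


@dataclass
class CheckRunState:
    """In-memory stand-in for the store shared by both lambdas (hypothetical)."""
    wanted: Set[Key] = field(default_factory=set)
    jobs: Dict[Key, JobInfo] = field(default_factory=dict)

    def mark_check_run_wanted(self, key: Key) -> None:
        self.wanted.add(key)

    def is_check_run_wanted(self, key: Key) -> bool:
        return key in self.wanted


def webhook_on_label(state: CheckRunState, key: Key, now: float) -> Optional[str]:
    """Handle a ciflow label; return the check run status to create now, or None."""
    # Record interest so later callbacks keep the check run updated (assumption).
    state.mark_check_run_wanted(key)
    info = state.jobs.get(key)
    if info is None:
        return None  # workflow not reported yet: the callback creates the check run
    if info.status == "in_progress":
        return "in_progress"  # cached info lets the webhook create it immediately
    if now - info.updated_at <= OOT_STATUS_TTL:
        return "completed"  # recent result: publish it right away
    return None  # stale result: no check run


def callback_on_status(state: CheckRunState, key: Key, status: str, now: float) -> Optional[str]:
    """Cache a downstream status report; return the status to publish, if wanted."""
    state.jobs[key] = JobInfo(status, now)
    return status if state.is_check_run_wanted(key) else None
```

Whichever lambda sees the event second is the one that creates the check run: if the label arrives first, the webhook only records interest; if the workflow reports first, the cached `JobInfo` lets the webhook act immediately.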
Verification
We performed the following scenario verification on our AWS Lambda instance:
L3:
- … `ciflow/crcr/{device}` are added immediately after the PR is created, and show up in the corresponding check run on the PR with the name `crcr/{repo}/{workflow_name}`.
- … `in_progress` check run.
- … `completed` check run.

L4:
Re-run
- The `Re-run` button in each failed check run will trigger the corresponding downstream workflow to re-run and update the check run status to `in_progress`.
- The `Re-run all jobs` or `Re-run all failed jobs` button will trigger the corresponding downstream workflows in the check suite and update the corresponding check run statuses to `in_progress`.

Unit Tests
TODO
cc @albanD @fffrog @KarhouTam @atalman @huydhn @zxiiro @subinz1 @jewelkm89