ci: verify L3 gate fixes e2e (throwaway — DO NOT MERGE)#815
Closed
Yiminnn wants to merge 1 commit into
GitHub Actions / integration-final-review
succeeded
Jun 20, 2026 in 0s
integration-final-review: mergeable with quarantines
Verdict
mergeable with quarantines
Blockers
- none
Coverage
| cell | task | agent | sandbox | skill_mode | status | detail |
|---|---|---|---|---|---|---|
| citation-check-docker-no-skill-openhands | citation-check | openhands | docker | no-skill | healthy | all gates green |
Slots: healthy=1, unhealthy=0, missing=0, duplicate=0, stale=0 (planned=1)
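The slot summary above can be reproduced from the coverage rows with a small tally. This is only a sketch under assumptions: the function name `tally_slots` and the input shapes are hypothetical, "missing" is taken to mean a planned cell with no observed row, and "duplicate" a cell observed more than once. Stale detection is left out because it would need timestamps that the table does not show.

```python
from collections import Counter

def tally_slots(planned, observed):
    """planned: list of cell names; observed: list of (cell, status) pairs.

    Hypothetical helper, not the report generator's actual code.
    """
    status_counts = Counter(status for _, status in observed)
    seen = Counter(cell for cell, _ in observed)
    return {
        "healthy": status_counts.get("healthy", 0),
        "unhealthy": status_counts.get("unhealthy", 0),
        # Assumption: a planned cell with no observed row counts as missing.
        "missing": sum(1 for cell in planned if cell not in seen),
        # Assumption: every extra row for the same cell counts as one duplicate.
        "duplicate": sum(n - 1 for n in seen.values() if n > 1),
        "planned": len(planned),
    }

planned = ["citation-check-docker-no-skill-openhands"]
observed = [("citation-check-docker-no-skill-openhands", "healthy")]
summary = tally_slots(planned, observed)
print(summary)
```

With the single planned cell from this run, the tally gives healthy=1 and zero for the other counted slots, matching the line above.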
Evidence
- citation-check-docker-no-skill-openhands (healthy)
  - root: jobs/integration-final/citation-check-docker-no-skill-openhands/2026-06-20__00-12-03/citation-check__79c1c5fc
  - gates: R-REAL=pass, R-OUTCOME=pass, R-ARTIFACT=pass, R-TELEMETRY=pass, S-NOSKILL=na, S-WITHSKILL=na, V-TAMPER=pass, C-ATTRIB=na, V-LIFECYCLE=na, V-ENVHARDEN=na, V-REWARDHACK=na
  - rerun: python tests/integration/rubric_checks.py jobs/integration-final/citation-check-docker-no-skill-openhands/2026-06-20__00-12-03/citation-check__79c1c5fc --json
- root:
Parity:
- pinned-baseline: fail (pinned-baseline parity FAIL)
  - harbor: baseline git HEAD 3459996 does not match pinned ref 2d86fe82f6a06f7c7b3a22a3ae90d554d0e9655c
  - harbor: /home/runner/work/benchflow/benchflow/baseline-root/submissions/skillsbench/v1.1/openhands-no-skills__deepseek-v4-flash/2026-06-08__pr2-pr3-selected-3trial/citation-check__pr2-fill5-c10-noskills-7a4c6d6d-0004/result.json: missing Harbor field(s): ['agent_info', 'config', 'verifier_result']
  - harbor: /home/runner/work/benchflow/benchflow/baseline-root/submissions/skillsbench/v1.1/openhands-no-skills__deepseek-v4-flash/2026-06-08__pr2-pr3-selected-3trial/citation-check__pr2-fill5-c10-noskills-9c44b8b1-0003/result.json: missing Harbor field(s): ['agent_info', 'config', 'verifier_result']
  - harbor: /home/runner/work/benchflow/benchflow/baseline-root/submissions/skillsbench/v1.1/openhands-no-skills__deepseek-v4-flash/2026-06-08__pr2-pr3-selected-3trial/citation-check__pr2-fill5-c10-noskills-aeadd837-0002/result.json: missing Harbor field(s): ['agent_info', 'config', 'verifier_result']
  - harbor: no matching result.json files under /home/runner/work/benchflow/benchflow/baseline-root/submissions/skillsbench/v1.1/openhands-no-skills__deepseek-v4-flash
  - harbor: missing baseline result for citation-check
  - no overlapping SkillsBench tasks to compare
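The two kinds of parity failure reported above (a baseline HEAD that does not match the pinned ref, and result.json files missing Harbor fields) can be illustrated with a minimal sketch. The helper names `head_matches_pin`, `missing_fields`, and `check_result_file` are hypothetical, as is the assumption that the required fields are top-level keys and that a short HEAD is compared as a prefix of the full pinned SHA. The real gate may check more than this.

```python
import json
from pathlib import Path

# Field names taken from the failure message; treating them as
# top-level keys of result.json is an assumption.
REQUIRED_FIELDS = ("agent_info", "config", "verifier_result")

def head_matches_pin(head: str, pinned: str) -> bool:
    """Assumption: a short HEAD matches if it is a prefix of the pinned SHA."""
    return bool(head) and pinned.startswith(head)

def missing_fields(result: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required field names absent from a parsed result.json."""
    return [name for name in required if name not in result]

def check_result_file(path) -> list:
    """Load a result.json from disk and report its missing fields."""
    return missing_fields(json.loads(Path(path).read_text()))

# Values copied from the report above.
pin_ok = head_matches_pin("3459996", "2d86fe82f6a06f7c7b3a22a3ae90d554d0e9655c")

# "reward" is a made-up key standing in for a non-Harbor result file.
absent = missing_fields({"reward": 1.0})
print(pin_ok, absent)
```

A baseline that fails either check would not yield comparable results, which is consistent with the final "no overlapping SkillsBench tasks to compare" message.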
Residual risk
- QUARANTINE: parity pinned-baseline (advisory; gate needs native-baseline mode): pinned-baseline parity FAIL, with the same failure detail as listed under Parity above (baseline git HEAD 3459996 does not match pinned ref 2d86fe82f6a06f7c7b3a22a3ae90d554d0e9655c; three result.json files missing Harbor field(s) ['agent_info', 'config', 'verifier_result']; no matching result.json files; missing baseline result for citation-check; no overlapping SkillsBench tasks to compare)
- residual (from plan): light lane: no full agent-matrix coverage; lifecycle/hardening rely on residual + codex review
- V-LIFECYCLE / V-ENVHARDEN / V-REWARDHACK: codex/residual review (never faked deterministically)
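To see which gates this run actually evaluated and which were left to review, the gates line from the Evidence section can be split into a name-to-status mapping. This is a sketch only: `parse_gates` is a hypothetical helper, and the assumption that every entry has the form NAME=status comes from the single line shown in this report.

```python
def parse_gates(line: str) -> dict:
    """Split 'NAME=status, NAME=status, ...' into a dict (hypothetical helper)."""
    gates = {}
    for item in line.split(","):
        name, _, status = item.strip().partition("=")
        gates[name] = status
    return gates

# Gates line copied from the Evidence section of this report.
GATES = (
    "R-REAL=pass, R-OUTCOME=pass, R-ARTIFACT=pass, R-TELEMETRY=pass, "
    "S-NOSKILL=na, S-WITHSKILL=na, V-TAMPER=pass, C-ATTRIB=na, "
    "V-LIFECYCLE=na, V-ENVHARDEN=na, V-REWARDHACK=na"
)

gates = parse_gates(GATES)
failed = [name for name, status in gates.items() if status == "fail"]
# The three gates the report says are left to codex/residual review.
deferred = [
    name
    for name in ("V-LIFECYCLE", "V-ENVHARDEN", "V-REWARDHACK")
    if gates.get(name) == "na"
]
print(failed, deferred)
```

For this run no gate is marked fail, and all three review-only gates are reported as na rather than pass.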
Required reruns
- none