Closed
Changes from 23 commits
24 commits
c22a50b
[OOT HUD] Full pipeline: API endpoint, ClickHouse schema, replicator …
subinz1 Apr 24, 2026
d952d48
Sync reference implementation with RFC review feedback
subinz1 May 4, 2026
5a81a1c
Source downstream_repo_level from trusted payload
subinz1 May 12, 2026
d382149
Add check_run_id, run_id, schema_version and fix test-results key
subinz1 May 12, 2026
bd5268a
Replace hardware names with generic placeholders in schema comment
subinz1 May 12, 2026
5c278e2
Replace vendor name with generic placeholder in test params
subinz1 May 12, 2026
7a45ca7
Fix test_results key name and align with L2 summary-only approach
subinz1 May 12, 2026
34e23e2
Add downstream_repo_level to OOT Summary page
subinz1 May 13, 2026
e9f6292
Address review feedback: fix HTTP status code and remove dead code
subinz1 May 15, 2026
8fe2f2b
Extract shared conclusionColor/conclusionLabel to ootUtils
subinz1 May 15, 2026
5a7f77e
Remove unused Skeleton import from OotPrSection
subinz1 May 15, 2026
99f7986
Fix run_attempt type coercion and extract artifact_url
subinz1 May 15, 2026
3ac2556
Run prettier to fix formatting
subinz1 May 19, 2026
1d4abc5
Compute total_tests from passed+failed+skipped when total is absent
subinz1 May 19, 2026
eb25fb0
Fix prettier formatting in PR page OotPrSection
subinz1 May 20, 2026
be235b7
Address review: timing-safe auth and required field validation
subinz1 May 21, 2026
5b8f6df
Remove ClickHouse schema (moved to #8105)
subinz1 May 21, 2026
2fc8cd3
Add tests for OOT utils and results API handler
subinz1 May 21, 2026
118c376
Apply suggestion from @atalman
atalman May 21, 2026
6aadf46
Apply suggestion from @atalman
atalman May 21, 2026
7972e59
Apply suggestion from @atalman
atalman May 21, 2026
9b08de7
Apply suggestion from @atalman
atalman May 21, 2026
85afa77
Fix prettier formatting in OOT test files
subinz1 May 22, 2026
1df4769
Address review feedback from atalman and malfet
subinz1 May 22, 2026
1 change: 1 addition & 0 deletions aws/lambda/clickhouse-replicator-dynamo/lambda_function.py
@@ -33,6 +33,7 @@
    "vllm-buildkite-agent-events": "vllm.vllm_buildkite_agents",
    "vllm-buildkite-build-events": "vllm.vllm_buildkite_builds",
    "vllm-buildkite-job-events": "vllm.vllm_buildkite_jobs",
    "torchci-oot-workflow-job": "default.oot_workflow_job",
}


12 changes: 12 additions & 0 deletions torchci/clickhouse_queries/oot_backend_dashboard/params.json
@@ -0,0 +1,12 @@
{
  "params": {
    "repo": "String",
    "days": "UInt64"
  },
  "tests": [
    {
      "repo": "<company>/<repo>",
      "days": "7"
    }
  ]
}
29 changes: 29 additions & 0 deletions torchci/clickhouse_queries/oot_backend_dashboard/query.sql
@@ -0,0 +1,29 @@
SELECT
    pr_number,
    pytorch_head_sha,
    workflow_name,
    job_name,
    check_run_id,
    run_id,
    run_attempt,
    status,
    conclusion,
    started_at,
    completed_at,
    duration_seconds,
    total_tests,
    passed_tests,
    failed_tests,
    skipped_tests,
    workflow_run_url,
    artifact_url,
    queue_time,
    execution_time
FROM
    default.oot_workflow_job FINAL
WHERE
    downstream_repo = {repo: String}
    AND started_at > now() - INTERVAL {days: UInt64} DAY
ORDER BY
    started_at DESC
LIMIT 500
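
For context, a client could request this query through the same `/api/clickhouse/<query>?parameters=` URL pattern that `OotPrSection.tsx` in this PR uses for `oot_pr_results`. The following is a minimal sketch, assuming that pattern also applies to `oot_backend_dashboard`; the helper name `buildOotQueryUrl` is hypothetical and not part of the PR.

```typescript
// Hypothetical helper (not part of this PR). It follows the URL pattern that
// OotPrSection.tsx uses for the oot_pr_results query.
type QueryParams = Record<string, string>;

function buildOotQueryUrl(queryName: string, params: QueryParams): string {
  // All parameters travel as one URL-encoded JSON object.
  const encoded = encodeURIComponent(JSON.stringify(params));
  return `/api/clickhouse/${queryName}?parameters=${encoded}`;
}

// Values mirror the "tests" entry in params.json, where days is given as a string.
const dashboardUrl = buildOotQueryUrl("oot_backend_dashboard", {
  repo: "<company>/<repo>",
  days: "7",
});
console.log(dashboardUrl);
```

Passing every value as a string matches how the component serializes `pr` with `String(prNumber)`; whether the endpoint also accepts numeric JSON values is not shown in this diff.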
10 changes: 10 additions & 0 deletions torchci/clickhouse_queries/oot_pr_results/params.json
@@ -0,0 +1,10 @@
{
  "params": {
    "pr": "UInt64"
  },
  "tests": [
    {
      "pr": "179565"
    }
  ]
}
21 changes: 21 additions & 0 deletions torchci/clickhouse_queries/oot_pr_results/query.sql
@@ -0,0 +1,21 @@
SELECT
    downstream_repo,
    workflow_name,
    job_name,
    check_run_id,
    run_id,
    run_attempt,
    status,
    conclusion,
    duration_seconds,
    workflow_run_url,
    artifact_url,
    started_at,
    queue_time,
    execution_time
FROM
    default.oot_workflow_job FINAL
WHERE
    pr_number = {pr: UInt64}
ORDER BY
    downstream_repo, started_at DESC

Contributor

PR-scoped query is unbounded. The backend dashboard query uses LIMIT 500; a PR view should be tighter. Add a small cap so a PR with many retries can't blow up.

Suggested change
- downstream_repo, started_at DESC
+ downstream_repo, started_at DESC
+ LIMIT 100

Collaborator Author

Added LIMIT 100 to the PR results query
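
The retry concern raised above could also be handled on the reading side. The sketch below is an illustration only, not code from this PR. It assumes each retry appears as a separate row with a higher `run_attempt`, and collapses the rows to the latest attempt per (`downstream_repo`, `job_name`) pair. The names `AttemptRow` and `latestAttempts` are hypothetical.

```typescript
// Illustration only; field names follow the oot_pr_results SELECT list.
interface AttemptRow {
  downstream_repo: string;
  job_name: string;
  run_attempt: number;
}

// Keeps one row per (downstream_repo, job_name): the one with the highest run_attempt.
function latestAttempts<T extends AttemptRow>(rows: T[]): T[] {
  const latest = new Map<string, T>();
  for (const row of rows) {
    // JSON-encode the pair so that separators inside names cannot collide.
    const key = JSON.stringify([row.downstream_repo, row.job_name]);
    const seen = latest.get(key);
    if (seen === undefined || row.run_attempt > seen.run_attempt) {
      latest.set(key, row);
    }
  }
  return Array.from(latest.values());
}
```

This complements rather than replaces the `LIMIT 100` cap: the cap bounds what the server returns, while a collapse like this only affects what is displayed.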

10 changes: 10 additions & 0 deletions torchci/clickhouse_queries/oot_summary/params.json
@@ -0,0 +1,10 @@
{
  "params": {
    "days": "UInt64"
  },
  "tests": [
    {
      "days": "7"
    }
  ]
}
18 changes: 18 additions & 0 deletions torchci/clickhouse_queries/oot_summary/query.sql
@@ -0,0 +1,18 @@
SELECT
    downstream_repo AS repo,
    anyLast(downstream_repo_level) AS downstream_repo_level,
    countIf(conclusion = 'success') AS successes,
    countIf(conclusion = 'failure') AS failures,
    count() AS total,
    if(total > 0, successes / total, 0) AS pass_rate,
    avg(duration_seconds) AS avg_duration_s,
    max(started_at) AS last_run
FROM
    default.oot_workflow_job FINAL
WHERE
    started_at > now() - INTERVAL {days: UInt64} DAY
    AND status = 'completed'
GROUP BY
    repo
ORDER BY
    pass_rate ASC
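
To make the aggregation concrete, the sketch below mirrors the query's arithmetic in TypeScript for the rows of a single repo. It is an illustration only: the PR computes these values in ClickHouse, not in the client. It assumes the rows are already filtered to `status = 'completed'`, and the names `JobRow`, `RepoSummary`, and `summarize` are hypothetical.

```typescript
// Illustration only; mirrors the oot_summary aggregates for one repo's rows.
interface JobRow {
  conclusion: string;
  duration_seconds: number;
}

interface RepoSummary {
  successes: number;
  failures: number;
  total: number;
  pass_rate: number;
  avg_duration_s: number;
}

function summarize(rows: JobRow[]): RepoSummary {
  const total = rows.length;
  const successes = rows.filter((r) => r.conclusion === "success").length;
  const failures = rows.filter((r) => r.conclusion === "failure").length;
  const durationSum = rows.reduce((sum, r) => sum + r.duration_seconds, 0);
  return {
    successes,
    failures,
    total,
    // Same zero guard as if(total > 0, successes / total, 0) in the query.
    pass_rate: total > 0 ? successes / total : 0,
    // Sketch choice: report 0 for empty input instead of dividing by zero.
    avg_duration_s: total > 0 ? durationSum / total : 0,
  };
}
```

As in the query, rows whose conclusion is neither `success` nor `failure` still count toward `total`, so any such rows lower `pass_rate` without showing up in `failures`.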
145 changes: 145 additions & 0 deletions torchci/components/oot/OotPrSection.tsx
@@ -0,0 +1,145 @@
import ExpandMoreIcon from "@mui/icons-material/ExpandMore";
import {
  Accordion,
  AccordionDetails,
  AccordionSummary,
  Chip,
  Link,
  Stack,
  Table,
  TableBody,
  TableCell,
  TableContainer,
  TableHead,
  TableRow,
  Typography,
} from "@mui/material";
import { durationDisplay } from "components/common/TimeUtils";
import { fetcher } from "lib/GeneralUtils";
import { conclusionColor, conclusionLabel } from "lib/oot/ootUtils";
import useSWR from "swr";

interface OotPrResult {
  downstream_repo: string;
  workflow_name: string;
  job_name: string;
  check_run_id: string;
  run_id: string;
  run_attempt: number;
  status: string;
  conclusion: string;
  duration_seconds: number;
  workflow_run_url: string;
  artifact_url: string;
  started_at: string;
  queue_time: number | null;
  execution_time: number | null;
}

export default function OotPrSection({ prNumber }: { prNumber: number }) {
  const url = `/api/clickhouse/oot_pr_results?parameters=${encodeURIComponent(
    JSON.stringify({ pr: String(prNumber) })
  )}`;
  const { data, error } = useSWR<OotPrResult[]>(url, fetcher, {
    refreshInterval: 60_000,
  });

  if (error || !data || data.length === 0) return null;

  const successCount = data.filter(
    (r) => r.status === "completed" && r.conclusion === "success"
  ).length;
  const totalCompleted = data.filter((r) => r.status === "completed").length;
  const inProgress = data.filter((r) => r.status === "in_progress").length;

  const summaryText = [
    totalCompleted > 0 ? `${successCount}/${totalCompleted} passed` : null,
    inProgress > 0 ? `${inProgress} running` : null,
  ]
    .filter(Boolean)
    .join(", ");

  return (
    <Accordion defaultExpanded={false} sx={{ mt: 2 }}>
      <AccordionSummary expandIcon={<ExpandMoreIcon />}>
        <Stack direction="row" spacing={1} alignItems="center">
          <Typography variant="subtitle1">
            <strong>Out-of-Tree Backends</strong>
          </Typography>
          <Typography variant="body2" color="text.secondary">
            ({summaryText})
          </Typography>
        </Stack>
      </AccordionSummary>
      <AccordionDetails>
        <TableContainer>
          <Table size="small">
            <TableHead>
              <TableRow>
                <TableCell>
                  <strong>Backend</strong>
                </TableCell>
                <TableCell>
                  <strong>Job</strong>
                </TableCell>
                <TableCell align="center">
                  <strong>Status</strong>
                </TableCell>
                <TableCell align="right">
                  <strong>Duration</strong>
                </TableCell>
                <TableCell>
                  <strong>Links</strong>
                </TableCell>
              </TableRow>
            </TableHead>
            <TableBody>
              {data.map((row, i) => (
                <TableRow key={i} hover>
                  <TableCell>{row.downstream_repo}</TableCell>
                  <TableCell>{row.job_name}</TableCell>
                  <TableCell align="center">
                    <Chip
                      label={conclusionLabel(row.status, row.conclusion)}
                      color={conclusionColor(row.status, row.conclusion)}
                      size="small"
                    />
                  </TableCell>
                  <TableCell align="right">
                    {row.duration_seconds
                      ? durationDisplay(Math.round(row.duration_seconds))
                      : "–"}
                  </TableCell>
                  <TableCell>
                    <Stack direction="row" spacing={1}>
                      {row.workflow_run_url && (
                        <Link
                          href={row.workflow_run_url}
                          target="_blank"
                          rel="noopener"
                          variant="body2"
                        >
                          Run
                        </Link>
                      )}
                      {row.artifact_url && (
                        <Link
                          href={row.artifact_url}
                          target="_blank"
                          rel="noopener"
                          variant="body2"
                        >
                          Artifacts
                        </Link>
                      )}
                    </Stack>
                  </TableCell>
                </TableRow>
              ))}
            </TableBody>
          </Table>
        </TableContainer>
      </AccordionDetails>
    </Accordion>
  );
}
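
The header summary logic in `OotPrSection` can be read in isolation as a pure function. The sketch below restates it for illustration only; the names `StatusRow` and `ootSummaryText` are hypothetical and the PR keeps this logic inline in the component.

```typescript
// Illustration only: a standalone restatement of the component's summary text logic.
interface StatusRow {
  status: string;
  conclusion: string;
}

function ootSummaryText(rows: StatusRow[]): string {
  const completed = rows.filter((r) => r.status === "completed");
  const successCount = completed.filter(
    (r) => r.conclusion === "success"
  ).length;
  const inProgress = rows.filter((r) => r.status === "in_progress").length;

  const parts: string[] = [];
  if (completed.length > 0) {
    parts.push(`${successCount}/${completed.length} passed`);
  }
  if (inProgress > 0) {
    parts.push(`${inProgress} running`);
  }
  return parts.join(", ");
}

console.log(
  ootSummaryText([
    { status: "completed", conclusion: "success" },
    { status: "completed", conclusion: "failure" },
    { status: "in_progress", conclusion: "" },
  ])
); // prints "1/2 passed, 1 running"
```

One edge case follows from this logic: if every row has a status other than `completed` or `in_progress`, the text is empty, and the component as written would render the header with empty parentheses.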