Add pytest sharding core and timing plugin [CI 7/9]#1604
Open
merkelmarrow wants to merge 1 commit into
A note on tmp_path in this PR: the new tests specifically test finn_ci, not real FINN builds, so it makes sense to me that you shouldn't need finn/FINN_BUILD_DIR to run these tests in particular (to check that your configs are correct). make_build_dir/robust_rmtree live in finn.util.basic, so using them here would couple this test suite to a full finn install, whereas finn_ci is deliberately importable without finn. Let me know if you think tmp_path isn't the right choice here.
Introduce a small finn_ci package, importable without the finn package installed, that becomes the single source of truth for the FINN CI matrix and backs a pytest sharding plugin:

- config: the CI board (BOARDS) and per-row stage (STAGES) tables, plus helpers that the build pipeline derives from them
- sharding: weight-balanced group-to-shard assignment by longest-processing-time-first packing, degrading to round-robin
- plugin: a pytest plugin that selects a shard by marker, keeps tests sharing an xdist_group on the same shard, and writes per-shard timing and shard-map files so later builds can balance by measured duration

The plugin does nothing unless a shard count is requested.

Parametrise the BNN end2end matrix off the shared board table with stable, value-derived xdist_group names, so editing the matrix no longer renames unrelated groups or loses their timing history. The board list now lives in finn_ci.config (BOARDS and TEST_BOARDS) and the old test_board_map in finn.util.basic is removed.

Group the ipstitch gen, stitch and rtlsim checkpoint chain per mem_mode so each step's output is on disk before the next test reads it.

Add a stdlib JUnit failure printer for printing per-test failure context in CI logs, plus unit tests covering the config tables, shard assignment, the plugin under xdist, the JSON helper, and the pytest failure printer.

Signed-off-by: Marco Blackwell <mblackwe@amd.com>
0a34113 to
b6fad24
This is PR 7 of 9 of a series intended to make CI faster and more robust.
This PR adds a new ci/ subdirectory and a finn_ci Python package inside it. This package is designed to let any marker stage be easily split into time-balanced shards. Note that PRs 8 and 9 wire the Jenkinsfiles into this package, so some of the package's surface won't be consumed until then (to keep this PR reviewable). This PR is safe to merge on its own because it does not affect the existing Jenkins pipeline, and the test changes alter only how the tests are generated, not how they behave.

Currently, FINN's Jenkins pipeline is a handful of long marker stages, and the best way to improve wall-clock time is to split those stages between more workers (i.e. sharding). Doing that reliably requires:
A) a definition of the board and stage matrix that both the tests and the build pipeline agree on
B) a way to assign tests to shards that is deterministic across every worker (so xdist collections match)
C) a way for per-test timing data to feed the next run so that shards are balanced
A - Consolidated config
The "finn_ci" package now owns the CI board and stage tables and anything derived from them, eliminating duplication between pytest and the Jenkinsfiles. These configurations are now validated directly by the tests themselves.
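As a rough illustration of what a single source of truth could look like, the sketch below defines a board table and derives both the test-matrix board list and a per-board marker lookup from it. The field names (name, marker, in_test_matrix), the helper names (matrix_boards, marker_for, validate) and the example rows are all assumptions made for illustration; they are not the actual contents of finn_ci.config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    name: str             # board identifier used in test ids
    marker: str           # pytest marker that selects this board's stage
    in_test_matrix: bool  # whether end2end tests are generated for it


# Hypothetical rows for illustration only; the real table is finn_ci.config.BOARDS.
BOARDS = (
    Board("Pynq-Z1", "board_pynq_z1", True),
    Board("ZCU104", "board_zcu104", True),
    Board("ExampleBoard", "board_example", False),
)


def matrix_boards(boards=BOARDS):
    """Board names the test matrix is generated for, in table order."""
    return [b.name for b in boards if b.in_test_matrix]


def marker_for(board_name, boards=BOARDS):
    """Marker for a board, looked up in the same table the tests use."""
    for b in boards:
        if b.name == board_name:
            return b.marker
    raise KeyError(board_name)


def validate(boards=BOARDS):
    """Cheap consistency checks that a unit test could run on the table."""
    names = [b.name for b in boards]
    markers = [b.marker for b in boards]
    assert len(set(names)) == len(names), "duplicate board name"
    assert len(set(markers)) == len(markers), "duplicate marker"
```

Because every consumer reads the same tuple, adding a board is a one-line change, and a check like validate() can catch a mistyped row before any build starts.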
B & C - Sharding and timing
There were several approaches available for this. For instance, the simplest is round-robin allocation, but that produces uneven shards and longer overall wall-clock times. Another approach is to record the durations of tests manually, store those durations somewhere, and have your CI consume that file. The problem with this is that it goes stale quickly as tests are added, and someone needs to refresh the timings file.
There exist off-the-shelf options, but none were exactly what the problem needed. For instance, pytest-shard can't balance by durations/weights, and pytest-split doesn't have awareness of xdist_group, so it splits checkpoint chains that need to stay together.
I ended up writing a simple pytest sharding plugin that does the following:
1. Grouping: Before assigning, tests are bucketed into groups. Tests marked with the same xdist_group form one group, because they hand files to each other and must stay together.
2. Selection: Every shard's pytest collects the same full set of tests for that marker. The plugin checks the config tables and a master timings file, decides which tests this shard should keep, and discards the rest. Each shard performs exactly the same calculation, so the assignment is deterministic.
3. Updating: Each group takes a certain amount of time. After a shard finishes, the plugin writes a small sidecar file next to the JUnit XML recording how long each group took. A full build later merges those into a persistent timing master file, which feeds step 2.
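The grouping and selection steps can be sketched as follows. This is a minimal illustration of longest-processing-time-first packing with a round-robin fallback, not the plugin's actual code: the function names, the default_weight parameter and the (test_id, group) input shape are assumptions, and an ungrouped test is simply treated as a group named after its own id.

```python
import heapq


def assign_groups(groups, timings, num_shards, default_weight=1.0):
    """Map each group name to a shard index.

    groups: iterable of group names (one per xdist_group or lone test).
    timings: dict of group name -> measured seconds; may be empty.
    """
    names = sorted(set(groups))
    if not any(g in timings for g in names):
        # Cold start: no timing data at all, so fall back to round-robin.
        return {g: i % num_shards for i, g in enumerate(names)}
    # Heaviest first; ties are broken by name so every worker agrees.
    ordered = sorted(names, key=lambda g: (-timings.get(g, default_weight), g))
    heap = [(0.0, shard) for shard in range(num_shards)]  # (load, shard index)
    assignment = {}
    for g in ordered:
        load, shard = heapq.heappop(heap)  # currently lightest shard
        assignment[g] = shard
        heapq.heappush(heap, (load + timings.get(g, default_weight), shard))
    return assignment


def keep_for_shard(tests, shard_index, num_shards, timings):
    """tests: list of (test_id, group_or_None). Returns the ids this shard keeps."""
    group_of = {tid: (grp if grp is not None else tid) for tid, grp in tests}
    assignment = assign_groups(group_of.values(), timings, num_shards)
    return [tid for tid, _ in tests if assignment[group_of[tid]] == shard_index]
```

Since the result depends only on the sorted group names and the timings, every shard that runs the same calculation on the same collection arrives at the same split, regardless of collection order.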
This is a self-healing system which rebalances as new tests are added, keeps grouped tests together, and falls back to simple round-robin on a cold start. The pytest plugin does nothing if the sharding flag isn't passed, so regular contributors won't experience any changes to local testing.
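The updating step might look roughly like the sketch below, which folds per-shard sidecar files into a master timing file. The file layout (a flat JSON object of group name to seconds), the function name merge_timings and the "newest measurement wins" policy are assumptions for illustration; the real plugin may store and combine timings differently.

```python
import json
from pathlib import Path


def merge_timings(master_path, sidecar_paths):
    """Fold per-shard sidecar timings into the master file and return the result.

    Each file is assumed to hold a JSON object of group name -> seconds.
    """
    master_path = Path(master_path)
    master = {}
    if master_path.exists():
        master = json.loads(master_path.read_text())
    for path in sorted(Path(p) for p in sidecar_paths):
        try:
            shard = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # a missing or corrupt sidecar should not fail the build
        for group, seconds in shard.items():
            master[group] = float(seconds)  # newest measurement wins
    master_path.write_text(json.dumps(master, indent=2, sort_keys=True))
    return master
```

A group that has never been timed is simply absent from the master file until its first run, after which the next build can weight it properly.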
Other changes
- test_end2end_bnn_pynq.py generated its test matrix in a fragile way, duplicating the board list in basic.py, and the per-board markers were a third copy. In addition, because a group was named by its position i in the generated list, inserting or removing a scenario would renumber every group after it (which would invalidate the new timing strategy). The board metadata now lives in finn_ci.config.BOARDS, and the test generates its matrix from that table.
- print_pytest_failures.py is a new simple script that parses JUnit XML that has already been produced and prints a tail of test failures into the CI log, for observability. It never crashes a run.
- @pytest.mark.shard(N) pins a test to a particular shard.
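To illustrate why positional group names are fragile, the sketch below compares index-based and value-derived naming when a scenario is inserted at the front of the matrix. The scenario tuples and both name formats are hypothetical; the actual names used in test_end2end_bnn_pynq.py may differ.

```python
def names_by_position(scenarios):
    """Old-style naming: the name depends on where the scenario sits in the list."""
    return {s: f"group_{i}" for i, s in enumerate(scenarios)}


def names_by_value(scenarios):
    """Value-derived naming: the name depends only on the scenario's own parameters."""
    return {s: "bnn_{}_{}_w{}a{}".format(*s) for s in scenarios}


# Hypothetical (board, topology, weight bits, activation bits) scenarios.
scenarios = [
    ("Pynq-Z1", "tfc", 1, 1),
    ("Pynq-Z1", "cnv", 1, 1),
    ("ZCU104", "tfc", 2, 2),
]
# Editing the matrix: a new scenario is inserted at the front.
edited = [("Pynq-Z1", "tfc", 1, 2)] + scenarios

# Every pre-existing scenario gets a different positional name after the edit...
renamed = [
    s for s in scenarios
    if names_by_position(scenarios)[s] != names_by_position(edited)[s]
]
# ...while value-derived names are unaffected, so recorded timings still match.
stable = [
    s for s in scenarios
    if names_by_value(scenarios)[s] == names_by_value(edited)[s]
]
```

With value-derived names, timing history keyed by group name survives matrix edits; with positional names, one insertion detaches every later group from its recorded duration.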