Support multiple replica sizes per scale in cluster spec sheet #36604
Open
antiguru wants to merge 1 commit into
… cc sizes

CloudTarget now exposes a list of equivalent (cc, M.1) sizes per replica scale, and the strong/weak scaling runners iterate over both so each scale point produces a data point on both pricing schemes. The cluster result analyzer learns to map M.1 names to credits/hour and worker counts for plotting alongside cc sizes.
Motivation
The cluster spec sheet benchmarking tool currently tests only a single replica size per scale. This change enables testing multiple replica sizes at each scale, allowing side-by-side comparison of equivalent cluster configurations (e.g., Ncc vs M.1-* sizes with the same worker count).
Description
This PR refactors the replica size selection logic to support multiple sizes per scale:
- API Change: Renamed replica_size_for_scale() to replica_sizes_for_scale(), which now returns list[str] instead of a single string. Added a new replica_size_for_scale() convenience method that returns the first (default) size for backward compatibility with setup-only clusters.
- ManagedTarget Implementation: Updated to return both Ncc and equivalent M.1-* sizes where available. Added an M1_REPLICA_SIZES mapping that pairs each scale with its corresponding M.1 tier based on worker count equivalence (e.g., scale 2 → M.1-small with 4 workers).
- DockerTarget Implementation: Updated to return a single-element list for consistency.
- Scenario Execution: Modified run_scenario_strong() and run_scenario_weak() to iterate over all returned replica sizes, running each scenario once per size. This allows direct performance comparison between equivalent cluster configurations.
- Analysis: Enhanced analyze_cluster_results_file() with M.1 size metadata (credits/hour and worker counts) to properly normalize results across different cluster types when analyzing benchmark data.
Verification
The changes maintain backward compatibility through the new replica_size_for_scale() convenience method. Existing tests and benchmarks will continue to work. The new functionality is exercised by the scenario execution loops, which now iterate over multiple sizes per scale.
https://claude.ai/code/session_016ELD471W47RyJdWuoLrUc1
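For reviewers who want the shape of the API change at a glance, here is a minimal sketch of the pattern described under Description. It is not the code in this PR. The Target base class, the plan_runs() helper, the f"{scale}00cc" naming, and the Docker size string are assumptions made for illustration. Only the method names, the list[str] return type, the M1_REPLICA_SIZES name, and the scale 2 → M.1-small pairing come from the description.

```python
from abc import ABC, abstractmethod

# Illustrative pairing of scale -> M.1 size. Only the scale 2 entry is
# taken from the PR description; the real mapping may hold more entries.
M1_REPLICA_SIZES: dict[int, str] = {2: "M.1-small"}


class Target(ABC):
    """Hypothetical base class standing in for the benchmark targets."""

    @abstractmethod
    def replica_sizes_for_scale(self, scale: int) -> list[str]:
        """Return every replica size to benchmark at this scale."""

    def replica_size_for_scale(self, scale: int) -> str:
        """Return the first (default) size, e.g. for setup-only clusters."""
        return self.replica_sizes_for_scale(scale)[0]


class ManagedTarget(Target):
    def replica_sizes_for_scale(self, scale: int) -> list[str]:
        # Assumed naming for "Ncc" sizes; the real scale-to-cc mapping
        # is not shown in the PR description.
        sizes = [f"{scale}00cc"]
        m1 = M1_REPLICA_SIZES.get(scale)
        if m1 is not None:
            # Add the equivalent M.1 size only where one is defined.
            sizes.append(m1)
        return sizes


class DockerTarget(Target):
    def replica_sizes_for_scale(self, scale: int) -> list[str]:
        # Single-element list for consistency; the size string is a placeholder.
        return [f"docker-scale-{scale}"]


def plan_runs(target: Target, scales: list[int]) -> list[tuple[int, str]]:
    """Hypothetical stand-in for the loop in the strong/weak scaling runners.

    Yields one (scale, size) run per replica size, so each scale point can
    produce a data point for every equivalent size.
    """
    runs = []
    for scale in scales:
        for size in target.replica_sizes_for_scale(scale):
            # A real runner would create the replica and run the scenario here.
            runs.append((scale, size))
    return runs
```

Under these assumptions, plan_runs(ManagedTarget(), [1, 2]) schedules one run at scale 1 and two runs at scale 2, while callers that only need a single size can keep using replica_size_for_scale().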