
[CIP-14][CELEBORN-2220] Support Slow Start Push in C++ Client#3730

Open
afterincomparableyum wants to merge 1 commit into apache:main from afterincomparableyum:CELEBORN-2220

@afterincomparableyum

Contributor

What changes were proposed in this pull request?

Implement SlowStartPushStrategy in the C++ client, mirroring the Java client's TCP-like congestion control for push speed:

  • Track a per-worker CongestControlContext that grows the in-flight request limit on each successful push (slow start, then congestion avoidance once the limit reaches the threshold) and halves it on congestion, falling back to sleeping when congestion persists at a limit of 1.

  • Select the strategy by setting celeborn.client.push.limitStrategy=SLOWSTART.

  • Add configs celeborn.client.push.slowStart.initialSleepTime (default 500ms) and celeborn.client.push.slowStart.maxSleepTime (default 2s) to control sleep intervals while ramping up.

  • Add PushStrategyTest, covering strategy creation, sleep time calculation, congestion control, and per-host isolation, and extend CelebornConfTest to cover the new configs.
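The control loop described in the list above can be sketched roughly as follows. This is a minimal, hypothetical C++17 sketch of the described behavior, not the code in this PR: the names CongestControlContextSketch, SlowStartPushStrategySketch, onSuccess, onCongestion, sleepTimeMs, and contextFor are invented for illustration. The sleep schedule is also an assumption, not the Java client's formula: start at initialSleepTime, double on each repeated congestion at a limit of 1, and cap at maxSleepTime.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical per-worker congestion state; names and the sleep schedule are
// illustrative assumptions, not the PR's actual CongestControlContext.
class CongestControlContextSketch {
 public:
  CongestControlContextSketch(int maxReqsInFlight, std::int64_t initialSleepMs,
                              std::int64_t maxSleepMs)
      : maxReqs_(maxReqsInFlight),
        threshold_(maxReqsInFlight),
        initialSleepMs_(initialSleepMs),
        maxSleepMs_(maxSleepMs) {}

  // A push to this worker succeeded: grow the in-flight limit.
  void onSuccess() {
    std::lock_guard<std::mutex> lock(mu_);
    congestedAtOne_ = 0;  // congestion has cleared, so stop sleeping
    if (limit_ >= maxReqs_) return;
    if (limit_ < threshold_) {
      ++limit_;  // slow start: +1 per successful push
    } else if (++acks_ >= limit_) {
      ++limit_;  // congestion avoidance: +1 per `limit_` successful pushes
      acks_ = 0;
    }
  }

  // The worker reported congestion: halve the limit and lower the threshold.
  void onCongestion() {
    std::lock_guard<std::mutex> lock(mu_);
    if (limit_ == 1) ++congestedAtOne_;  // cannot halve below 1, so sleep instead
    limit_ = std::max(1, limit_ / 2);
    threshold_ = limit_;
    acks_ = 0;
  }

  // Assumed schedule: 0 unless congestion persists at a limit of 1, then
  // initialSleepMs doubling per repeated congestion, capped at maxSleepMs.
  std::int64_t sleepTimeMs() const {
    std::lock_guard<std::mutex> lock(mu_);
    if (congestedAtOne_ == 0) return 0;
    std::int64_t sleep = initialSleepMs_;
    for (int i = 1; i < congestedAtOne_ && sleep < maxSleepMs_; ++i) sleep *= 2;
    return std::min(sleep, maxSleepMs_);
  }

  int limit() const {
    std::lock_guard<std::mutex> lock(mu_);
    return limit_;
  }

 private:
  mutable std::mutex mu_;
  const int maxReqs_;
  int threshold_;
  int limit_ = 1;  // start with a single in-flight request, as in TCP slow start
  int acks_ = 0;
  int congestedAtOne_ = 0;
  const std::int64_t initialSleepMs_;
  const std::int64_t maxSleepMs_;
};

// Hypothetical strategy holding one independent context per worker ("host:port").
class SlowStartPushStrategySketch {
 public:
  SlowStartPushStrategySketch(int maxReqsInFlight, std::int64_t initialSleepMs,
                              std::int64_t maxSleepMs)
      : maxReqs_(maxReqsInFlight),
        initialSleepMs_(initialSleepMs),
        maxSleepMs_(maxSleepMs) {}

  // Returns the context for a worker, creating it on first use. References stay
  // valid across rehashing because unordered_map is node-based.
  CongestControlContextSketch& contextFor(const std::string& hostAndPort) {
    std::lock_guard<std::mutex> lock(mu_);
    return contexts_
        .try_emplace(hostAndPort, maxReqs_, initialSleepMs_, maxSleepMs_)
        .first->second;
  }

 private:
  std::mutex mu_;
  const int maxReqs_;
  const std::int64_t initialSleepMs_;
  const std::int64_t maxSleepMs_;
  std::unordered_map<std::string, CongestControlContextSketch> contexts_;
};
```

Keying contexts by worker address keeps one congested worker from throttling pushes to the others, which is the per-host isolation the new tests are described as covering. Setting the threshold to the halved limit means the next ramp-up switches to the slower, linear congestion-avoidance growth earlier, as TCP does after a loss.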

Why are the changes needed?

This closes a feature gap between the Java client and the C++ client, which previously lacked slow-start push control.

Does this PR resolve a correctness bug?

  • Yes

Does this PR introduce any user-facing change?

  • Yes

How was this patch tested?

Unit tests (the new PushStrategyTest and the updated CelebornConfTest), run in CI/CD.

@codecov

codecov Bot commented Jun 12, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.47%. Comparing base (3820244) to head (f36a0c3).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3730      +/-   ##
============================================
- Coverage     57.53%   57.47%   -0.05%     
  Complexity      214      214              
============================================
  Files           396      396              
  Lines         27857    27857              
  Branches       2710     2710              
============================================
- Hits          16025    16009      -16     
- Misses        10682    10696      +14     
- Partials       1150     1152       +2     


@afterincomparableyum afterincomparableyum marked this pull request as ready for review June 13, 2026 16:13
@afterincomparableyum

Contributor Author

Ping @SteNicholas @RexXiong, could one of you take a look, please?
