-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Run the cudf-polars test suite against DaskEngine and RayEngine
#22381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 23 commits
1b165a8
7cdedd0
8feaf42
e26e24b
26daa0e
b294bf8
e5427fa
75d234c
41d71b3
9a05eb8
0c26c09
f3dc183
1996620
f2005f3
673df5b
27b631c
d280fab
5c4d952
8c1819f
d3904b7
6482434
f79fb71
cd78518
56fc913
88f6b5d
734df2e
efc904b
83059fd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -384,6 +384,14 @@ files: | |
| key: experimental | ||
| includes: | ||
| - run_cudf_polars_experimental | ||
| py_run_cudf_polars_ray: | ||
| output: pyproject | ||
| pyproject_dir: python/cudf_polars | ||
| extras: | ||
| table: project.optional-dependencies | ||
| key: ray | ||
| includes: | ||
| - depends_on_ray | ||
| py_test_cudf_polars: | ||
| output: pyproject | ||
| pyproject_dir: python/cudf_polars | ||
|
|
@@ -1290,6 +1298,11 @@ dependencies: | |
| - matrix: | ||
| packages: | ||
| - *rapidsmpf_unsuffixed | ||
| depends_on_ray: | ||
| common: | ||
| - output_types: [conda, requirements, pyproject] | ||
| packages: | ||
| - ray>=2.55.1 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In run_constraints:
- ray >=2.55.1to mirror this constraint
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ray is still optional, this is just for the CI testing
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, IIRC https://rattler-build.prefix.dev/latest/reference/recipe_file/#run-constraints
But can be done in a follow up since the CI is currently green
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I didn't know that, thanks.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure thing, opened #22414 |
||
| depends_on_rapids_logger: | ||
| common: | ||
| - output_types: [conda, requirements, pyproject] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -164,17 +164,18 @@ def _( | |
| left, pi_left = rec(left) | ||
| right, pi_right = rec(right) | ||
|
|
||
| # Fallback to single partition on the smaller table | ||
| # Fallback to single partition on the smaller table whenever either | ||
| # side has more than one partition. | ||
| left_count = pi_left[left].count | ||
| right_count = pi_right[right].count | ||
| output_count = max(left_count, right_count) | ||
| fallback_msg = "ConditionalJoin not supported for multiple partitions." | ||
| if left_count < right_count: | ||
| if left_count > 1 or dynamic_planning: | ||
| if output_count > 1 or dynamic_planning: | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed a bug in the conditional-join fallback logic. We need to repartition the smaller table to a single broadcastable partition whenever either side has more than one partition. Previously the fallback only triggered when the smaller side itself had multiple partitions, which breaks asymmetric cases like @TomAugspurger and @rjzamora, does that sound correct?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll defer to Rick on this... Though if the new way is correct, then maybe the logic would be a bit more obvious if we combine the first two if conditions? if (left_count < right_count) and (output_count > 1 or dynamic_planning):
left = Repartition(left.schema, left)
pi_left[left] = PartitionInfo(count=1)
_fallback_inform(fallback_msg, config_options)
elif output_count > 1 or dynamic_planning:
right = Repartition(right.schema, right)
pi_right[right] = PartitionInfo(count=1)
_fallback_inform(fallback_msg, config_options)Then the repeated condition At least I hope those are logically equivalent.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess we are maybe disabling dynamic planning for some join tests? By default, dynamic-planning is always on, so we always repartition the smaller side.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Maybe, it only happens when running with multiple GPUs
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, and we can even simplify a bit more. Done |
||
| left = Repartition(left.schema, left) | ||
| pi_left[left] = PartitionInfo(count=1) | ||
| _fallback_inform(fallback_msg, config_options) | ||
| elif right_count > 1 or dynamic_planning: | ||
| elif output_count > 1 or dynamic_planning: | ||
| right = Repartition(right.schema, right) | ||
| pi_right[right] = PartitionInfo(count=1) | ||
| _fallback_inform(fallback_msg, config_options) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are necessary to provide enough resources for UCX. For reference, we already do the same for UCXX and RapidsMPF, both need it for the same reason: