Is your feature request related to a problem? Please describe.
The cudf join family currently mixes two terminologies:
- Free functions (cudf::inner_join, cudf::left_join, cudf::full_join, cudf::cross_join, cudf::mixed_*_join) take left / right tables and return {left_indices, right_indices}.
- Object-oriented join classes take build / probe tables. Each class has its own convention based on which side is hashed:
  - hash_join, distinct_hash_join, filtered_join, key_remapping: build = right, probe = left.
  - mark_join: build = left, probe = right (the algorithm builds a hash table on the smaller left table and probes the right table).
The build/probe roles are not fixed: which side is built and which is probed depends on the algorithm. left/right, by contrast, is a consistent and unambiguous reference. The goal is to use left / right consistently in the user-facing API while keeping build / probe for verb-form actions in the implementation (e.g. "build the hash table", "subsequent probe calls", validate_hash_join_probe, build_hash_join).
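To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use cudf: toy_hash_join, toy_mark_semi_join, index_vector and join_result are hypothetical names invented for illustration, and keys are plain ints. Both toys take left / right inputs, yet they hash opposite sides internally, which is exactly the detail a build / probe name would leak into the API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only; these are NOT the cudf classes.
using index_vector = std::vector<std::size_t>;
using join_result  = std::pair<index_vector, index_vector>;  // {left_indices, right_indices}

// Toy hash join: the constructor hashes the RIGHT keys (the "build" action)
// and inner_join() scans the LEFT keys (the "probe" action).
class toy_hash_join {
 public:
  explicit toy_hash_join(std::vector<int> const& right)
  {
    for (std::size_t r = 0; r < right.size(); ++r) { table_.emplace(right[r], r); }
  }

  join_result inner_join(std::vector<int> const& left) const
  {
    join_result out;
    for (std::size_t l = 0; l < left.size(); ++l) {
      auto const range = table_.equal_range(left[l]);
      for (auto it = range.first; it != range.second; ++it) {
        out.first.push_back(l);            // index into the left input
        out.second.push_back(it->second);  // index into the right input
      }
    }
    assert(out.first.size() == out.second.size());  // one pair per match
    return out;
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;
};

// Toy mark-style join: here the LEFT keys are hashed and the RIGHT keys are
// scanned, marking every left row that has at least one match.
index_vector toy_mark_semi_join(std::vector<int> const& left, std::vector<int> const& right)
{
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t l = 0; l < left.size(); ++l) { table.emplace(left[l], l); }

  std::vector<bool> marked(left.size(), false);
  for (int key : right) {
    auto const range = table.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) { marked[it->second] = true; }
  }

  index_vector out;
  for (std::size_t l = 0; l < left.size(); ++l) {
    if (marked[l]) { out.push_back(l); }
  }
  return out;
}
```

A caller of either toy only needs to know which table is left and which is right; which one ends up in the hash table stays an implementation detail.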
Describe the solution you'd like
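Rename build/probe to left/right across all join APIs:
- hash_join and distinct_hash_join: #22382, "Rename build/probe to right/left in hash_join and distinct_hash_join" (build → right, probe → left)
- filtered_join (build → right, probe → left; also wrapped by pylibcudf, so the Python binding params change)
- mark_join (build → left, probe → right; mark_join is designed to build the hash table on the left table)
- key_remapping (build → right, probe → left)
- mixed_join and mixed_join_semi (internal build_table / probe_table locals only; public API already uses left/right)
- distinct_filtered_join (detail-only; build → right, probe → left)
- prefilter_t enum doc in cudf/join/join.hpp ("probe-side prefilter")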
Describe alternatives you've considered
Standardizing on build/probe everywhere instead would force users to understand internal algorithm details (which side each algorithm hashes), which is not ideal.
Additional context
- ABI is stable in each rename PR (parameter names only; PIMPL layouts unchanged).
- JNI callers are positional, so the C++ rename does not break Java consumers.
- pylibcudf only binds filtered_join; that PR will need matching Python parameter renames.
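As a small illustration of why a parameter-name-only change is safe for positional callers, here is a self-contained sketch in plain C++. count_matches and positional_caller_demo are hypothetical names, not cudf functions. In C++, parameter names are not part of a function's type, so the "before" and "after" declarations below declare the same function.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical "before" declaration: parameters named after algorithm roles.
std::size_t count_matches(std::vector<int> const& probe, std::vector<int> const& build);

// Hypothetical "after" declaration: parameters named after join sides.
// Same types in the same order, so this redeclares the same function.
std::size_t count_matches(std::vector<int> const& left, std::vector<int> const& right);

// The function type carries no parameter names at all.
static_assert(
  std::is_same<decltype(&count_matches),
               std::size_t (*)(std::vector<int> const&, std::vector<int> const&)>::value,
  "parameter names are not part of the function type");

// Single definition: counts (left, right) pairs with equal keys.
std::size_t count_matches(std::vector<int> const& left, std::vector<int> const& right)
{
  std::size_t n = 0;
  for (int l : left) {
    for (int r : right) {
      if (l == r) { ++n; }
    }
  }
  return n;
}

// A positional caller never spells a parameter name, so it compiles
// unchanged against either declaration.
inline void positional_caller_demo()
{
  std::vector<int> const left{1, 2, 3};
  std::vector<int> const right{2, 3, 4};
  assert(count_matches(left, right) == 2);  // keys 2 and 3 match
}
```

Bindings that pass arguments by keyword are the exception, which is why the Python side needs a matching rename.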