[FEA] Rename build/probe to right/left across join APIs #22400

@PointKernel

Is your feature request related to a problem? Please describe.
The cudf join family currently mixes two terminologies:

  • Free functions (cudf::inner_join, cudf::left_join, cudf::full_join, cudf::cross_join, cudf::mixed_*_join) take left / right tables and return {left_indices, right_indices}.
  • Object-oriented join classes take build / probe tables. Each class has its own convention based on which side is hashed:
    • hash_join, distinct_hash_join, filtered_join, key_remapping: build = right, probe = left.
    • mark_join: build = left, probe = right (the algorithm builds a hash table on the smaller left table and probes the right table).

Which table plays the build role and which plays the probe role is not fixed: it depends on the algorithm. Left/right, by contrast, refers to the same tables in every API, which minimizes confusion. The goal is to use left / right consistently in the user-facing API while keeping build / probe for verb-form actions in the implementation (e.g. "build the hash table", "subsequent probe calls", validate_hash_join_probe, build_hash_join).
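To illustrate the intended split, here is a minimal, self-contained sketch. It is not cudf code: `toy_join_result` and `toy_inner_join` are hypothetical names, and the "hash the smaller side" rule is an assumption made only for this example. The signature and the result speak of left/right, while build/probe appear only as local implementation details.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical toy type (not a cudf API): matching row indices,
// always reported in the caller's left/right vocabulary.
struct toy_join_result {
  std::vector<std::size_t> left_indices;
  std::vector<std::size_t> right_indices;
};

// Toy inner join on integer keys. The public signature only mentions
// left/right; which side gets hashed is decided internally.
toy_join_result toy_inner_join(std::vector<int> const& left, std::vector<int> const& right)
{
  // Implementation detail (assumption for this sketch): build on the smaller input.
  bool const build_is_left = left.size() < right.size();
  auto const& build        = build_is_left ? left : right;
  auto const& probe        = build_is_left ? right : left;

  // "Build the hash table": key -> row index in the build side.
  std::unordered_multimap<int, std::size_t> table;
  for (std::size_t i = 0; i < build.size(); ++i) { table.emplace(build[i], i); }

  toy_join_result result;
  // "Probe": look up every probe-side key in the table.
  for (std::size_t j = 0; j < probe.size(); ++j) {
    auto const range = table.equal_range(probe[j]);
    for (auto it = range.first; it != range.second; ++it) {
      // Translate build/probe positions back to left/right for the caller.
      result.left_indices.push_back(build_is_left ? it->second : j);
      result.right_indices.push_back(build_is_left ? j : it->second);
    }
  }
  assert(result.left_indices.size() == result.right_indices.size());
  return result;
}
```

A caller writes `toy_inner_join(left, right)` and reads `left_indices` / `right_indices` without needing to know which side was hashed. The order of the returned pairs is unspecified in this sketch.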

Describe the solution you'd like
Rename build/probe to left/right across all join APIs:

  • hash_join and distinct_hash_join (build → right, probe → left), tracked in #22382 "Rename build/probe to right/left in hash_join and distinct_hash_join"
  • filtered_join (build → right, probe → left; also wrapped by pylibcudf, so the Python binding parameters change as well)
  • mark_join (build → left, probe → right; mark_join is designed to build the hash table on the left table)
  • key_remapping (build → right, probe → left)
  • mixed_join and mixed_join_semi (internal build_table / probe_table locals only; public API already uses left/right)
  • distinct_filtered_join (detail-only; build → right, probe → left)
  • Public prefilter_t enum doc in cudf/join/join.hpp ("probe-side prefilter")

Describe alternatives you've considered
Keeping build/probe in the public API would force users to understand internal algorithm details (which side each class hashes), which is not ideal.

Additional context

  • ABI is stable in each rename PR (parameter names only; PIMPL layouts unchanged).
  • JNI callers are positional, so the C++ rename does not break Java consumers.
  • pylibcudf only binds filtered_join; that PR will need matching Python parameter renames.
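As a small illustration of why positional callers are unaffected: in C++, parameter names are not part of a function's signature, so two declarations that differ only in parameter names refer to the same function. The sketch below uses a hypothetical `toy_count_matches` function, not a cudf API. Keyword-style callers, such as Python code passing arguments by name, are the case where a rename is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// "Old" declaration, spelled with build/probe (hypothetical toy function).
std::size_t toy_count_matches(std::vector<int> const& build, std::vector<int> const& probe);

// "New" declaration after the rename. This redeclares the same function:
// only the parameter names differ, and they are not part of the signature.
std::size_t toy_count_matches(std::vector<int> const& right, std::vector<int> const& left);

// Single definition, using the new names. Counts how many left keys
// have a match among the right keys.
std::size_t toy_count_matches(std::vector<int> const& right, std::vector<int> const& left)
{
  std::unordered_set<int> const table(right.begin(), right.end());
  std::size_t matches = 0;
  for (int key : left) { matches += table.count(key); }
  assert(matches <= left.size());
  return matches;
}

// A positional caller: it compiles identically against either declaration.
std::size_t toy_positional_caller()
{
  std::vector<int> const right_keys{1, 2, 3};
  std::vector<int> const left_keys{3, 3, 4, 1};
  return toy_count_matches(right_keys, left_keys);
}
```

Here `toy_positional_caller()` returns 3 (the keys 3, 3 and 1 match; 4 does not), regardless of which parameter names the declaration uses.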
