[VL][Delta] Add native Delta bitmap aggregation support#12214
Conversation
|
Run Gluten Clickhouse CI on x86 |
2 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
Add the native bitmap primitive needed by later Delta deletion-vector DELETE/MoR work, without changing DELETE routing or enabling native bitmap construction in the command path yet. - extend RoaringBitmapArray for Delta Portable-format DV payloads, with bounded deserialization (CRoaring portable deserialize sizing before readSafe) - add the native bitmapaggregator aggregate for Delta row-index aggregation and register it - wire the aggregate name through Gluten expression / substrait planning (ExpressionNames.BITMAP_AGGREGATOR, AggregateFunctionsBuilder) - add native tests for bitmap serialization/deserialization and aggregate behavior, plus a delta_bitmap_benchmark for construction, partial-merge, and deserialize/probe cases Primitive-only: no DELETE command routing, no DML row-index scan planning changes, and the aggregate is not enabled as the default DELETE path.
2122ea6 to
1daeebf
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
| private def isPortableDeltaBitmapAggregator(aggregateFunc: AggregateFunction): Boolean = { | ||
| aggregateFunc.prettyName == ExpressionNames.BITMAP_AGGREGATOR && | ||
| Try { | ||
| aggregateFunc.getClass | ||
| .getMethod("serializationFormatString") | ||
| .invoke(aggregateFunc) | ||
| .asInstanceOf[String] | ||
| }.toOption.contains("Portable") | ||
| } |
There was a problem hiding this comment.
Would you explain a bit more about the method? Why reflection is needed, etc.
There was a problem hiding this comment.
Yeah, dug into this and the reflection wasn't actually buying us anything, so I dropped it it's just a prettyName match now (cb2f287).
The Portable check was me being paranoid. Delta only ever builds this aggregator in one spot (DMLWithDeletionVectorsHelper) and always with the Portable format, so it never actually filtered anything. And if a weird non-Portable payload ever did show up, the native deserialize already throws on the magic number so the real check is on the C++ side anyway. Bonus: gluten-substrait no longer needs to know anything about delta-spark.
|
|
||
| namespace gluten { | ||
|
|
||
| void registerDeltaBitmapAggregator( |
There was a problem hiding this comment.
Can we add a new source folder operators/functions/delta, for all Delta functions?
There was a problem hiding this comment.
Done, moved them into operators/functions/delta/ and fixed up the include + test + CMake (cb2f287). Future Delta functions can land there too.
|
Run Gluten Clickhouse CI on x86 |
…eflection
- move DeltaBitmapAggregator.{cc,h} into cpp/velox/operators/functions/delta/
so Delta native functions live under their own source folder; update the
include in RegistrationAllFunctions.cc, the test, and CMakeLists
- drop the reflective serialization-format check in the bitmap-aggregator match
and match on prettyName only. Delta's only BitmapAggregator construction site
(DMLWithDeletionVectorsHelper) always uses the Portable format, and a
non-Portable payload is rejected natively by RoaringBitmapArray::deserialize
(magic-number check), so the reflection was redundant
fc48276 to
cb2f287
Compare
|
Run Gluten Clickhouse CI on x86 |
There was a problem hiding this comment.
Pull request overview
Adds native Velox support for Delta’s deletion-vector bitmap construction by introducing a bounded portable-format roaring bitmap wrapper and a corresponding bitmapaggregator aggregate, plus native tests and a microbenchmark. This enables future work to move DELETE/MoR DV bitmap construction off the JVM and into native execution.
Changes:
- Add a
delta::RoaringBitmapArrayimplementation matching Delta JVM’s portable DV payload shape, with bounded deserialization and helpers for merge/cardinality/last/optimized serialization. - Add and register a native Velox aggregate
bitmapaggregator(bigint → row(bigint, bigint, varbinary)) and wire Spark/Substrait naming to it. - Add native tests for serialization/deserialization and aggregate behavior, plus
delta_bitmap_benchmark.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| shims/common/src/main/scala/org/apache/gluten/expression/ExpressionNames.scala | Adds the bitmapaggregator expression name constant. |
| gluten-substrait/src/main/scala/org/apache/gluten/expression/AggregateFunctionsBuilder.scala | Maps Delta’s BitmapAggregator (by prettyName) to the Substrait function name. |
| cpp/velox/operators/functions/RegistrationAllFunctions.cc | Registers the new Delta bitmap aggregate with the Velox function registry. |
| cpp/velox/operators/functions/delta/DeltaBitmapAggregator.h | Declares the registerDeltaBitmapAggregator entry point. |
| cpp/velox/operators/functions/delta/DeltaBitmapAggregator.cc | Implements and registers the bitmapaggregator Velox aggregate. |
| cpp/velox/compute/delta/RoaringBitmapArray.h | Redesigns the DV bitmap wrapper to match Delta’s JVM high-key → 32-bit roaring layout and adds new APIs. |
| cpp/velox/compute/delta/RoaringBitmapArray.cpp | Implements bounded portable-format (de)serialization, merge, cardinality, last, and optimized serialization. |
| cpp/velox/compute/delta/tests/RoaringBitmapArrayTest.cpp | Expands coverage for header compatibility, sparse-gap payloads, bounds enforcement, and trailing-byte checks. |
| cpp/velox/compute/delta/tests/DeltaBitmapAggregatorTest.cpp | Adds end-to-end aggregate tests (null/dup handling, partial+final merge, bounds checks). |
| cpp/velox/compute/delta/tests/CMakeLists.txt | Adds the new aggregate test to the delta test executable. |
| cpp/velox/CMakeLists.txt | Adds the new aggregate implementation file to the Velox build sources. |
| cpp/velox/benchmarks/DeltaBitmapBenchmark.cc | Adds a benchmark for build/serialize, deserialize/probe, and merging partial payloads. |
| cpp/velox/benchmarks/CMakeLists.txt | Registers the new delta_bitmap_benchmark target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| exec::registerAggregateFunction( | ||
| prefix + "bitmapaggregator", | ||
| std::move(signatures), | ||
| [](core::AggregationNode::Step step, | ||
| const std::vector<TypePtr>& argTypes, | ||
| const TypePtr& resultType, | ||
| const core::QueryConfig& /* config */) -> std::unique_ptr<exec::Aggregate> { | ||
| VELOX_CHECK_EQ(argTypes.size(), 1, "bitmapaggregator takes one argument"); | ||
| VELOX_CHECK_EQ(argTypes[0]->kind(), exec::isRawInput(step) ? TypeKind::BIGINT : TypeKind::VARBINARY); |
The native bitmapaggregator is registered with Velox but was missing from SubstraitToVeloxPlanValidator's supportedAggFuncs allow-list, so any plan containing it would validate-fail and fall back to Spark instead of running natively. Add it next to bitmap_construct_agg.
|
Run Gluten Clickhouse CI on x86 |
What changes are proposed in this pull request?
Deleting rows from a Delta table with deletion vectors means building (and merging) a roaring bitmap of deleted row indices. Today that happens on the JVM. This adds the native Velox building block for it, so a later PR can run DELETE's bitmap construction natively.
It's just the primitive — nothing is wired into DELETE yet:
RoaringBitmapArrayhandles Delta's Portable-format DV payloads, with bounded deserialization (CRoaring portable sizing beforereadSafe)bitmapaggregatoraggregate for row-index aggregation, registered and wired through Gluten's expression/substrait planningdelta_bitmap_benchmarkExplicitly out of scope (follow-up PRs): DELETE command routing, DML row-index scan planning, and enabling this aggregate as the default DELETE path.
How was this patch tested?
Rebased onto current
main(after the DV scan handoff merged in #12269) and re-checked the JVM wiring with JDK 17 (mvn compile -pl gluten-substrait -am,spotless:check,git diff --check). The native build/tests run in CI — that's the authoritative lane here, since local C++ validation is limited by the Velox build tree.Native evidence from the standalone run (C++ is unchanged by the rebase):
RoaringBitmapArrayTestpasses; round-trips a payload through both native and a Delta 3.3.2 JVM helper (values1,7,1 << 33; cardinality3)delta_bitmap_benchmark: 1M build+serialize ~8–10 ms (~105–132M rows/s), merge of 64 partials ~1.1 ms, deserialize+probe ~487 µsWas this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB