## Description
The current IPC reader eagerly materializes dictionary deltas, which can lead to quadratic copying behavior when dictionaries are updated incrementally.
In `reader.rs`, delta dictionaries are currently handled as:

```rust
let combined = concat::concat(&[existing, &dict_values])?;
dictionaries_by_id.insert(dict_id, combined);
```

This is invoked for every `DictionaryBatch` with `isDelta = true`, before any record batch consumes the dictionary.
## Problem
For a sequence of delta updates, the reader repeatedly concatenates the full existing dictionary with each new delta. This results in cumulative copying proportional to:
O(1 + 2 + 3 + ... + N) = O(N²)
where N is the number of accumulated dictionary elements.
This behavior is not theoretical. The writer supports delta dictionaries (`DictionaryHandling::Delta`), and the reader processes them eagerly on every update. As a result, large or frequently updated dictionaries can incur significant and unnecessary memory and CPU overhead.
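To make the cost concrete, here is a minimal sketch (not the reader's actual code) that simulates the eager strategy with a plain `Vec<u64>` in place of an Arrow array: each delta triggers a fresh allocation that re-copies the whole accumulated dictionary, and the total number of copied elements follows the arithmetic series above.

```rust
// Simulate eager delta handling: rebuild the full dictionary on every
// delta and count how many elements get copied in total.
fn eager_copy_count(n_deltas: usize, chunk_len: usize) -> usize {
    let mut dictionary: Vec<u64> = Vec::new();
    let mut copied = 0;
    for i in 0..n_deltas {
        let delta: Vec<u64> = (0..chunk_len).map(|j| (i * chunk_len + j) as u64).collect();
        // Analogue of concat::concat(&[existing, &dict_values]): a fresh
        // allocation that copies the existing dictionary plus the delta.
        let mut combined = Vec::with_capacity(dictionary.len() + delta.len());
        combined.extend_from_slice(&dictionary);
        combined.extend_from_slice(&delta);
        copied += combined.len();
        dictionary = combined;
    }
    copied
}

fn main() {
    // chunk_len * (1 + 2 + ... + N) = chunk_len * N(N + 1) / 2 elements copied.
    assert_eq!(eager_copy_count(100, 10), 10 * 100 * 101 / 2);
    println!("100 deltas of 10 values copy {} elements", eager_copy_count(100, 10));
}
```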
## Root Cause
The reader performs eager materialization of the full dictionary on every delta update, even though only the final accumulated dictionary is required at record batch decode time.
## Proposed Solution
Defer dictionary materialization until it is required for record batch decoding.
Specifically:
- Maintain the existing `HashMap<i64, ArrayRef>` for materialized dictionaries.
- Introduce a separate internal structure to store pending delta chunks: `HashMap<i64, Vec<ArrayRef>>`.
- On delta dictionary updates:
  - append the delta chunk to the pending list
  - do not perform concatenation
- Before decoding a `RecordBatch`:
  - materialize the dictionary by concatenating the existing dictionary with all pending chunks
  - update the materialized dictionary map
  - clear the pending chunks
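The steps above can be sketched as follows. This is an illustrative outline only, using `Vec<u64>` in place of `ArrayRef` so the example is self-contained; the struct and method names (`DictionaryState`, `push_delta`, `materialize`) are hypothetical, and the real reader would hold Arrow arrays and perform the final concatenation with `concat::concat`.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct DictionaryState {
    // Materialized dictionaries, keyed by dictionary id (the existing map).
    materialized: HashMap<i64, Vec<u64>>,
    // Pending delta chunks, appended without concatenation.
    pending: HashMap<i64, Vec<Vec<u64>>>,
}

impl DictionaryState {
    // Delta update: append the chunk; no copying of the existing dictionary.
    fn push_delta(&mut self, dict_id: i64, chunk: Vec<u64>) {
        self.pending.entry(dict_id).or_default().push(chunk);
    }

    // Called before decoding a RecordBatch: fold all pending chunks into
    // the materialized dictionary in a single pass, then clear them.
    fn materialize(&mut self, dict_id: i64) -> &[u64] {
        let dict = self.materialized.entry(dict_id).or_default();
        if let Some(chunks) = self.pending.remove(&dict_id) {
            let extra: usize = chunks.iter().map(|c| c.len()).sum();
            dict.reserve(extra);
            for chunk in chunks {
                dict.extend(chunk);
            }
        }
        dict
    }
}

fn main() {
    let mut state = DictionaryState::default();
    state.push_delta(0, vec![1, 2]);
    state.push_delta(0, vec![3]);
    assert_eq!(state.materialize(0), &[1, 2, 3]);
    // Pending chunks are drained; later deltas accumulate from here.
    state.push_delta(0, vec![4]);
    assert_eq!(state.materialize(0), &[1, 2, 3, 4]);
}
```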
## Expected Impact
This changes the accumulation cost from:
- repeated full dictionary concatenation per delta (quadratic behavior)
to:
- append-only delta accumulation with a single concatenation per decode (linear behavior)
This reduces:
- memory copying
- allocation overhead
- CPU time for dictionary-heavy workloads
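A rough back-of-the-envelope comparison of the two strategies, assuming N equal-sized deltas (the chunk length of 10 is an arbitrary illustrative number):

```rust
// Elements copied under the eager strategy: each delta re-copies the
// whole accumulated dictionary, i.e. chunk * (1 + 2 + ... + n).
fn eager_copies(n: u64, chunk: u64) -> u64 {
    chunk * n * (n + 1) / 2
}

// Elements copied under the deferred strategy: each element is copied
// once, when the single concatenation runs at decode time.
fn deferred_copies(n: u64, chunk: u64) -> u64 {
    chunk * n
}

fn main() {
    for n in [10u64, 100, 1000] {
        println!(
            "N={:>5}: eager={:>9} deferred={:>7}",
            n,
            eager_copies(n, 10),
            deferred_copies(n, 10)
        );
    }
    // Quadratic vs linear: the gap widens as deltas accumulate.
    assert!(eager_copies(1000, 10) / deferred_copies(1000, 10) == 500);
}
```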
cc: @alamb