## Description
The current IPC reader eagerly materializes dictionary deltas, which can lead to quadratic copying behavior when dictionaries are updated incrementally.
In `reader.rs`, delta dictionaries are currently handled as:

```rust
let combined = concat::concat(&[existing, &dict_values])?;
dictionaries_by_id.insert(dict_id, combined);
```

This is invoked for every `DictionaryBatch` with `isDelta = true`, before any record batch consumes the dictionary.
## Problem
For a sequence of delta updates, the reader repeatedly concatenates the full existing dictionary with each new delta. This results in cumulative copying proportional to:
O(1 + 2 + 3 + ... + N) = O(N²)
where N is the number of accumulated dictionary elements.
This behavior is not theoretical. The writer supports delta dictionaries (`DictionaryHandling::Delta`), and the reader processes them eagerly on every update. As a result, large or frequently updated dictionaries can incur significant and unnecessary memory and CPU overhead.
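To make the cost concrete, here is a minimal sketch (not the reader's actual code) that simulates the eager strategy with a plain `Vec<u64>` in place of an Arrow array: each delta triggers a fresh allocation that re-copies the whole accumulated dictionary, and the total number of copied elements follows the arithmetic series above.

```rust
// Simulate eager delta handling: rebuild the full dictionary on every
// delta and count how many elements get copied in total.
fn eager_copy_count(n_deltas: usize, chunk_len: usize) -> usize {
    let mut dictionary: Vec<u64> = Vec::new();
    let mut copied = 0;
    for i in 0..n_deltas {
        let delta: Vec<u64> = (0..chunk_len).map(|j| (i * chunk_len + j) as u64).collect();
        // Analogue of concat::concat(&[existing, &dict_values]): a fresh
        // allocation that copies the existing dictionary plus the delta.
        let mut combined = Vec::with_capacity(dictionary.len() + delta.len());
        combined.extend_from_slice(&dictionary);
        combined.extend_from_slice(&delta);
        copied += combined.len();
        dictionary = combined;
    }
    copied
}

fn main() {
    // chunk_len * (1 + 2 + ... + N) = chunk_len * N(N + 1) / 2 elements copied.
    assert_eq!(eager_copy_count(100, 10), 10 * 100 * 101 / 2);
    println!("100 deltas of 10 values copy {} elements", eager_copy_count(100, 10));
}
```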
## Root Cause
The reader performs eager materialization of the full dictionary on every delta update, even though only the final accumulated dictionary is required at record batch decode time.
## Proposed Solution
Defer dictionary materialization until it is required for record batch decoding.
Specifically:
- Maintain the existing `HashMap<i64, ArrayRef>` for materialized dictionaries.
- Introduce a separate internal structure to store pending delta chunks: `HashMap<i64, Vec<ArrayRef>>`.
- On delta dictionary updates:
  - append the delta chunk to the pending list
  - do not perform concatenation
- Before decoding a `RecordBatch`:
  - materialize the dictionary by concatenating the existing dictionary with all pending chunks
  - update the materialized dictionary map
  - clear the pending chunks
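The steps above can be sketched as follows. This is an illustrative outline only, using `Vec<u64>` in place of `ArrayRef` so the example is self-contained; the struct and method names (`DictionaryState`, `push_delta`, `materialize`) are hypothetical, and the real reader would hold Arrow arrays and perform the final concatenation with `concat::concat`.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct DictionaryState {
    // Materialized dictionaries, keyed by dictionary id (the existing map).
    materialized: HashMap<i64, Vec<u64>>,
    // Pending delta chunks, appended without concatenation.
    pending: HashMap<i64, Vec<Vec<u64>>>,
}

impl DictionaryState {
    // Delta update: append the chunk; no copying of the existing dictionary.
    fn push_delta(&mut self, dict_id: i64, chunk: Vec<u64>) {
        self.pending.entry(dict_id).or_default().push(chunk);
    }

    // Called before decoding a RecordBatch: fold all pending chunks into
    // the materialized dictionary in a single pass, then clear them.
    fn materialize(&mut self, dict_id: i64) -> &[u64] {
        let dict = self.materialized.entry(dict_id).or_default();
        if let Some(chunks) = self.pending.remove(&dict_id) {
            let extra: usize = chunks.iter().map(|c| c.len()).sum();
            dict.reserve(extra);
            for chunk in chunks {
                dict.extend(chunk);
            }
        }
        dict
    }
}

fn main() {
    let mut state = DictionaryState::default();
    state.push_delta(0, vec![1, 2]);
    state.push_delta(0, vec![3]);
    assert_eq!(state.materialize(0), &[1, 2, 3]);
    // Pending chunks are drained; later deltas accumulate from here.
    state.push_delta(0, vec![4]);
    assert_eq!(state.materialize(0), &[1, 2, 3, 4]);
}
```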
## Expected Impact
This changes the accumulation cost from:
- repeated full dictionary concatenation per delta (quadratic behavior)
to:
- append-only delta accumulation with a single concatenation per decode (linear behavior)
This reduces:
- memory copying
- allocation overhead
- CPU time for dictionary-heavy workloads
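A rough back-of-the-envelope comparison of the two strategies, assuming N equal-sized deltas (the chunk length of 10 is an arbitrary illustrative number):

```rust
// Elements copied under the eager strategy: each delta re-copies the
// whole accumulated dictionary, i.e. chunk * (1 + 2 + ... + n).
fn eager_copies(n: u64, chunk: u64) -> u64 {
    chunk * n * (n + 1) / 2
}

// Elements copied under the deferred strategy: each element is copied
// once, when the single concatenation runs at decode time.
fn deferred_copies(n: u64, chunk: u64) -> u64 {
    chunk * n
}

fn main() {
    for n in [10u64, 100, 1000] {
        println!(
            "N={:>5}: eager={:>9} deferred={:>7}",
            n,
            eager_copies(n, 10),
            deferred_copies(n, 10)
        );
    }
    // Quadratic vs linear: the gap widens as deltas accumulate.
    assert!(eager_copies(1000, 10) / deferred_copies(1000, 10) == 500);
}
```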
cc: @alamb