parquet: avoid decode and heap allocation on terminal skip in DeltaBitPackDecoder #9784

@sahuagin

Description

When a caller skips all remaining values on a page (to_skip >= values_left),
last_value will not be read again after the skip completes. The current
skip() implementation nevertheless always decodes values through get_batch to
keep last_value accurate, which costs a heap-allocated scratch buffer and, in
this case, unnecessary decode work.

Proposed fix: Detect the terminal case upfront. When terminal, use
BitReader::skip(n, bit_width) to advance the bit reader without decoding
individual values, and return early without updating last_value. This avoids
the scratch-buffer allocation entirely for the common "skip rest of page" case.
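
To illustrate the control flow only, here is a minimal sketch using a toy decoder. `ToyDeltaDecoder`, its fields, and the `scratch_allocs` counter are hypothetical names made up for this example. They are not the actual `DeltaBitPackDecoder` internals, and the deltas are stored already unpacked rather than bit-packed, so the position bump stands in for the `BitReader::skip(n, bit_width)` call.

```rust
/// Toy model of a delta decoder; NOT the arrow-rs `DeltaBitPackDecoder`.
/// Deltas are kept unpacked so the example stays self-contained.
pub struct ToyDeltaDecoder {
    deltas: Vec<i64>,
    /// Index of the next delta to consume.
    pos: usize,
    /// Running value; only needed if decoding continues after a skip.
    last_value: i64,
    /// Instrumentation: counts scratch-buffer allocations made by `skip`.
    pub scratch_allocs: usize,
}

impl ToyDeltaDecoder {
    /// `first_value` seeds the running value and is not itself emitted.
    pub fn new(first_value: i64, deltas: Vec<i64>) -> Self {
        Self { deltas, pos: 0, last_value: first_value, scratch_allocs: 0 }
    }

    pub fn values_left(&self) -> usize {
        self.deltas.len() - self.pos
    }

    /// Decode up to `out.len()` values, updating `last_value` as it goes.
    pub fn get_batch(&mut self, out: &mut [i64]) -> usize {
        let n = out.len().min(self.values_left());
        for slot in out.iter_mut().take(n) {
            self.last_value += self.deltas[self.pos];
            self.pos += 1;
            *slot = self.last_value;
        }
        n
    }

    /// Skip up to `to_skip` values and return how many were skipped.
    pub fn skip(&mut self, to_skip: usize) -> usize {
        let left = self.values_left();
        if to_skip >= left {
            // Terminal skip: nothing is decoded afterwards, so `last_value`
            // is left stale and no scratch buffer is allocated.
            self.pos = self.deltas.len();
            return left;
        }
        // Non-terminal skip: decode into scratch so `last_value` stays correct.
        self.scratch_allocs += 1;
        let mut scratch = vec![0i64; to_skip];
        self.get_batch(&mut scratch)
    }
}
```

In this toy model, a partial skip still allocates and decodes so that a later get_batch returns correct values, while a skip with to_skip >= values_left only advances the position and leaves the allocation counter untouched.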

Measured improvement (arrow_reader bench, vs upstream HEAD):

  • mixed stepped skip: -3.9%
