Deprecate MutableArrayData::extend and MutableArrayData::extend_nulls in favour of fallible try_extend / try_extend_nulls #9709

@HawaiianSpork

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

MutableArrayData::extend and MutableArrayData::extend_nulls can panic at runtime when offset arithmetic overflows the underlying integer type (e.g. accumulating more than 2 GiB of data in a StringArray, whose offsets are i32) or when the run-end counter overflows in a RunEndEncoded array. Because these methods return (), callers have no way to detect or recover from the failure: the call simply panics. This makes it impossible to build robust, panic-free pipelines on top of MutableArrayData.
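To illustrate the failure mode, here is a minimal sketch of overflow-checked offset arithmetic, assuming i32 offsets as in StringArray. `checked_next_offset` and `OffsetOverflow` are hypothetical names for illustration, not arrow-rs APIs; the point is that `checked_add` turns the overflow into a value the caller can inspect, and `ok_or_else` attaches a message saying which limit was hit.

```rust
/// Hypothetical error type standing in for ArrowError in this sketch.
#[derive(Debug, PartialEq)]
pub struct OffsetOverflow(pub String);

/// Compute the next i32 offset after appending a value of `len` bytes.
/// Returns an error instead of panicking when the value length or the
/// running offset would exceed i32::MAX (roughly 2 GiB of data).
pub fn checked_next_offset(current: i32, len: usize) -> Result<i32, OffsetOverflow> {
    // The value itself must fit in an i32 before it can be added.
    let len = i32::try_from(len).map_err(|_| {
        OffsetOverflow(format!("value of {len} bytes does not fit in an i32 offset"))
    })?;
    // checked_add yields None on overflow; convert that into a descriptive error.
    current.checked_add(len).ok_or_else(|| {
        OffsetOverflow(format!("offset overflow: {current} + {len} exceeds i32::MAX"))
    })
}
```

For example, `checked_next_offset(10, 5)` yields `Ok(15)`, while `checked_next_offset(i32::MAX, 1)` yields an `Err`. A run-end counter could be guarded the same way using the run-end integer type.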

Describe the solution you'd like

Add try_extend and try_extend_nulls methods to MutableArrayData that return Result<(), ArrowError> instead of panicking on overflow. On error, the mutable array should be left in a valid, consistent state (i.e. any partial writes to internal buffers should be rolled back) so the caller can safely inspect or discard the builder. The existing extend and extend_nulls methods should be deprecated in favour of the new ones.
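A minimal sketch of the "roll back on error" requirement, assuming a simplified builder with one values buffer and one offsets buffer. `ToyStringBuilder` is a hypothetical type, not MutableArrayData, and it uses i8 offsets only so that overflow is reachable in a small test (arrow's StringArray uses i32). It records each buffer's length on entry and truncates back to it on failure; the deprecated `extend` wrapper shows one way the old panicking method could delegate to the new one.

```rust
/// Hypothetical, heavily simplified stand-in for a variable-length builder.
/// i8 offsets are used only so overflow happens past 127 bytes.
#[derive(Debug)]
pub struct ToyStringBuilder {
    pub values: Vec<u8>,
    pub offsets: Vec<i8>, // invariant: starts with 0, last entry == values.len()
}

impl ToyStringBuilder {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Append every item or none: on overflow, both buffers are truncated
    /// back to their lengths at entry, so the builder stays consistent.
    pub fn try_extend(&mut self, items: &[&[u8]]) -> Result<(), String> {
        let (values_len, offsets_len) = (self.values.len(), self.offsets.len());
        let result = self.append_all(items);
        if result.is_err() {
            // Roll back partial writes made before the overflow was detected.
            self.values.truncate(values_len);
            self.offsets.truncate(offsets_len);
        }
        result
    }

    /// Old panicking entry point kept for compatibility; callers get a
    /// deprecation warning pointing them at try_extend.
    #[deprecated(note = "use try_extend, which returns an error instead of panicking")]
    pub fn extend(&mut self, items: &[&[u8]]) {
        self.try_extend(items).expect("ToyStringBuilder::extend overflowed");
    }

    fn append_all(&mut self, items: &[&[u8]]) -> Result<(), String> {
        for item in items {
            let last = *self.offsets.last().expect("offsets always holds at least one entry");
            let len = i8::try_from(item.len())
                .map_err(|_| format!("value of {} bytes does not fit in an i8 offset", item.len()))?;
            let next = last
                .checked_add(len)
                .ok_or_else(|| format!("offset overflow: {last} + {len} exceeds i8::MAX"))?;
            self.values.extend_from_slice(item);
            self.offsets.push(next);
        }
        Ok(())
    }
}
```

Validating every length before writing anything would avoid the rollback altogether; record-and-truncate is shown because it also covers overflows that are only discovered part-way through copying.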

All call sites within the workspace should be updated to use the new methods, propagating errors in the appropriate error type for each crate:

  • arrow-cast: ArrowError::CastError
  • arrow-json: ArrowError::JsonError
  • arrow-select: ArrowError (propagated directly)
  • parquet: ParquetError (converted via general_err!)
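The per-crate error handling listed above might look roughly like this. All types are local stand-ins: `ArrowError` and `ParquetError` mimic only the variants named in this issue, `Overflow` is a hypothetical variant (the issue does not say which variant `try_extend` returns), `ParquetError::General` is assumed here to be what `general_err!` builds, and `try_extend_stub` is a placeholder that always fails.

```rust
use std::fmt;

/// Stand-in for arrow's ArrowError; `Overflow` is a hypothetical variant.
#[derive(Debug, PartialEq)]
pub enum ArrowError {
    Overflow(String),
    CastError(String),
    JsonError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::Overflow(m) => write!(f, "overflow: {m}"),
            ArrowError::CastError(m) => write!(f, "cast error: {m}"),
            ArrowError::JsonError(m) => write!(f, "json error: {m}"),
        }
    }
}

/// Stand-in for parquet's error type (only the General variant).
#[derive(Debug, PartialEq)]
pub enum ParquetError {
    General(String),
}

/// Hypothetical fallible extend that always overflows, to exercise the mappings.
pub fn try_extend_stub() -> Result<(), ArrowError> {
    Err(ArrowError::Overflow("offset exceeds i32::MAX".to_string()))
}

/// arrow-cast style: re-tag the failure as a cast error.
pub fn cast_call_site() -> Result<(), ArrowError> {
    try_extend_stub().map_err(|e| ArrowError::CastError(e.to_string()))
}

/// arrow-json style: re-tag the failure as a JSON error.
pub fn json_call_site() -> Result<(), ArrowError> {
    try_extend_stub().map_err(|e| ArrowError::JsonError(e.to_string()))
}

/// arrow-select style: propagate the ArrowError unchanged with `?`.
pub fn select_call_site() -> Result<(), ArrowError> {
    try_extend_stub()?;
    Ok(())
}

/// parquet style: convert into the crate's own error type.
pub fn parquet_call_site() -> Result<(), ParquetError> {
    try_extend_stub().map_err(|e| ParquetError::General(format!("failed to extend array: {e}")))
}
```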

Describe alternatives you've considered

Leaving the panicking methods in place and wrapping each call site in std::panic::catch_unwind: this is impractical, does not work at all when code is built with panic = "abort", and obscures the true failure mode.

Returning Option instead of Result: this discards the error message, which is useful for diagnosing which limit was hit.

Related:
#7806

Metadata

Assignees: No one assigned
Labels: enhancement
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
