Is your feature request related to a problem or challenge? Please describe what you are trying to do.
`MutableArrayData::extend` and `MutableArrayData::extend_nulls` can panic at runtime when offset arithmetic overflows the underlying integer type (e.g. accumulating more than 2 GiB of data in a `StringArray`, whose offsets are `i32`) or when the run-end counter overflows in a `RunEndEncoded` array. Because these methods return `()`, callers have no way to detect or recover from the failure: the thread panics, or the whole process aborts when built with `panic = "abort"`. This makes it impossible to build robust, panic-free pipelines on top of `MutableArrayData`.
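To make the offset failure mode concrete: a `StringArray` stores `i32` end offsets, so the running total of value bytes cannot exceed `i32::MAX` (just under 2 GiB). The std-only sketch below is not arrow-rs code; `push_offset` is a hypothetical helper showing how `checked_add` can turn that overflow into an error value instead of a panic.

```rust
/// Hypothetical helper (not part of arrow-rs): appends the end offset of a
/// new value of `len` bytes to an `i32` offsets buffer, returning an error
/// instead of panicking if the running total no longer fits in `i32`.
fn push_offset(offsets: &mut Vec<i32>, len: usize) -> Result<(), String> {
    let last = *offsets.last().unwrap_or(&0);
    // The value length itself must fit in i32 before it can be added.
    let len = i32::try_from(len).map_err(|_| format!("value length {len} exceeds i32::MAX"))?;
    // checked_add returns None on overflow rather than wrapping or panicking.
    let next = last
        .checked_add(len)
        .ok_or_else(|| format!("offset overflow: {last} + {len} exceeds i32::MAX"))?;
    offsets.push(next);
    Ok(())
}

fn main() {
    // Start just below the i32 limit, as if ~2 GiB of string data were already appended.
    let mut offsets = vec![0, i32::MAX - 10];
    assert!(push_offset(&mut offsets, 5).is_ok());
    let err = push_offset(&mut offsets, 100).unwrap_err();
    println!("{err}");
    // The failed push left the buffer unchanged.
    assert_eq!(offsets.len(), 3);
}
```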
Describe the solution you'd like
Add `try_extend` and `try_extend_nulls` methods to `MutableArrayData` that return `Result<(), ArrowError>` instead of panicking on overflow. On error, the mutable array should be left in a valid, consistent state (i.e. any partial writes to internal buffers should be rolled back) so the caller can safely inspect or discard the builder. The existing `extend` and `extend_nulls` methods should be deprecated in favour of the new ones.
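A minimal sketch of the requested all-or-nothing semantics, using a hypothetical toy builder rather than `MutableArrayData` itself: `ToyStringBuilder`, `OverflowError`, and this `try_extend` are illustrative names, not existing arrow-rs APIs. The builder records its buffer lengths before writing and truncates back to them if an offset overflows. Offsets are `i8` here only so that overflow can be triggered with a few hundred bytes; the same pattern would apply to `i32` or `i64` offsets.

```rust
use std::fmt;

/// Hypothetical stand-in for `ArrowError` in this sketch.
#[derive(Debug, PartialEq)]
pub struct OverflowError(pub String);

impl fmt::Display for OverflowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for OverflowError {}

/// Toy variable-length builder (NOT `MutableArrayData`): concatenated value
/// bytes plus end offsets. `i8` offsets make overflow easy to reach.
pub struct ToyStringBuilder {
    pub offsets: Vec<i8>,
    pub values: Vec<u8>,
}

impl ToyStringBuilder {
    pub fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }

    /// Appends every slice in `items`, or none of them if an offset overflows.
    pub fn try_extend(&mut self, items: &[&[u8]]) -> Result<(), OverflowError> {
        // Remember the current lengths so partial writes can be rolled back.
        let offsets_len = self.offsets.len();
        let values_len = self.values.len();

        for item in items {
            let last = *self.offsets.last().expect("offsets always start with 0");
            let next = i8::try_from(item.len())
                .ok()
                .and_then(|len| last.checked_add(len));
            match next {
                Some(next) => {
                    self.values.extend_from_slice(item);
                    self.offsets.push(next);
                }
                None => {
                    // Undo everything written during this call.
                    self.offsets.truncate(offsets_len);
                    self.values.truncate(values_len);
                    return Err(OverflowError(format!(
                        "offset overflow after {last} bytes"
                    )));
                }
            }
        }
        Ok(())
    }
}

fn main() {
    let mut builder = ToyStringBuilder::new();
    builder.try_extend(&[b"hello".as_slice()]).unwrap();

    // 5 + 100 fits in i8, but 105 + 100 does not, so the whole call fails.
    let big = [0u8; 100];
    let err = builder.try_extend(&[&big[..], &big[..]]).unwrap_err();
    println!("error: {err}");

    // The first 100-byte item was rolled back as well.
    assert_eq!(builder.offsets, vec![0, 5]);
    assert_eq!(builder.values, b"hello".to_vec());
}
```

Truncating to the saved lengths only discards bytes appended during the failed call and does not reallocate, so earlier data stays intact and the builder can still be inspected or finished.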
All call sites within the workspace should be updated to use the new methods, propagating errors in the appropriate error type for each crate:
- `arrow-cast`: `ArrowError::CastError`
- `arrow-json`: `ArrowError::JsonError`
- `arrow-select`: `ArrowError` (propagated directly)
- `parquet`: `ParquetError` (converted via `general_err!`)
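A sketch of how a call site might translate the error into its own crate's type rather than panicking. `ExtendError`, `CrateError`, `try_extend`, and `read_column` are hypothetical stand-ins for illustration only; in parquet the conversion would go through `general_err!` as listed above, while arrow-select could forward `ArrowError` unchanged with `?`.

```rust
use std::fmt;

/// Hypothetical stand-in for the error a `try_extend`-style method returns.
#[derive(Debug)]
pub struct ExtendError(pub String);

impl fmt::Display for ExtendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical stand-in for a crate-specific error such as `ParquetError`.
#[derive(Debug, PartialEq)]
pub enum CrateError {
    General(String),
}

/// Hypothetical fallible extend: fails when `overflow` is true.
fn try_extend(overflow: bool) -> Result<(), ExtendError> {
    if overflow {
        Err(ExtendError("offset overflow".to_string()))
    } else {
        Ok(())
    }
}

/// Hypothetical call site: wraps the error in the crate's own type and
/// propagates it with `?` instead of panicking.
pub fn read_column(overflow: bool) -> Result<usize, CrateError> {
    try_extend(overflow)
        .map_err(|e| CrateError::General(format!("failed to extend array: {e}")))?;
    Ok(1)
}

fn main() {
    println!("{:?}", read_column(false));
    println!("{:?}", read_column(true));
}
```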
Describe alternatives you've considered
- Leaving the panicking methods in place and wrapping each call site in `std::panic::catch_unwind`: this is impractical, does not work at all when panics are configured to abort (`panic = "abort"`), and obscures the true failure mode.
- Returning `Option` instead of `Result`: this discards the error message, which is useful for diagnosing which limit was hit.
Related:
#7806