
BUG: constructing string ArrowExtensionArray with NaNs fails #64578

@jorisvandenbossche

In the default mode, I think NaNs in input sequences should be treated as missing values, also for ArrowDtype. But
with current pandas main, constructing a string array from such input fails:

>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.array(["a", np.nan], dtype=pd.ArrowDtype(pa.string()))
...
File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:685, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
    682     pa_array = pa.array(value, type=pa_type, mask=mask)
    683 except (pa.ArrowInvalid, pa.ArrowTypeError):
    684     # GH50430: let pyarrow infer type, then cast
--> 685     pa_array = pa.array(value, mask=mask)
    687 if pa_type is None and pa.types.is_duration(pa_array.type):
    688     # Workaround https://github.com/apache/arrow/issues/37291
    689     from pandas.core.tools.timedeltas import to_timedelta
...
ArrowTypeError: Expected bytes, got a 'float' object
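
Until this is fixed, one possible workaround is to replace NaNs with None before constructing the array, since None is already accepted as a missing value here. This is only a minimal sketch and not a proposed fix for `_box_pa_array`; `nan_to_none` and `build_arrow_string_array` are hypothetical helper names, not pandas API.

```python
import math


def nan_to_none(values):
    """Hypothetical helper: return a list with float NaN entries replaced by None."""
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


def build_arrow_string_array(values):
    """Hypothetical helper: build a string ArrowExtensionArray from NaN-containing input."""
    # Imported lazily so nan_to_none stays usable without pandas/pyarrow installed.
    import pandas as pd
    import pyarrow as pa

    return pd.array(nan_to_none(values), dtype=pd.ArrowDtype(pa.string()))
```

Calling `build_arrow_string_array(["a", np.nan])` passes `["a", None]` to `pd.array`, so pyarrow never sees the float. Note that `np.nan` is a plain Python float, which is why the `isinstance(v, float)` check is enough for this case.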

Labels: Arrow (pyarrow functionality), Bug
