BUG: fix ArrowExtensionArray constructor raising on np.nan in string sequences #65307

Open
tinezivic wants to merge 3 commits into pandas-dev:main from tinezivic:fix-arrow-array-nan-construction

Problem description

Constructing an ArrowDtype string array via pd.array() from a sequence that contains np.nan raises an ArrowTypeError:

>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow as pa
>>> pd.array(["a", np.nan], dtype=pd.ArrowDtype(pa.string()))
# ArrowTypeError: Expected bytes, got a 'float' object

Root cause

In ArrowExtensionArray._box_pa_array(), the mask is computed via:

arr_value = np.asarray(value)
mask = isna(arr_value)

np.asarray(["a", np.nan]) produces a fixed-width Unicode string array (dtype <U32 rather than object) in which np.nan is silently coerced to the string 'nan'. isna() then returns [False, False], so no element is masked as missing, and the float np.nan still reaches pyarrow, which rejects it.
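The coercion can be reproduced outside pandas internals. The following is a minimal sketch using only public NumPy and pandas calls; the variable names are illustrative and do not come from the patch:

```python
import numpy as np
import pandas as pd

values = ["a", np.nan]

# Without an explicit dtype, NumPy infers a Unicode string dtype and
# stringifies the float NaN instead of keeping it as a float.
coerced = np.asarray(values)

# isna() has nothing to detect in a string array, so the mask is all False.
coerced_mask = pd.isna(coerced)

print(coerced.dtype.kind)    # "U" marks a Unicode string dtype
print(coerced[1] == "nan")   # the NaN is now an ordinary string
print(coerced_mask.tolist())
```
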

Fix

Pass dtype=object to np.asarray() so that scalar types are preserved and isna() correctly identifies np.nan as a missing value.
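As an illustration of the idea rather than the actual diff, the sketch below wraps the mask computation in a hypothetical helper named na_mask (not a pandas function) and shows that dtype=object keeps np.nan as a float so that isna() can flag it:

```python
import numpy as np
import pandas as pd


def na_mask(values):
    """Hypothetical helper: compute an NA mask without stringifying NaN."""
    # dtype=object keeps every element as its original Python object,
    # so np.nan stays a float and None stays None.
    arr = np.asarray(values, dtype=object)
    return pd.isna(arr)


mask = na_mask(["a", np.nan, None])
print(mask.tolist())
```

An object array is slower to scan than a native string array, but it is assumed here that correctness of the mask matters more than speed on this construction path.
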

Closes #64578

Development

Successfully merging this pull request may close these issues.

BUG: constructing string ArrowExtensionArray with NaNs fails