BUG: Preserve nullable index dtype in _transform_index#65316
BUG: Preserve nullable index dtype in _transform_index#65316gautamvarmadatla wants to merge 3 commits intopandas-dev:mainfrom
_transform_index#65316Conversation
35f1026 to
03b066d
Compare
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Thanks a lot for the PR!
| arr = self[:0].array | ||
| new_values = arr._cast_pointwise_result(items) |
There was a problem hiding this comment.
| arr = self[:0].array | |
| new_values = arr._cast_pointwise_result(items) | |
| new_values = self.array._cast_pointwise_result(items) |
The slicing should not be needed I think?
There was a problem hiding this comment.
yup, you're right! removed it
| if isinstance(items[0], tuple): | ||
| return Index(items, name=self.name, tupleize_cols=False) |
There was a problem hiding this comment.
What is the reason for this check? (I was first thinking it might be to ensure we create a MultiIndex when having tuples (so not specify the dtype as in the path below), but since tupleize_cols=False, that should already not happen?
There was a problem hiding this comment.
I would expect that _cast_pointwise_result, when getting something (like tuples) that it cannot convert, it should give you back the original input as an object-dtype ndarray, and so then the below code path should still work as well?
(and if that is not the case, I think that we should fix _cast_pointwise_result to ensure that, so the usage here can be simpler)
There was a problem hiding this comment.
With #65318 merged, it might now work to just call _cast_pointwise_result without the check for tuples
There was a problem hiding this comment.
Yes! The check was guarding against a ValueError in masked._cast_pointwise_result where tuple inputs caused np.asarray to return a 2D array. With #65318 pushed it's just dead code now, so removed it.
03b066d to
a633a57
Compare
| if not items: | ||
| return Index( | ||
| items, dtype=self.dtype, name=self.name, tupleize_cols=False | ||
| ) |
There was a problem hiding this comment.
Is this one still needed? I would also expect that _cast_pointwise_result should essentially do that as well
There was a problem hiding this comment.
But I see that in practice this is indeed certainly not the case right now, so ignore my comment
(we might want to tighten that aspect in the interface, and then this can be simplified later)
There was a problem hiding this comment.
cc @jbrockmendel what do you think about expecting that _cast_pointwise_result defaults to the caller's dtype in case of empty input?
(generally, at that level, the code cannot know what the "correct" dtype would be. But at the moment our different EAs have varying behaviour, and it might be good to at least be consistent, and then using the calling dtype seems to be an obvious choice)
There was a problem hiding this comment.
Eventually I'd like to get to Void Dtype, but im fine with same-dtype for the interim.
|
The CI failures look unrelated to this PR. All failures are from np.datetime64("NaT") (without a unit specifier) triggering a NumPy deprecation warning on the numpy-nightly build |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.AGENTS.md.Fixed
Index._transform_index()to use_cast_pointwise_resultwhen rebuilding the index. This preserves nullable extension dtypes instead of losing them through plain list inference.