BUG: Fixed PandasArray.setitem with str by TomAugspurger · Pull Request #28119 · pandas-dev/pandas
LGTM
That said, I'm just noticing that below the changed code PandasArray can change its dtype and pin a new underlying ndarray. Does that seem sketchy to anyone else?
> PandasArray can change its dtype and pin a new underlying ndarray.
I'm not too bothered by the new underlying ndarray part, since it's private. What part bothers you?
I do notice that .astype should probably be called with copy=False.
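As a small illustration of the copy=False remark, here is what NumPy's own ndarray.astype does when the requested dtype already matches. This is a sketch against plain NumPy, not the PandasArray code under review.

```python
import numpy as np

x = np.array([1, 2, 3])

# Default behaviour: astype always allocates a new array, even for the same dtype.
copied = x.astype(x.dtype)

# With copy=False, NumPy hands back the input array when no conversion is needed.
same = x.astype(x.dtype, copy=False)

print(copied is x)  # False
print(same is x)    # True
```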
> I'm not too bothered by the new underlying ndarray part, since it's private. What part bothers you?
It's liable to cause surprises with view-like semantics, e.g. I'd expect parr[:] or np.asarray(parr) to stay in sync with parr. (This discussion probably belongs in a separate issue.)
Agreed that this can go in a followup :)
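The view concern can be sketched with plain NumPy. The `Boxed` class below is a hypothetical stand-in for an array wrapper, not the pandas implementation; its two setter methods are assumptions made only to contrast writing in place with pinning a new buffer.

```python
import numpy as np


class Boxed:
    """Hypothetical wrapper around an ndarray (illustration only)."""

    def __init__(self, values):
        self._ndarray = values

    def setitem_inplace(self, key, value):
        # Writes into the existing buffer, so existing views see the change.
        self._ndarray[key] = value

    def setitem_recast(self, key, value):
        # Builds a new object-dtype buffer and pins it; old views are left behind.
        new = self._ndarray.astype(object)
        new[key] = value
        self._ndarray = new


x = np.array([1, 2, 3])
boxed = Boxed(x)
view = boxed._ndarray[:]  # a view sharing memory with x

boxed.setitem_inplace(0, 10)
in_sync = bool(view[0] == 10)  # the view reflects the in-place write

boxed.setitem_recast(1, "a")
stale = bool(view[1] == 2)  # the view still holds the old value
still_shared = np.shares_memory(view, boxed._ndarray)

print(in_sync, stale, still_shared)
```

After the recast, the wrapper and the earlier view refer to different buffers, which is the kind of divergence described above.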
    if is_object_dtype(self.dtype._dtype):
        t = np.dtype(object)
    else:
        t = self.dtype._dtype
Do we have a test that hits this branch?
test_setitem_object_typecode[None] hits it (setting a string into an integer array).
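For context, this is what plain NumPy does in the scenario the test name describes, setting a string into an integer array. It is a sketch of the underlying NumPy behavior only and does not reproduce the test itself.

```python
import numpy as np

ints = np.array([1, 2, 3])

# An integer ndarray cannot hold the string "a"; NumPy raises ValueError.
try:
    ints[0] = "a"
    raised = False
except ValueError:
    raised = True

# An object-dtype array accepts arbitrary Python objects.
objs = ints.astype(object)
objs[0] = "a"

print(raised)         # True
print(objs.tolist())  # ['a', 2, 3]
```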
Simpler to leave the original code, then just convert np.str to np.object (which is what we do inside the blocks manager and other places); maybe have a function to do this rather than rewriting the logic all over the place.
I don't think that's appropriate for PandasArray. The idea is to take an arbitrary numpy array and box it in an extension array.
and that’s exactly what is done in ObjectBlock now
pls refactor rather than adding logic
I doubt it. I think I was mimicking the behavior of Series.__setitem__:
In [4]: x = np.array([1, 2, 3])
In [5]: s = pd.Series(x)
In [6]: s.values is x
Out[6]: True
In [7]: s[0] = 'a'
In [8]: s.values is x
Out[8]: False
But I'm happy to be stricter here.
That said, we'll also inherit things like:
In [11]: x = np.array([1, 2, 3])
In [12]: x[0] = 5.5
In [13]: x
Out[13]: array([5, 2, 3])
But maybe that's OK, if the intent is to be close to NumPy here.
To what extent can we punt on the float treatment for now? I think there's a case to be made that we should raise instead of casting there, but don't want to bog this down any more.
Hmm I think our options are to always raise when the dtypes don't match, or adopt NumPy's behavior. I don't think I have a preference.
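The two options can be contrasted in a short sketch. Both function names below are hypothetical and are not part of pandas or NumPy; `setitem_strict` is just one possible reading of "raise when the dtypes don't match", handling scalar values only.

```python
import numpy as np


def setitem_numpy(arr, key, value):
    """Option 1: adopt NumPy's behavior, including silent truncation."""
    arr[key] = value


def setitem_strict(arr, key, value):
    """Option 2 (hypothetical): raise unless the value survives the cast unchanged."""
    original = np.asarray(value)
    cast = original.astype(arr.dtype)  # e.g. "a" -> int raises ValueError here
    if not np.array_equal(cast, original):
        raise ValueError(f"cannot set {value!r} into {arr.dtype} array without loss")
    arr[key] = cast


a = np.array([1, 2, 3])
setitem_numpy(a, 0, 5.5)  # 5.5 is truncated to 5

b = np.array([1, 2, 3])
try:
    setitem_strict(b, 0, 5.5)
    strict_raised = False
except ValueError:
    strict_raised = True

setitem_strict(b, 1, 7.0)  # 7.0 is exactly representable as an integer

print(a.tolist(), b.tolist(), strict_raised)
```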
The thought that pushes me towards raising is that if/when this is backing a Block, we want Block.setitem to try to set it on block.values and then fall back to casting.
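A rough sketch of that try-then-fall-back pattern, assuming the underlying array raises on a mismatched value. The helper names are hypothetical, and falling back to object dtype is a simplification chosen for the example; it is not a claim about what Block.setitem does.

```python
import numpy as np


def set_strict(values, key, value):
    """Hypothetical strict setter: raise if the value would be altered by the cast."""
    original = np.asarray(value)
    cast = original.astype(values.dtype)
    if not np.array_equal(cast, original):
        raise ValueError("value does not fit the array dtype")
    values[key] = cast


def setitem_with_fallback(values, key, value):
    """Try to set in place; on failure, cast to object and set on the new array.

    Returns the array that holds the result, which may be a new object.
    """
    try:
        set_strict(values, key, value)
        return values
    except (ValueError, TypeError):
        new = values.astype(object)  # simplification: always fall back to object
        new[key] = value
        return new


vals = np.array([1, 2, 3])
out_same = setitem_with_fallback(vals, 0, 7)    # fits: set in place
out_new = setitem_with_fallback(vals, 1, "a")   # does not fit: cast, then set

print(out_same is vals, out_new.dtype, out_new.tolist())
```

With a strict inner setter, the caller decides when casting happens, rather than inheriting silent coercion from the array itself.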
lgtm. minor whatsnew comments, merge on green.
Fixed the whatsnew. Merging.
proost pushed a commit to proost/pandas that referenced this pull request