Add copy and deepcopy to NDFrame by bmcfee · Pull Request #15444 · pandas-dev/pandas (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes deepcopy failure on empty dataframes with non-empty column set (numpy 1.12 compatibility) #15370
- tests added / passed
- passes
git diff upstream/master | flake8 --diff - whatsnew entry
| data = self._data.copy(deep=deep) |
|---|
| return self._constructor(data).__finalize__(self) |
| __copy__ = copy |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but a child class that overrides copy won't override __copy__ without a def __copy__... here
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand -- def copy is defined immediately above, and the PR adds the __copy__ = copy alias. Are you saying that this won't work unless I also do def __copy__?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def __copy__(self):.
also let's change the same in pandas/indexes/base.py. hopefully nothing will break.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee but if a class (NDFrameChild) inherits from NDFrame and overrides .copy: with the proposed code NDFrameChild.__copy__ remains NDFrame.copy, rather than NDFFrameChild.copy
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I see, and agree with @jreback that it should probably be fixed in indexes/base as well.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your prior way seems more elegant (and faster)! Unfortunately Python fails the elegant = correct question here, though
| data = self._data.copy(deep=deep) |
|---|
| return self._constructor(data).__finalize__(self) |
| __copy__ = copy |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def __copy__(self):.
also let's change the same in pandas/indexes/base.py. hopefully nothing will break.
| for idx, value in compat.iteritems(series): |
|---|
| self.assertNotEqual(self.frame['A'][idx], value) |
| def test_deepcopy_empty(self): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tests instead in test_generic.py
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def copy(self):.
also let's change the same in pandas/indexes/base.py. hopefully nothing will break.
Do you want both of those in this PR?
add tests instead in test_generic.py
Sure, I can move it. I put it here because I figured it ought to be near the other deepcopy test and didn't see any other obvious place for it.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move those as well
pandas objects compatability with Numpy or Python functions
Concat, Merge/Join, Stack/Unstack, Explode
labels
| __copy__ = copy |
| def __copy__(self, **kwargs): |
| return self.copy(**kwargs) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add (and test) __deepcopy__ here as well
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__deepcopy__ already exists in this class, see line 1483. Did you want me to move it?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, then just move it next to this one
| expected, |
|---|
| check_index_type=False) |
| def test_deepcopy_empty(self): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at how the other tests are done. We need systematic testing on all structures. So move to Generic and use _construct and _compare
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment below on how test should be for this. you need to tests .copy(), copy(...), .deepcopy(), deepcopy()
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I don't understand your comment here. Can you point me to a specific example to base the tests on?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think I might not have made the motivation for this test sufficiently clear.
It's not intended to be a full test of deepcopy, for which _construct might be useful. Rather, it's designed specifically to trigger the bug I mentioned in #15370 where the data/index are empty but not the columns. I don't see how _construct would be appropriate for this, but maybe I'm doing it wrong?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee well you are changing fundamental things, which need full testing.
_construct is run with Series, DataFrame,Panel, to simply make a new one. Then you call a method/function and test if its a copy or not.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback sure, but that would be a separate test from the one you said to move. That issue never affected series; as far as I can tell, there's no way that it could have since it depends on having columns.
I have no problem adding generic tests for copy/deepcopy, but I want to make sure it's the right test.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee you can add the test you wanted. but you are making fundamental changes. These HAVE to be tested on all classes that are affected. (and you ARE affecting everything)
| empty_frame = DataFrame(data=[], index=[], columns=['A']) |
|---|
| empty_frame_copy = deepcopy(empty_frame) |
| self.assertEqual(empty_frame, empty_frame_copy) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are already generic tests in pandas/indexes/common.py but need to expand them to include using
from copy import copy, deepcopy
....
copy(...)
deepcopy(..)
these should match self.copy() and self.deepcopy()
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, when you say self.deepcopy do you mean self.__deepcopy__? There is not currently an exposed function of that name.
Unrelated to this PR, I've noticed several other places where there are assignment method aliases in base classes, eg, Index.__bool__. I'm not interested in fixing that in this PR, but maybe someone else ought to take a look down the road.
@bmcfee if you want to make an issue would be great for other things
I made the requested changes. At this point, it looks like there are a bunch of test failures that have nothing to do with this PR. Should I assume master is in a not-very-good state right now?
not sure what you mean?
master has been all green for a while
@jreback appveyor had one build fail to install (?!); travis has one build failing on a panel4d setitem test, but only one of the builds.
FWIW, I had an appveyor / conda install fail (checksum mismatch) on a separate project this morning, and restarting the build fixed it. Maybe there's just something screwy in conda land today.
no your change causes this to fail
no your change causes this to fail
Really? Nothing in the log indicates as such:
installing requirements from ci\requirements-2.7-64.run - done"
conda install -n pandas (conda build ci\appveyor.recipe -q --output)
conda : Error: no such directory: C:\projects\pandas-465\ci\appveyor.recipe
At line:1 char:26
+ conda install -n pandas (conda build ci\appveyor.recipe -q --output)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Error: no such ...appveyor.recipe:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
conda :
At line:1 char:1
+ conda install -n pandas (conda build ci\appveyor.recipe -q --output)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
CondaValueError: Value error: too few arguments, must supply command line package specs or --file
Command executed with exception:
CondaValueError: Value error: too few arguments, must supply command line package specs or --file
appveyor is just for confirmation builds
they are behind and i has to cancel a few
look on travis
Okay, I'm looking, and I don't understand anything about what's going on here, or what's different between the run where it passes (eg here) and the one where it fails (eg, here). Since the test log doesn't provide much useful diagnostic information, maybe you can help explain what the difference is so I'm not just stumbling in the dark?
@bmcfee I rebased and added a commit. hopefully this should pass.
the issue is that test_generic.py runs tests on Panel4D (which we are deprecating), and it is testing on that, so the addtl _consturct shows a warning, and later tests thus fail. easy enough just to skip that particular test.
This guy's green now -- what's next?
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request
closes pandas-dev#15370
Author: Brian McFee brian.mcfee@nyu.edu Author: Jeff Reback jeff@reback.net
Closes pandas-dev#15444 from bmcfee/deepcopy-ndframe and squashes the following commits:
bf36f35 [Jeff Reback] TST: skip the panel4d deepcopy tests d58b1f6 [Brian McFee] added tests for copy and deepcopy 35f3e0f [Brian McFee] relocated Index.deepcopy to live near copy 1aea940 [Brian McFee] switched deepcopy test to using generic comparator 7e67e7d [Brian McFee] ndframe and index copy are now proper methods 820664c [Brian McFee] moved deepcopy test to generic.py 9721041 [Brian McFee] added copy/deepcopy to ndframe, fixes pandas-dev#15370
Labels
pandas objects compatability with Numpy or Python functions
Concat, Merge/Join, Stack/Unstack, Explode