DEPR: is_copy arg of take by ryankarlos · Pull Request #30615 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation22 Commits13 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes DEPR: .take(..., is_copy=True); rename is_copy -> copy #27357
- tests added / passed
- passes
black pandas
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
ryankarlos changed the title
deprecate is_copy argument of take and remove is_copy=False usage DEPR: dis_copy argument of take
ryankarlos changed the title
DEPR: dis_copy argument of take DEPR: is_copy arg of take
Not sure why my changes are making these other tests break ? Failed: DID NOT RAISE <class 'pandas.core.common.SettingWithCopyError'>
looks good, needs a rebase; ping when green.
Hello @ryankarlos! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2020-01-06 02:08:13 UTC
@jreback do you know why the depr warning is making the test fail specifically for Linuxpy37 ?
pandas/tests/frame/methods/test_asof.py:27:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/generic.py:7009: in asof
data = self.take(locs, is_copy=False)
and if i set this to data = self.take(locs)
, it raises a SettingWithCopyError for all the builds due to this write on copy data.loc[missing] = np.nan
if value == "raise":
> raise com.SettingWithCopyError(t)
E pandas.core.common.SettingWithCopyError:
E A value is trying to be set on a copy of a slice from a DataFrame.
E Try using .loc[row_indexer,col_indexer] = value instead
E
E See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
pandas/core/generic.py:3684: SettingWithCopyError
The numpydev py37 build is the only one where we elevate warnings to errors which fail the build.
Most likely you'll need to look at that test and what is_copy
is doing. If the test looks correct, you made need to replicate some of the is_copy=False
behavior in the places that previously passed is_copy=False
.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated routines, df.asof should not be showing a warning
with warnings.catch_warnings(): |
---|
warnings.simplefilter("ignore", FutureWarning) |
result = df.asof(dates) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this showing a warning? you need to fix internally so this is not the case
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the self.take(locs, is_copy=False)
in def asof
will now generate a warning after the deprecation change.
I could change it to default None (which sets is_copy=True
) - but then that generates a SettingWithCopyError due to data.loc[missing] = np.nan
in the block below
missing = locs == -1
data = self.take(locs)
data.index = where
data.loc[missing] = np.nan
I suppose I could temporarily suppress it
with pd.option_context('mode.chained_assignment', None):
data.loc[missing] = np.nan
or create a deep copy after the take
operation but not sure if thats the best approach ?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you simply step theu the code and out a copy where needed
expected = df.asof(dates) |
---|
tm.assert_frame_equal(result, expected) |
with warnings.catch_warnings(): |
warnings.simplefilter("ignore", FutureWarning) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -7011,7 +7023,8 @@ def asof(self, where, subset=None): |
---|
# mask the missing |
missing = locs == -1 |
data = self.take(locs, is_copy=False) |
d = self.take(locs) |
data = d.copy() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were there test failures if you didn't do this d.copy()
step?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was breaking builds with SettingwithCopyError #30615 (comment)
I think this points to a problem in the current deprecation.
IMO, the goal of removing is_copy
is to make this keyword unnecessary, in the idea that take
should always return a copy. So that means you should never need to manually copy
after doing a take
.
WIth the current deprecation however, the user seems to loose this ability of not needing to take a copy. So I think, when deprecating/removing the is_copy
arg, it should rather be the behaviour of is_copy=False
that we should keep as the default.
the purpose of is_copy was to force a copy
take never had the always copy behavior of numpy - but it was mostly an internal method anyhow (eg you can use iloc)
so prob ok just to leave as is (maybe add a doc string)
the purpose of is_copy was to force a copy
I don't think this is correct, as is_copy=True/False
did not influence whether a copy was made or not. What is_copy=True
did, is setting self._is_copy
to indicate to pandas it is referencing another object. So actually is_copy=True
made it "less" a copy than is_copy=False
.
With is_copy=False
, you could indicate to pandas "this is really a copy already, don't give me annoying SettingWithCopyWarnings" without the need to do another explicit copy.
(the name of is_copy
and the explanation in the docstring is just very confusing / wrong)
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request