DEPR: is_copy arg of take by ryankarlos · Pull Request #30615 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation22 Commits13 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

ryankarlos

@ryankarlos ryankarlos changed the titledeprecate is_copy argument of take and remove is_copy=False usage DEPR: dis_copy argument of take

Jan 2, 2020

@ryankarlos ryankarlos changed the titleDEPR: dis_copy argument of take DEPR: is_copy arg of take

Jan 2, 2020

@ryankarlos

Not sure why my changes are making these other tests break ?
Failed: DID NOT RAISE <class 'pandas.core.common.SettingWithCopyError'>

charlesdong1991

@jreback

looks good, needs a rebase; ping when green.

@pep8speaks

Hello @ryankarlos! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-06 02:08:13 UTC

@ryankarlos

@jreback do you know why the depr warning is making the test fail specifically for Linuxpy37 ?

pandas/tests/frame/methods/test_asof.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/generic.py:7009: in asof
    data = self.take(locs, is_copy=False)

and if i set this to data = self.take(locs) , it raises a SettingWithCopyError for all the builds due to this write on copy data.loc[missing] = np.nan

        if value == "raise":
>           raise com.SettingWithCopyError(t)
E           pandas.core.common.SettingWithCopyError: 
E           A value is trying to be set on a copy of a slice from a DataFrame.
E           Try using .loc[row_indexer,col_indexer] = value instead
E           
E           See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

pandas/core/generic.py:3684: SettingWithCopyError

@TomAugspurger

The numpydev py37 build is the only one where we elevate warnings to errors which fail the build.

Most likely you'll need to look at that test and what is_copy is doing. If the test looks correct, you made need to replicate some of the is_copy=False behavior in the places that previously passed is_copy=False.

@ryankarlos

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated routines, df.asof should not be showing a warning

with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
result = df.asof(dates)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this showing a warning? you need to fix internally so this is not the case

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the self.take(locs, is_copy=False) in def asof will now generate a warning after the deprecation change.
I could change it to default None (which sets is_copy=True) - but then that generates a SettingWithCopyError due to data.loc[missing] = np.nan in the block below

missing = locs == -1
data = self.take(locs)
data.index = where
data.loc[missing] = np.nan

I suppose I could temporarily suppress it

with pd.option_context('mode.chained_assignment', None):
            data.loc[missing] = np.nan

or create a deep copy after the take operation but not sure if thats the best approach ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you simply step theu the code and out a copy where needed

expected = df.asof(dates)
tm.assert_frame_equal(result, expected)
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@ryankarlos

@ryankarlos

jreback

@jreback

@jorisvandenbossche

jorisvandenbossche

@@ -7011,7 +7023,8 @@ def asof(self, where, subset=None):
# mask the missing
missing = locs == -1
data = self.take(locs, is_copy=False)
d = self.take(locs)
data = d.copy()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there test failures if you didn't do this d.copy() step?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche

Yes, it was breaking builds with SettingwithCopyError #30615 (comment)

I think this points to a problem in the current deprecation.

IMO, the goal of removing is_copy is to make this keyword unnecessary, in the idea that take should always return a copy. So that means you should never need to manually copy after doing a take.

WIth the current deprecation however, the user seems to loose this ability of not needing to take a copy. So I think, when deprecating/removing the is_copy arg, it should rather be the behaviour of is_copy=False that we should keep as the default.

@jreback

the purpose of is_copy was to force a copy
take never had the always copy behavior of numpy - but it was mostly an internal method anyhow (eg you can use iloc)

so prob ok just to leave as is (maybe add a doc string)

@jorisvandenbossche

the purpose of is_copy was to force a copy

I don't think this is correct, as is_copy=True/False did not influence whether a copy was made or not. What is_copy=True did, is setting self._is_copy to indicate to pandas it is referencing another object. So actually is_copy=True made it "less" a copy than is_copy=False.

With is_copy=False, you could indicate to pandas "this is really a copy already, don't give me annoying SettingWithCopyWarnings" without the need to do another explicit copy.

(the name of is_copy and the explanation in the docstring is just very confusing / wrong)

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request

Jan 15, 2020

@jorisvandenbossche