API: Make most arguments for read_html and read_json keyword-ony by alexitkes · Pull Request #27573 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@alexitkes

As mentioned in #27544 it is better to use keyword-only arguments for functions with too many arguments. A deprecation warning will be displayed if read_html get more than 2 positional arguments (io and match) or read_json get more than 1 positional argument (path_or_buf).

@alexitkes alexitkes changed the titleWIP: Make most arguments for read_html and read_json keyword-ony Make most arguments for read_html and read_json keyword-ony

Jul 24, 2019

@TomAugspurger

We need to deprecate the old behavior first. As written, this is an api breaking change.

And we shouldn’t do this until 1.0 or later.

@jreback

We need to deprecate the old behavior first. As written, this is an api breaking change.

And we shouldn’t do this until 1.0 or later.

and how exactly would you deprecate ?

@TomAugspurger

Should be able to use a decorator to inspect the calm, right? And if any positional or keyword args are passed in args then we warn?

On Jul 25, 2019, at 07:44, Jeff Reback ***@***.***> wrote: We need to deprecate the old behavior first. As written, this is an api breaking change. And we shouldn’t do this until 1.0 or later. and how exactly would you deprecate ? — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

@alexitkes

Thank you. I also wanted to propose somewhat like this.

def Deprecated(*, n_args: int):
    def _deprecated(f):
        def _f(*args, **kwargs):
            if len(args) > n_args:
                warnings.warn("%s will only accept %i positional arguments in version X.Y.Z" % (f.__name__,n_args))
            return f(*args, **kwargs)
        return _f
    return _deprecated

Or do such a decorator already exist?

@pep8speaks

Hello @alexitkes! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-31 18:09:11 UTC

@alexitkes alexitkes changed the titleMake most arguments for read_html and read_json keyword-ony [WIPMake most arguments for read_html and read_json keyword-ony

Jul 26, 2019

@alexitkes alexitkes changed the title[WIPMake most arguments for read_html and read_json keyword-ony [WIP] Make most arguments for read_html and read_json keyword-ony

Jul 26, 2019

@alexitkes alexitkes changed the title[WIP] Make most arguments for read_html and read_json keyword-ony Make most arguments for read_html and read_json keyword-ony

Jul 28, 2019

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests to read_html / read_json to ensure that warnings are raised there?

@TomAugspurger

Lifting #27573 (comment) to the top level

I'd like not to repeat the arguments here. That seems fragile. Is it possible to have the decorator inspect the signature and warn on any positional_or_keyword arg passed positionally?

What do you think about this @alexitkes?

@alexitkes

@TomAugspurger Thank you for review. Yes, writing argument list twice will be quite annoying, but I will need some time to find a way to avoid it, and I would like to try. However, I think it would be better to preserve the allowed_args argument for the decorator and default it to list of required arguments if None given.

@TomAugspurger

However, I think it would be better to preserve the allowed_args argument for the decorator and default it to list of required arguments if None given.

Why do you think we'll need that? That sounds more complicated than just detecting kwargs passed positionally. If we aren't going to need it then I'd rather not implement it.

@alexitkes

@TomAugspurger I have just written such a decorator and it does not look too complicated. However it is my first experience with inspect module so possibly more tests are needed though some tests are written and they all run successfully.

It will not be difficult to remove allowed_args argument and always set it to list of required arguments, if you think eventually all arguments having the default values should be keyword-only arguments. Should I do it?

@TomAugspurger

I'm not sure without looking more closely. My thought is that all keyword arguments being keyword-only is easier to explain to users.

On Tue, Jul 30, 2019 at 3:10 AM Alex Itkes ***@***.***> wrote:@TomAugspurger <https://github.com/TomAugspurger> I have just written such a decorator and it does not look too complicated. However it is my first experience with inspect module so possibly more tests are needed though some tests are written and they all run successfully. It will not be difficult to remove allowed_args argument and always set it to list of required arguments, if you think eventually all arguments having the default values should be keyword-only arguments. Should I do it? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#27573?email_source=notifications&email_token=AAKAOIWDW7YBWLIQNA6SISTQB7ZQNA5CNFSM4IGSQB62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3DFE6A#issuecomment-516313720>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAKAOIQDBCCUJD5VIZIZ5VLQB7ZQNANCNFSM4IGSQB6Q> .

@alexitkes

I'm not sure without looking more closely. My thought is that all keyword
arguments being keyword-only is easier to explain to users.

I have almost forgotten one thing - the path_or_buf argument for read_json has the default value None. So if we decorate the function in that way all arguments for read_json will be keyword-only. But I think most users are used to writing read_json(filename) and will not like to write read_json(path_or_buf=filename). For this reason I still think there should be a way to pass the list of arguments allowed to be positional to the decorator. However, the default decorator behavior (optional arguments are keyword only) should be used wherever there is no really significant reason to change it. The read_html function is decorated in this way.

@TomAugspurger

I have almost forgotten one thing - the path_or_buf argument for read_json has the default value None. So if we decorate the function in that way all arguments for read_json will be keyword-only.

Oh sorry. In that case we'll definitely need to have the ability to have some keyword-or-positional arguments.

@alexitkes

Well, it seems to work fine though I am a bit worried because it's my first experience with inspect module. I also don't know the proper number of version in which it will become impossible to use positionl arguments (currently written 1.1). What should it be?

TomAugspurger

"""
df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2])
with tm.assert_produces_warning(FutureWarning):
assert_frame_equal(df, read_json(df.to_json(orient="split"), "split"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you minimize the amount of code in the context manager? I think just read_json(buf, "split").

Also, do we need to test all three of these? What differs between them?

def wrapper(*args, **kwargs):
if isinstance(allow_args, int) and len(args) > allow_args:
msg = (
"After version %s all arguments of %s "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version -> pandas version.

Use {}-style string templates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done it. I also added a separate function to form a better looking warning message depending on the number of arguments, but so far I only tested it manually. Is there a proper way to test how exactly does the produced warning look?

@TomAugspurger

@WillAyd @jreback @jorisvandenbossche do you have thoughts on this (the overall goal, the implementation seems fine).

WillAyd

def test_index(self):
df1 = self.read_html(self.spam_data, ".*Water.*", index_col=0)
df2 = self.read_html(self.spam_data, "Unit", index_col=0)
with tm.assert_produces_warning(FutureWarning):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment for non-duplicated tests; would rather not use the deprecated behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, such amount of duplicated tests looks quite ugly, so I removed all tests with deprecated behavior except for one. Would it be enough?

@alexitkes

@alexitkes

@alexitkes

@alexitkes @jorisvandenbossche

Co-Authored-By: Joris Van den Bossche jorisvandenbossche@gmail.com

@alexitkes

@alexitkes @jorisvandenbossche

Co-Authored-By: Joris Van den Bossche jorisvandenbossche@gmail.com

@alexitkes @jorisvandenbossche

Co-Authored-By: Joris Van den Bossche jorisvandenbossche@gmail.com

@alexitkes

@alexitkes

@alexitkes

@alexitkes

@alexitkes

@alexitkes

@alexitkes

WillAyd

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger

Do we have a general policy for which arguments are keyword only?

@jreback

lgtm. on anything with lots of args (pretty much read_*), happy to only have the first arg positional (path_or_buf).

jreback

@jreback

@TomAugspurger

Yeah

On Apr 6, 2020, at 18:25, Jeff Reback ***@***.***> wrote: looks good, @TomAugspurger ok here? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

@jreback

@jorisvandenbossche

Labels

IO HTML

read_html, to_html, Styler.apply, Styler.applymap

IO JSON

read_json, to_json, json_normalize