API: Deprecate skip_footer in read_csv by gfyoung · Pull Request #13386 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation63 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

gfyoung

Title is self-explanatory.

Closes gh-13349 and partially undoes this commit back in v0.9.0. With such a massive API now, having duplicate arguments makes managing it way less practical.

@gfyoung

In light of removing duplicate arguments and stripping down the API a bit, here are a couple of others I would like to propose (perhaps not in this PR but for discussion nonetheless):

  1. Deprecate skipfooter in read_excel (it's allowed as an alternative to skip_footer, though it is not surfaced in the signature but rather encompassed in a **kwargs argument)?
  2. Should we choose between sep and delimiter in read_csv?

@jreback

the problem is that we also have skiprows. I would actually deprecate skip_footer instead.

@gfyoung

Hmmm...that's a flip-flop AFAICT from what you said earlier here. However, deprecating skip_footer seems a little strange since we use skip_footer all over the place in the code (internally especially). That's why I chose to deprecate skipfooter, since the code impact is not very significant as you can tell. In addition, we use the _ in a lot of the other arguments, so I'd rather side with the majority.

@jreback

just change it internally to the correct kw. I think they diverged at some point. It looks (just a quick skim), that skipfooter is more consistent (as we have skiprows, usecols) etc. excel should be fixed to be the same as well.

@gfyoung

@jreback : let's actually do the count of multi-word argument names (excluding the skip(_)footer ones) using the documentation as of 0.18.1 here:

Arguments that use _:
filepath_or_buffer
index_col
mangle_dupe_cols
true_values
false_values
na_values
keep_default_na
na_filter
skip_blank_lines
error_bad_lines
warn_bad_lines
delim_whitespace
as_recarray
compact_ints
use_unsigned
low_memory
buffer_lines
memory_map
float_precision

Arguments that don't use _:
usecols
skipinitialspace
skiprows
chunksize
lineterminator
doublequote
quotechar
escapechar

Even excluding ones that have been recently deprecated (compact_ints, buffer_lines, use_unsigned, as_recarray), arguments that use _ vastly outnumber those that don't.

@jreback

@gfyoung I am trying to minimize back-compat pain.

usecols and skiprows are prob some of the most used options. I agree we should just deprecate and move to _ separated, but that would be in 0.19.0 if we go whole hog.

@codecov-io

Current coverage is 85.23% (diff: 100%)

Merging #13386 into master will increase coverage by <.01%

@@ master #13386 diff @@

Files 140 140
Lines 50420 50419 -1
Methods 0 0
Messages 0 0
Branches 0 0

Hits 42975 42975

Powered by Codecov. Last update cc216ad...d21345f

@gfyoung

@jreback : I have no issues waiting until 0.19.0 and going full hog. This duplicate argument is not breaking, and it will then give us the time to properly deprecate and rename arguments with _.

@gfyoung gfyoung changed the titleDEPR: deprecate skipfooter in read_csv API: Use underscore in read_* arguments

Jun 8, 2016

@gfyoung gfyoung changed the titleAPI: Use underscore in read_* arguments API: Use underscore in read_csv arguments

Jun 9, 2016

@shoyer

I would lean against this change -- I don't think deprecating or even adding aliases with underscores for these keywords is worth it. Yes, consistency is nice, but these are keywords that are mostly either (a) copied straight from the API docs or an example or (b) auto-completed. Omitting underscores between lower case words is also common enough in PEP8 compliant code that it doesn't hurt my eyes in the way that camelCase would, for example. I would save changes like this for next time somebody writes a DataFrame library ;). There's just very little upside to the change at this point.

@chris-b1

I'd also be inclined to not do this.

Consistency is great, but to me, big breaking (or warning triggering) changes for consistency are only worth it to the extent they actually improve the user experience. E.g. resample, the new sort_... api, or a closer example, when the inconsistency between the rows and index keyword was resolved in pivot_table / pivot - #5505

This doesn't seem to meet that hurdle. Like @shoyer said, these argument would be typically tab-completed or looked up, or even if not, the inconsistency doesn't cause any confusion.

@gfyoung

To put things in perspective, this whole discussion was generated because I wanted to remove the duplicate 'skipfooter/skip_footer' argument in the signature. It really should only be one of the two IMO.

The consistency part came up because we weren't sure which way to go i.e do we take the underscore version or the non-underscore?

I perfectly understand if we don't want to make such a change as drastic as it has become, though some input on the initial goal would be good then.

@chris-b1

I don't have a strong opinion on that, but given that all the docs / SO answers / etc in the wild use skipfooter, I'd think deprecating skip_footer makes more sense?

@gfyoung gfyoung changed the titleAPI: Use underscore in read_csv arguments API: Deprecate skip_footer in read_csv

Jul 15, 2016

@gfyoung

In light of the push back, I will rollback my changes and just deprecate skip_footer as @chris-b1 indicated. If skipfooter is being used in the wild, let's stick with that one then.

@gfyoung

@jreback , @jorisvandenbossche : Rolled back my changes successfully (i.e. Travis is passing) to just deprecating skip_footer for the reason that @chris-b1 outlined above. Ready to merge if there are no other concerns.

@jreback

@gfyoung

@jorisvandenbossche

What do we do other read_ functions for the skipfooter case?

@gfyoung

Same as here pre-PR (i.e. it just accepts both of them). However, I will handle all other cases similarly if this one is merged.

@gfyoung

jreback

@@ -1351,7 +1351,7 @@ back to python if C-unsupported options are specified. Currently, C-unsupported
options include:
- ``sep`` other than a single character (e.g. regex separators)
- ``skip_footer``
- ``skipfooter``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the doc-string get changed (to add DEPRECATED)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...not sure where those changes went. Added them back.

@gfyoung

@gfyoung

@jreback : added the documentation back about the deprecation, and Travis is still happy after rebase. Ready to merge if there are no other concerns.

@jreback

@gfyoung gfyoung deleted the deprecate-dup-skipfooter branch

July 29, 2016 02:29

gfyoung added a commit to forking-repos/pandas that referenced this pull request

Dec 11, 2017

@gfyoung

gfyoung added a commit to forking-repos/pandas that referenced this pull request

Dec 11, 2017

@gfyoung

gfyoung added a commit that referenced this pull request

Dec 11, 2017

@gfyoung

Deprecated back in 0.19.0.

xref gh-13386.