API: Deprecate skip_footer in read_csv by gfyoung · Pull Request #13386 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation63 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Title is self-explanatory.
Closes gh-13349 and partially undoes this commit back in v0.9.0
. With such a massive API now, having duplicate arguments makes managing it way less practical.
In light of removing duplicate arguments and stripping down the API a bit, here are a couple of others I would like to propose (perhaps not in this PR but for discussion nonetheless):
- Deprecate
skipfooter
inread_excel
(it's allowed as an alternative toskip_footer
, though it is not surfaced in the signature but rather encompassed in a**kwargs
argument)? - Should we choose between
sep
anddelimiter
inread_csv
?
the problem is that we also have skiprows
. I would actually deprecate skip_footer
instead.
Hmmm...that's a flip-flop AFAICT from what you said earlier here. However, deprecating skip_footer
seems a little strange since we use skip_footer
all over the place in the code (internally especially). That's why I chose to deprecate skipfooter
, since the code impact is not very significant as you can tell. In addition, we use the _
in a lot of the other arguments, so I'd rather side with the majority.
just change it internally to the correct kw. I think they diverged at some point. It looks (just a quick skim), that skipfooter
is more consistent (as we have skiprows
, usecols
) etc. excel should be fixed to be the same as well.
@jreback : let's actually do the count of multi-word argument names (excluding the skip(_)footer
ones) using the documentation as of 0.18.1
here:
Arguments that use _
:filepath_or_buffer
index_col
mangle_dupe_cols
true_values
false_values
na_values
keep_default_na
na_filter
skip_blank_lines
error_bad_lines
warn_bad_lines
delim_whitespace
as_recarray
compact_ints
use_unsigned
low_memory
buffer_lines
memory_map
float_precision
Arguments that don't use _
:usecols
skipinitialspace
skiprows
chunksize
lineterminator
doublequote
quotechar
escapechar
Even excluding ones that have been recently deprecated (compact_ints
, buffer_lines
, use_unsigned
, as_recarray
), arguments that use _
vastly outnumber those that don't.
@gfyoung I am trying to minimize back-compat pain.
usecols
and skiprows
are prob some of the most used options. I agree we should just deprecate and move to _
separated, but that would be in 0.19.0 if we go whole hog.
Current coverage is 85.23% (diff: 100%)
@@ master #13386 diff @@
Files 140 140
Lines 50420 50419 -1
Methods 0 0
Messages 0 0
Branches 0 0
Hits 42975 42975
- Misses 7445 7444 -1
Partials 0 0
Powered by Codecov. Last update cc216ad...d21345f
@jreback : I have no issues waiting until 0.19.0
and going full hog. This duplicate argument is not breaking, and it will then give us the time to properly deprecate and rename arguments with _
.
gfyoung changed the title
DEPR: deprecate skipfooter in read_csv API: Use underscore in read_* arguments
gfyoung changed the title
API: Use underscore in read_* arguments API: Use underscore in read_csv arguments
I would lean against this change -- I don't think deprecating or even adding aliases with underscores for these keywords is worth it. Yes, consistency is nice, but these are keywords that are mostly either (a) copied straight from the API docs or an example or (b) auto-completed. Omitting underscores between lower case words is also common enough in PEP8 compliant code that it doesn't hurt my eyes in the way that camelCase would, for example. I would save changes like this for next time somebody writes a DataFrame library ;). There's just very little upside to the change at this point.
I'd also be inclined to not do this.
Consistency is great, but to me, big breaking (or warning triggering) changes for consistency are only worth it to the extent they actually improve the user experience. E.g. resample
, the new sort_...
api, or a closer example, when the inconsistency between the rows
and index
keyword was resolved in pivot_table
/ pivot
- #5505
This doesn't seem to meet that hurdle. Like @shoyer said, these argument would be typically tab-completed or looked up, or even if not, the inconsistency doesn't cause any confusion.
To put things in perspective, this whole discussion was generated because I wanted to remove the duplicate 'skipfooter/skip_footer' argument in the signature. It really should only be one of the two IMO.
The consistency part came up because we weren't sure which way to go i.e do we take the underscore version or the non-underscore?
I perfectly understand if we don't want to make such a change as drastic as it has become, though some input on the initial goal would be good then.
I don't have a strong opinion on that, but given that all the docs / SO answers / etc in the wild use skipfooter
, I'd think deprecating skip_footer
makes more sense?
gfyoung changed the title
API: Use underscore in read_csv arguments API: Deprecate skip_footer in read_csv
In light of the push back, I will rollback my changes and just deprecate skip_footer
as @chris-b1 indicated. If skipfooter
is being used in the wild, let's stick with that one then.
@jreback , @jorisvandenbossche : Rolled back my changes successfully (i.e. Travis is passing) to just deprecating skip_footer
for the reason that @chris-b1 outlined above. Ready to merge if there are no other concerns.
What do we do other read_
functions for the skipfooter case?
Same as here pre-PR (i.e. it just accepts both of them). However, I will handle all other cases similarly if this one is merged.
@@ -1351,7 +1351,7 @@ back to python if C-unsupported options are specified. Currently, C-unsupported |
---|
options include: |
- ``sep`` other than a single character (e.g. regex separators) |
- ``skip_footer`` |
- ``skipfooter`` |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did the doc-string get changed (to add DEPRECATED)?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...not sure where those changes went. Added them back.
@jreback : added the documentation back about the deprecation, and Travis is still happy after rebase. Ready to merge if there are no other concerns.
gfyoung deleted the deprecate-dup-skipfooter branch
gfyoung added a commit to forking-repos/pandas that referenced this pull request
gfyoung added a commit to forking-repos/pandas that referenced this pull request
gfyoung added a commit that referenced this pull request
Deprecated back in 0.19.0.
xref gh-13386.