ENH: inconsistent naming convention for read_excel column selection (#4988) by abarber4gh · Pull Request #16488 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@abarber4gh

@chris-b1

FYI, just discussing a bug related to usecols #15316, might be easiest to work that change into your refactoring, or can be separate, cc @stanleyguan

jreback

- ``RangeIndex.append`` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`to_pickle` has gained a protocol parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL https://docs.python.org/3/library/pickle.html#data-stream-format`__
- :func:`api.types.infer_dtype` now infers decimals. (:issue: `15690`)
- :func:`read_excel` now allows a column character list (E.G. ['A', 'C', 'D']) with the ``usecols`` parameter (:issue:`4988`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are listed this in deprecations so this is unecessary

Deprecations
~~~~~~~~~~~~
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with to_excel() (:issue:`10559`).
- :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with other read_ functions (:issue:`4988`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.read_* functions

* If string then indicates comma separated list of column names and
column ranges (e.g. "A:E" or "A,C,E:F")
parse_cols : int or list, default None
.. deprecated:: 0.21.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, didn't know sphinx had this deprecated tag. @TomAugspurger @jorisvandenbossche should maybe change other deprecations to use this if it looks nice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appears as:

parse_cols : int or list, default None
Deprecated since version 0.21.0: Use usecols instead

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger if we like this (and lgtm), let's make an issue to updated all DEPRECATED in codebase with this directive

dfref = self.get_csv_refdf('test1')
dfref = dfref.reindex(columns=['A', 'B', 'C'])
df1 = self.get_exceldf('test1', 'Sheet1', index_col=0, usecols=3)
df2 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you need to change ALL tests to use the new one (usecols), except for a single test to actually hit the deprecation.

tm.assert_frame_equal(df1, dfref, check_names=False)
tm.assert_frame_equal(df2, dfref, check_names=False) # backward compat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these new tests?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, new tests for the new functionality.

@codecov

@codecov

jreback

* If string then indicates comma separated list of column names and
column ranges (e.g. "A:E" or "A,C,E:F")
parse_cols : int or list, default None
.. deprecated:: 0.21.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger if we like this (and lgtm), let's make an issue to updated all DEPRECATED in codebase with this directive

"""
def _excel2num(x):
"Convert Excel column name like 'AB' to 0-based column index"
return reduce(lambda s, a: s * 26 + ord(a) - ord('A') + 1,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you are changing this code? does this involve the deprecation somehow? if you are cleaning/fixing, pls do in another PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code changed to allow passing list of strings, similar to other read_* functions, and keep the read_excel "superpower" to pass a string that is parsed as mentioned in the original issue (#4988, point 3).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that's fine, but needs to be in a separate PR from the deprecation change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, opened #16510 for this functionality. will re-submit with only the argument change (parse_cols -> usecols).

tm.assert_frame_equal(df2, dfref, check_names=False)
tm.assert_frame_equal(df3, dfref, check_names=False) # backward compat
def test_parse_cols_str(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave the original tests structure (sure you can change the name to conform), but don't change the tests (in THIS PR).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are back to original but with changed function & kwarg names.

@TomAugspurger

@abarber4gh can you rebase, now that the excel tests are running?

@abarber4gh

@abarber4gh

…n selection (#4988)

update API to use ‘usecols’ instead of ‘parse_cols’. still functionally the same as ‘parse_col’, added test cases for ‘usecols’, added assert_produces_warning(FutureWarning) to other test cases that use ‘parse_cols’.

refactor column use column parsing to only occur once per sheet.

updated whats new with deprecated parse_col argument and other enhancements to usecols functionality. update documentation to show new usecols functionality in read_excel().

@abarber4gh

@abarber4gh

@abarber4gh

@abarber4gh

@abarber4gh

add check_stacklevel=False to test_excel_oldindex_format()

@codecov

@abarber4gh @jreback

@gfyoung @TomAugspurger

@neirbowj @jreback

@TomAugspurger @jreback

@keitakurita @jreback

@TomAugspurger

Replaces most uses of implicit global state from matplotlib in test_datetimelike.py. This was potentially causing random failures where a figure expected to be on a new, blank figure would instead plot on an existing axes (that's the guess at least).

@VincentLa @jorisvandenbossche

@ProsperousHeart @jorisvandenbossche

@jtratner @jreback

@Tafkas @jreback

@JosephWagner @jreback

@jreback

@TomAugspurger

@benjello @TomAugspurger

@makmanalp @TomAugspurger

@bpraggastis @TomAugspurger

…ches (#16460)

@TomAugspurger

@TomAugspurger

@jreback

@jhelie @jreback

@jreback

@dsm054 @jreback

@dsm054 @jreback

@dsm054 @jreback

@preddy5 @jorisvandenbossche

@jreback

@tswast @jreback

Splits extra information about the license and copyright holders to AUTHORS.md.

@jreback

@jreback

@abarber4gh

@abarber4gh

…n selection (#4988)

update API to use ‘usecols’ instead of ‘parse_cols’. still functionally the same as ‘parse_col’, added test cases for ‘usecols’, added assert_produces_warning(FutureWarning) to other test cases that use ‘parse_cols’.

refactor column use column parsing to only occur once per sheet.

updated whats new with deprecated parse_col argument and other enhancements to usecols functionality. update documentation to show new usecols functionality in read_excel().

@abarber4gh

@abarber4gh

@abarber4gh

@abarber4gh

@abarber4gh

add check_stacklevel=False to test_excel_oldindex_format()

@abarber4gh

…o issue#4988

@abarber4gh

@jreback

can you rebase / update according to comments

@jreback

closing as stale. pls ping to reopen if you want to continue.

Labels