ENH: Return DatetimeIndex or TimedeltaIndex bins for q/cut when input is datelike by mroeschke · Pull Request #20956 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation11 Commits15 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

mroeschke

@mroeschke

@mroeschke

@codecov

jreback

@@ -524,6 +524,7 @@ Other Enhancements
- Added new writer for exporting Stata dta files in version 117, ``StataWriter117``. This format supports exporting strings with lengths up to 2,000,000 characters (:issue:`16450`)
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`)
- :func:`date_range` now returns a linearly spaced ``DatetimeIndex`` if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`)
- :func:`cut` and :func:`qcut` now returns a ``DatetimeIndex`` or ``TimedeltaIndex`` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add :class: for DTI / TDI

@@ -332,6 +332,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None,
result = result.astype(np.float64)
np.putmask(result, na_mask, np.nan)
bins = _convert_bin_to_datelike_type(bins, dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this just call Index() which will do all of this inference

@mroeschke

@mroeschke

@mroeschke

Simplified this a bit with the Index constructor; however, I wanted to avoid turing all bins into an Index type unless this is an API change we want to make.

Also, I created #20964 since the Index constructor does not correctly localize values with a datetimetz dtype specified.

jreback

@@ -550,6 +550,7 @@ Other Enhancements
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`)
- :func:`cut` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`20947`)
- :func:`date_range`, :func:`timedelta_range`, and :func:`interval_range` now return a linearly spaced index if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`, :issue:`20983`, :issue:`20976`)
- :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.24

bins : Array-like of bins, DatetimeIndex or TimedeltaIndex if dtype is
datelike
"""
# Can be simplified once GH 20964 is fixed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to fix #20964 first

@mroeschke

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke can you rebase & move note to 0.24.0. ping and will have another look.

@mroeschke

Updated and ready for a final review @jreback

@mroeschke

jreback

@@ -364,6 +365,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None,
result = result.astype(np.float64)
np.putmask(result, na_mask, np.nan)
bins = _convert_bin_to_datelike_type(bins, dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to make this logic flow better, e.g. _convert_bins_to_numeric_type and _convert_bin_to_datelike_type are disjointed (e.g. they are separate), maybe combine? I just find that we are doing conversions in 2 places here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well they both serve different purposes;
_convert_bins_to_numeric_type is used at the start to prep the bins (datelike data -> numeric data) before the main array is cut.

_convert_bin_to_datelike_type is used at the end to "stylize" the bins (numeric data -> datelike data) after the array is cut.

Plus, I like the explicit naming of these two processes. Just my 2c.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine then.

jreback

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request

Oct 1, 2018

@mroeschke

2 participants

@mroeschke @jreback