ENH: Return DatetimeIndex or TimedeltaIndex bins for q/cut when input is datelike by mroeschke · Pull Request #20956 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation11 Commits15 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes ENH: cut return bins should be Timestamp/Timedelta objects for datelike data #19891
- tests added / passed
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
@@ -524,6 +524,7 @@ Other Enhancements |
---|
- Added new writer for exporting Stata dta files in version 117, ``StataWriter117``. This format supports exporting strings with lengths up to 2,000,000 characters (:issue:`16450`) |
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`) |
- :func:`date_range` now returns a linearly spaced ``DatetimeIndex`` if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`) |
- :func:`cut` and :func:`qcut` now returns a ``DatetimeIndex`` or ``TimedeltaIndex`` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add :class:
for DTI / TDI
@@ -332,6 +332,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None, |
---|
result = result.astype(np.float64) |
np.putmask(result, na_mask, np.nan) |
bins = _convert_bin_to_datelike_type(bins, dtype) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this just call Index()
which will do all of this inference
Simplified this a bit with the Index constructor; however, I wanted to avoid turing all bins into an Index type unless this is an API change we want to make.
Also, I created #20964 since the Index constructor does not correctly localize values with a datetimetz dtype specified.
@@ -550,6 +550,7 @@ Other Enhancements |
---|
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`) |
- :func:`cut` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`20947`) |
- :func:`date_range`, :func:`timedelta_range`, and :func:`interval_range` now return a linearly spaced index if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`, :issue:`20983`, :issue:`20976`) |
- :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.24
bins : Array-like of bins, DatetimeIndex or TimedeltaIndex if dtype is |
---|
datelike |
""" |
# Can be simplified once GH 20964 is fixed. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to fix #20964 first
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke can you rebase & move note to 0.24.0. ping and will have another look.
Updated and ready for a final review @jreback
@@ -364,6 +365,8 @@ def _bins_to_cuts(x, bins, right=True, labels=None, |
---|
result = result.astype(np.float64) |
np.putmask(result, na_mask, np.nan) |
bins = _convert_bin_to_datelike_type(bins, dtype) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to make this logic flow better, e.g. _convert_bins_to_numeric_type
and _convert_bin_to_datelike_type
are disjointed (e.g. they are separate), maybe combine? I just find that we are doing conversions in 2 places here
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well they both serve different purposes;_convert_bins_to_numeric_type
is used at the start to prep the bins (datelike data -> numeric data) before the main array is cut.
_convert_bin_to_datelike_type
is used at the end to "stylize" the bins (numeric data -> datelike data) after the array is cut.
Plus, I like the explicit naming of these two processes. Just my 2c.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's fine then.
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request
2 participants