API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option by mroeschke · Pull Request #22644 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation60 Commits64 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

mroeschke

Currently, users do not have any control over nonexistent datetime handling when tz_localizeing like they do ambiguous times. This adds a new keyword nonexistent to tz_localize so that users now can:

'raise': Raise an error (default)
'NaT': Replace nonexistent times with 'NaT'
'shift': Shift nonexistent times forward to the closest existing time

Matt Roeschke added 14 commits

September 3, 2018 23:12

@pep8speaks

@codecov

Matt Roeschke added 4 commits

September 11, 2018 14:02

jreback

Matt Roeschke added 2 commits

September 19, 2018 09:44

@mroeschke

@jreback re: overlap between errors and nonexistent

  1. The original issue mentions Timestamp.tz_localize() NonExistentTimeError handling #8917 (comment) having the ability to control over ambiguous times vs nonexistent times independently.
    1a) nonexistent and ambiguous can handle their own errors in this PR independently
  2. This PR has the ability to shift the nonexistent time to a real time (like how ambiguous can take True/False

So I would propose that eventually we can depreciate errors and keep both ambiguous and nonexistent

@jreback

jorisvandenbossche

@@ -565,6 +565,8 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift' will shift the nonexistent time forward to the closest

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, rst formatting nitpick: there needs to be a blank line between the first sentences, and the start of this list ... (getting rst right can be annoying ..)

@mroeschke

jreback

@pytest.mark.parametrize('tz', ['Europe/Warsaw', 'dateutil/Europe/Warsaw'])
@pytest.mark.parametrize('method, exp', [
['shift', '2015-03-29 03:00:00'],
['NaT', pd.NaT],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have tests that exericse the assertion when you pass a nonexistent keyword that is invalid?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a test for an invalid nonexistent keyword.

Matt Roeschke added 2 commits

October 18, 2018 14:43

jorisvandenbossche

@@ -978,14 +979,26 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone
@@ -639,15 +639,27 @@ def tz_localize(self, tz, ambiguous='raise', errors='raise'):
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
errors : {'raise', 'coerce'}, default 'raise'
nonexistent : 'shift', 'NaT' default 'raise'
A nonexistent time doesn't not exist in a particular timezone

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone
@@ -8659,6 +8659,17 @@ def tz_localize(self, tz, axis=0, level=None, copy=True,
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nonexistent time doesn't not exist in a particular timezone
A nonexistent time does not exist in a particular timezone

Matt Roeschke added 2 commits

October 19, 2018 09:55

@mroeschke

@jreback

one more rebase and I think ok to go

Matt Roeschke added 2 commits

October 24, 2018 09:57

@jreback

thoo added a commit to thoo/pandas that referenced this pull request

Oct 27, 2018

@thoo

thoo added a commit to thoo/pandas that referenced this pull request

Oct 27, 2018

@thoo

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request

Nov 19, 2018

@mroeschke @tm9k1

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@mroeschke @Pingviinituutti

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@mroeschke @Pingviinituutti