Masking and overflow checks for datetimeindex and timedeltaindex ops by jbrockmendel · Pull Request #18020 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation34 Commits17 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

jbrockmendel

There are a bunch of new tests (not obvious what the appropriate place is for a WIP test matrix like this, pls advise). The ones that will fail under master are test_timedeltaindex_add_timestamp_nat_masking and test_datetimeindex_sub_timestamp_overflow

Start filling out a test matrix of arithmetic ops.

jreback

# - timezone-aware variants
# - object-dtype, categorical dtype
# - PeriodIndex
# - consistency with .map(...) ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a look thru test_ops, there is lots of coverage for things like this already (or maybe test_base). don't create a giant matrix, rather parametrize as much as possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look. After adding tests for #7996 (separate branch/PR), this class gets pretty huge. So yah, parameterization sounds nice.

(Also it looks like tests in this module don't get run, so that needs changing anyway).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also it looks like tests in this module don't get run, so that needs changing anyway).

sure they do, classes inherit from this. Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.

Message received. Worrying about correctness first, brevity later.

@jbrockmendel

@codecov

@jbrockmendel

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@jbrockmendel

OK, just removed the new file, took the two tests that currently fail on master and moved them into existing datetime and timedelta test files.

@jbrockmendel

@jbrockmendel

@jorisvandenbossche

Does this also fix #7996 ?

@jbrockmendel

Does this also fix #7996 ?

No, but it does fix a related bug that probably belongs in the same issue

In [3]: s = pd.Series(pd.date_range('20130101',periods=3))
In [4]: dti = pd.DatetimeIndex(s)

In [5]: s-np.datetime64('2013-01-01')
Out[5]: 
0   15705 days 23:59:59.999984
1   15706 days 23:59:59.999984
2   15707 days 23:59:59.999984
dtype: timedelta64[ns]

In [6]: dti - np.datetime64('2013-01-01')
Out[6]: DatetimeIndex(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[ns]', freq=None)

@jbrockmendel

@jbrockmendel

Rebased and pushed; hoping that magically fixes the CI errors

@jbrockmendel

@jbrockmendel

AFAICT the test failures here were caused because of fragility in test_NaT_methods and TestTimestamp.test_to_datetime_depr. Specifically, a test that this PR added (until moments ago) called ts.to_datetime() instead of ts.to_pydatetime(). This triggered a FutureWarning, which resulted in FutureWarning not being called in the tests that specifically check for it.

I think the immediate issue is now fixed, but ideally test_to_datetime_depr and test_NaT_methods would be isolated enough not to depend on test ordering.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a whatsnew note. 0.21.1 is fine.

@jbrockmendel

jreback

@@ -447,6 +447,40 @@ def f():
t - offset
pytest.raises(OverflowError, f)
def test_datetimeindex_sub_timestamp_overflow(self):
dtimax = pd.to_datetime(['now', pd.Timestamp.max])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue for these

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue for this; I noticed it when tracking down the TimedeltaIndex bug.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure there is, the PR number!

for variant in ts_neg_variants + ts_pos_variants:
res = tdinat + variant
assert res[1] is pd.NaT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check sub as well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also would check both add/sub for the reverse, e.g. variant + tdinat (and -)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do the full add/sub/radd/rsub matrix... that was kind of what I started out with. The question becomes where to put it, since arithmetic tests are scattered about. My preference would be new test modules test_arithmetic in each of indexes.timedeltas, indexes.datetimes, and indexes.periods where I can a) put these new tests and b) collect the arithmetic tests that are currently scattered about.

See discussion in #18026, #18036.

But for now I'll just edit the contents of the tests already introduced.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

The first entry is not NaT, will not be constant across all of the variants (though the variants could be split into two groups over which it should be unchanging)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_arithmetic sounds like a good name. key is to share code as much as possible, via fixtures / parametrization / inheritence. We ideally want these objects to act as similar as possible, so keeping special cases to a minimum is important.

note before we do this, I think splitting out the tz-aware tests to its own hierarchy should be done.

see #17583 and #17694

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for now I'll just edit the contents of the tests already introduced.

Actually as I look at this, I'd much rather close out this bug fix and follow up with the Do It Right approach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly. bug fixes are good. separate, self-contained refactorings to make things more readable are better!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. After the current deluge of clears up, I'll circle back to this.

expected = pd.Timestamp.min.value - tsneg.value
for variant in ts_neg_variants:
res = dtimin - variant

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as below, assert fully the result type.

@jbrockmendel

Where did we land on this? My preference is to get this bug-fix in and worry about the rest in #18049+followups.

@jreback

@jbrockmendel

@jbrockmendel

Just rebased. The two new tests are unchanged, just moved to the appropriate locations in test_arithmetic.

@jbrockmendel

@jreback

looks fine, tiny doc change. ping on green.

jreback

@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks around NaT

@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)
- Bug in :class:`DatetimeIndex` subtraction could fail to overflow (:issue:`18020`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand this a bit (e.g. subtracting what)

@jbrockmendel

@jbrockmendel

jreback

@jreback

@jbrockmendel

@jreback

1kastner pushed a commit to 1kastner/pandas that referenced this pull request

Nov 5, 2017

@jbrockmendel @1kastner

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

@jbrockmendel @No-Stream

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request

Dec 11, 2017

@jbrockmendel @TomAugspurger

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request

Dec 11, 2017

@TomAugspurger