BUG: silent overflow in DateTimeArray subtraction by shengpu-tang · Pull Request #22508 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@shengpu-tang

@shengpu-tang

@shengpu-tang

Hi @jbrockmendel , thanks for the follow-up.

@codecov

@gfyoung

@shengpu1126 : You should make the changes so that it gets the expected output that you reported in the original issue. And yes, that does seem like a good place to put the test. Don't forget a whatsnew entry too.

@jbrockmendel

@gfyoung it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.

@shengpu1126 i think raising overflowerror is correct, will doublecheck.

@gfyoung

it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.

@jbrockmendel : Ah, I see. My response was motivated by the fact that you didn't comment on @shengpu1126 expected output in the original issue, which I took to be a tacit agreement of the expected behavior for patching this bug.

@jbrockmendel

Test will go in tests.arithmetic.test_datetime64

@shengpu-tang

@pep8speaks

Hello @shengpu1126! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 30, 2018 at 15:56 Hours UTC

@shengpu-tang

jreback

- Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`)
- Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise `OverflowError` (:issue:22492, :issue:22508)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a little more here, e.g. when this fails to raise

dtimin - variant
def test_datetimeindex_sub_datetimeindex_overflow(self):
dtimax = pd.to_datetime(['now', pd.Timestamp.max])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment with the issue number

@shengpu-tang

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments. ping on green (or at least except for 3.6/3.5 builds are currently failing to an unrelated error). so might want to wait a little bit to rebase.

- Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`)
- Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise `OverflowError` when the difference between :class:`Timestamp`s (a :class:`Timedelta` object) exceeds `int64` limit (:issue:`22492`, :issue:`22508`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now confusing. let's go back to the prior one;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about that!
for the record, it was mentioned here that pd.Timedeltas are represented as nanoseconds using 64 bit integers, and I believe that's the source of potential overflow.

dtimax - ts_neg
expected = pd.Timestamp.max.value - ts_pos[1].value
res = dtimax - ts_pos

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result (rather than res) in all occurrences

assert res[1].value == expected
with pytest.raises(OverflowError):
dtimin - ts_pos

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment or 2 to delineate the various cases

@shengpu-tang

@jreback

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request

Oct 1, 2018

Labels