BUG: silent overflow in DateTimeArray subtraction by shengpu-tang · Pull Request #22508 · pandas-dev/pandas (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes Incorrect timedelta computation of pd.Series datetime64[ns] if timedelta is too large #22492
- tests added / passed
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff - whatsnew entry
Hi @jbrockmendel , thanks for the follow-up.
- With the suggested change, it now throws
OverflowError: Overflow in int64 addition. Is this desired behavior? - Where should I add the tests? Maybe under
tests/arrays/test_datetimelike.py?
@shengpu1126 : You should make the changes so that it gets the expected output that you reported in the original issue. And yes, that does seem like a good place to put the test. Don't forget a whatsnew entry too.
@gfyoung it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.
@shengpu1126 i think raising overflowerror is correct, will doublecheck.
it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.
@jbrockmendel : Ah, I see. My response was motivated by the fact that you didn't comment on @shengpu1126 expected output in the original issue, which I took to be a tacit agreement of the expected behavior for patching this bug.
Test will go in tests.arithmetic.test_datetime64
Hello @shengpu1126! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
Comment last updated on August 30, 2018 at 15:56 Hours UTC
| - Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`) |
|---|
| - Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`) |
| - Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`) |
| - Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise `OverflowError` (:issue:22492, :issue:22508) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a little more here, e.g. when this fails to raise
| dtimin - variant |
|---|
| def test_datetimeindex_sub_datetimeindex_overflow(self): |
| dtimax = pd.to_datetime(['now', pd.Timestamp.max]) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment with the issue number
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments. ping on green (or at least except for 3.6/3.5 builds are currently failing to an unrelated error). so might want to wait a little bit to rebase.
| - Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`) |
|---|
| - Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`) |
| - Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`) |
| - Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise `OverflowError` when the difference between :class:`Timestamp`s (a :class:`Timedelta` object) exceeds `int64` limit (:issue:`22492`, :issue:`22508`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now confusing. let's go back to the prior one;
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry about that!
for the record, it was mentioned here that pd.Timedeltas are represented as nanoseconds using 64 bit integers, and I believe that's the source of potential overflow.
| dtimax - ts_neg |
|---|
| expected = pd.Timestamp.max.value - ts_pos[1].value |
| res = dtimax - ts_pos |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result (rather than res) in all occurrences
| assert res[1].value == expected |
|---|
| with pytest.raises(OverflowError): |
| dtimin - ts_pos |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment or 2 to delineate the various cases
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request