Fix arithmetic errors with timedelta64 dtypes by jbrockmendel · Pull Request #22390 · pandas-dev/pandas

Kind of surprising, but I don't see any Issues for these bugs. Might be because many of them were found after parametrizing arithmetic tests, so instead of Issues we had xfails.

xref #20088
xref #22163

- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

@@ -576,7 +576,10 @@ Datetimelike
- Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
- Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype division by ``Timedelta``-like scalar incorrectly returning ``timedelta64[ns]`` dtype instead of ``float64`` dtype (:issue:`20088`,:issue:`22163`)
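
As a quick illustration of the last whatsnew entry above, the following is a minimal sketch of the intended behavior, assuming a pandas version that includes this fix. The column name and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: a single timedelta64[ns] column.
df = pd.DataFrame({"elapsed": pd.to_timedelta(["1 hour", "90 minutes"])})

# Dividing by a Timedelta-like scalar should give unitless ratios,
# i.e. float64 dtype rather than timedelta64[ns].
result = df / pd.Timedelta("1 hour")
print(result.dtypes)
```

Each entry of the result is the number of hours in the corresponding timedelta, so the values are plain floats.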

Can you split this out into a timedelta bug section (as this is getting kind of long)?

This can be in a future PR.

Sounds good

I think we now have a separate section?

@@ -596,6 +597,9 @@ def _evaluate_numeric_binop(self, other):
# GH#19333 is_integer evaluated True on timedelta64,
# so we need to catch these explicitly
return op(self._int64index, other)
elif is_timedelta64_dtype(other):

Doesn't the fall-through raise TypeError?

I don’t understand the question

I think your comment is at odds with the check. Please revise the comment and/or the checks.

The comment below this line, "# must be an np.ndarray; GH#22390", together with the check above it, "elif is_timedelta64_dtype(other)", just says that this is an ndarray with timedelta64[ns] dtype. I don't think these are at odds.

# timedelta64 dtypes because numpy casts integer dtypes to
# timedelta64 when operating with timedelta64
if isinstance(right, np.ndarray):
# upcast to TimedeltaIndex before dispatching
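
The casting behavior that the code comment above refers to can be seen with plain NumPy. This is a minimal sketch with made-up values, not code from the PR; it only shows that an integer array combined with a timedelta64 scalar comes back with a timedelta64 dtype.

```python
import numpy as np

# Hypothetical inputs: a plain int64 array and a timedelta64 scalar.
ints = np.arange(3, dtype="int64")
delta = np.timedelta64(1, "ns")

# NumPy treats the integers as timedelta64 values for this operation,
# so the result has a timedelta64 dtype instead of raising.
summed = ints + delta
print(summed.dtype)
```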

This seems like a lot of special casing. Is there any way to make this simpler?

The only simplification I considered but didn't use was something like:

def maybe_upcast_for_op(obj):
    if type(obj) is timedelta:
        return pd.Timedelta(obj)
    if isinstance(obj, np.ndarray) and is_timedelta64_dtype(obj):
        return pd.TimedeltaIndex(obj)
    return obj

which would go right after get_op_result_name.
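
For anyone who wants to try the idea in isolation, here is a self-contained sketch of the helper proposed in this comment. The imports and the example inputs are assumptions added for illustration, and the version that was eventually merged may differ.

```python
from datetime import timedelta

import numpy as np
import pandas as pd
from pandas.api.types import is_timedelta64_dtype


def maybe_upcast_for_op(obj):
    # Standalone copy of the helper sketched in the comment above.
    # Exact type check: pd.Timedelta subclasses timedelta and is left alone.
    if type(obj) is timedelta:
        return pd.Timedelta(obj)
    # Raw timedelta64 ndarrays are wrapped so pandas arithmetic applies.
    if isinstance(obj, np.ndarray) and is_timedelta64_dtype(obj):
        return pd.TimedeltaIndex(obj)
    # Everything else passes through unchanged.
    return obj


# Hypothetical inputs covering the three branches.
scalar = maybe_upcast_for_op(timedelta(hours=1))
array = maybe_upcast_for_op(np.array([1, 2], dtype="timedelta64[ns]"))
untouched = maybe_upcast_for_op(5)
```

The point of the design is that callers upcast once, up front, and the rest of the operation only has to handle pandas types.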

Maybe re-route the scalars to a separate path. This is getting very special-cased.

I'll implement the maybe_upcast idea and see if that is prettier

This will probably need rebasing after #22350 goes through

index=left.index, name=res_name,
dtype=result.dtype)

elif type(right) is datetime.timedelta:

You can include this case above (before what you added), no?

if isinstance(right, (datetime.timedelta, Timedelta)):
    right = Timedelta(right)
....

The code above casts to TimedeltaIndex, not Timedelta.

Implemented maybe_upcast_for_op. There may be other places where this can be used; I'll take a look in a follow-up.

Minor doc-comment. Rebase and ping on green.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request
