Performance issue with pandas/core/common.py -> maybe_box_datetimelike

Code Sample, a copy-pastable example if possible

The existing implementation is:

def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed

    if isinstance(value, (np.datetime64, datetime)):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)

    return value

Proposed improvement:

def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed

    # skip values that are already a Timestamp (itself a datetime subclass)
    if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)

    return value

Problem description

This function checks whether value is an instance of np.datetime64 or datetime and, if so, converts it to tslibs.Timestamp. However, tslibs.Timestamp is itself a subclass of datetime, so a value that is already a tslibs.Timestamp passes the check and is needlessly converted one more time. This has a significant performance cost when working with large DataFrames that contain datetime objects. It could be fixed by changing the condition
if isinstance(value, (np.datetime64, datetime)):
to:
if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
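
To illustrate why the extra check matters, here is a minimal standard-library sketch. It does not use pandas: BoxedTimestamp is a hypothetical stand-in for tslibs.Timestamp (only the fact that it subclasses datetime is taken from the issue), and box_unguarded / box_guarded are illustrative names that mirror the existing and proposed conditions rather than the actual pandas code.

```python
from datetime import datetime


class BoxedTimestamp(datetime):
    """Hypothetical stand-in for tslibs.Timestamp: a subclass of datetime."""


def _box(value):
    # Build a new BoxedTimestamp from the fields of any datetime (assumed helper).
    return BoxedTimestamp(value.year, value.month, value.day,
                          value.hour, value.minute, value.second,
                          value.microsecond, value.tzinfo)


def box_unguarded(value):
    # Mirrors the existing condition: the subclass also matches, so it is re-boxed.
    if isinstance(value, datetime):
        value = _box(value)
    return value


def box_guarded(value):
    # Mirrors the proposed condition: already-boxed values are returned untouched.
    if isinstance(value, datetime) and not isinstance(value, BoxedTimestamp):
        value = _box(value)
    return value


already_boxed = BoxedTimestamp(2020, 1, 1)
print(isinstance(already_boxed, datetime))           # True: the subclass passes the check
print(box_unguarded(already_boxed) is already_boxed)  # False: a new object was built
print(box_guarded(already_boxed) is already_boxed)    # True: the conversion was skipped
```

With the unguarded check every already-boxed value pays for one redundant construction, which is the cost the proposed condition avoids; plain datetime values are still converted in both variants.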