Performance issue with pandas/core/common.py -> maybe_box_datetimelike
Code Sample, a copy-pastable example if possible
The existing implementation is:
def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed
    if isinstance(value, (np.datetime64, datetime)):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)
    return value

Proposed improvement:
def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed
    if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)
    return value

Problem description
This function checks whether value is an instance of np.datetime64 or datetime and, if so, converts it to tslibs.Timestamp. However, tslibs.Timestamp is itself a subclass of datetime, so a value that is already a tslibs.Timestamp passes the check and is needlessly converted again. This has a large performance impact when working with large DataFrames that contain datetime objects. It could be fixed by changing the condition

    if isinstance(value, (np.datetime64, datetime)):

to:

    if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
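
The subclass relationship behind this report can be checked directly. The sketch below is only an illustration, not the pandas source: it uses the public pd.Timestamp and pd.Timedelta in place of the internal tslibs module, and the name maybe_box_datetimelike_guarded is a hypothetical stand-in for the proposed version of the function.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def maybe_box_datetimelike_guarded(value):
    """Box datetime-likes, skipping values that are already pd.Timestamp.

    Hypothetical stand-in for the proposed change; uses the public
    pd.Timestamp / pd.Timedelta rather than the internal tslibs module.
    """
    if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, pd.Timestamp):
        value = pd.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = pd.Timedelta(value)
    return value


ts = pd.Timestamp("2018-01-01 12:00:00")

# pd.Timestamp subclasses datetime, so the unguarded isinstance check matches it.
is_datetime_subclass = isinstance(ts, datetime)

# With the guard, an existing Timestamp is returned as the very same object.
same_object = maybe_box_datetimelike_guarded(ts) is ts

# Plain datetime and np.datetime64 values are still boxed into Timestamps.
boxed_dt = maybe_box_datetimelike_guarded(datetime(2018, 1, 1, 12))
boxed_np = maybe_box_datetimelike_guarded(np.datetime64("2018-01-01T12:00"))

# The timedelta branch is unchanged by the proposal.
boxed_td = maybe_box_datetimelike_guarded(timedelta(hours=1))
```

Because the guarded version returns the input object unchanged when it is already a Timestamp, the saving is one Timestamp construction per call. How much that matters in practice depends on how often the function is called on already-boxed values, which this sketch does not measure.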