BUG: Reindexing two tz-aware indices drops tz on the target index when tolerance is specified and method is "ffill" or "bfill" · Issue #38566 · pandas-dev/pandas


Code Sample

import pandas as pd

df = pd.DataFrame(
    {'value': [0, 1, 2, 3]},
    index=[
        pd.Timestamp('2020-01-01 05:00:00+0000', tz='UTC'),
        pd.Timestamp('2020-01-01 06:00:00+0000', tz='UTC'),
        pd.Timestamp('2020-01-01 07:00:00+0000', tz='UTC'),
        pd.Timestamp('2020-01-01 08:00:00+0000', tz='UTC'),
    ],
)

new_index = pd.Series([
    pd.Timestamp('2020-01-01 5:30:00+0000', tz='UTC'),
    pd.Timestamp('2020-01-01 6:30:00+0000', tz='UTC'),
    pd.Timestamp('2020-01-01 7:30:00+0000', tz='UTC'),
    pd.Timestamp('2020-01-01 8:30:00+0000', tz='UTC'),
    pd.Timestamp('2020-01-01 9:30:00+0000', tz='UTC'),
])

# Raises TypeError on the affected version
new_df = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))

Problem description

The following exception is raised when method is "ffill" or "bfill" (but not "nearest", see #32740) and tolerance is specified:

TypeError: DatetimeArray subtraction must have the same timezones or no timezones

I found that the timezone has been dropped by the time execution reaches lines 3024 and 3036 of this function:

target_values = target._get_engine_target()
if self.is_monotonic_increasing and target.is_monotonic_increasing:
    engine_method = (
        self._engine.get_pad_indexer
        if method == "pad"
        else self._engine.get_backfill_indexer
    )
    indexer = engine_method(target_values, limit)
else:
    indexer = self._get_fill_indexer_searchsorted(target, method, limit)
if tolerance is not None:
    indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)

Here target is the tz-aware target index. Once it is converted to target_values, however, the tz information is no longer present in the resulting NumPy array.
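The loss of tz information can be illustrated with public API only. This is a minimal sketch, not the internal code path: it assumes that `DatetimeIndex.values` is a reasonable stand-in for the private `_get_engine_target()` call, and the two-element index is made up for illustration.

```python
import pandas as pd

# A small tz-aware index, standing in for the reindex target (hypothetical data).
target = pd.DatetimeIndex(["2020-01-01 05:30", "2020-01-01 06:30"], tz="UTC")

# The underlying NumPy array holds plain datetime64 values with no timezone.
# Assumption: .values is used here in place of the private _get_engine_target().
target_values = target.values

print(target.tz)            # the index itself still knows its timezone
print(target_values.dtype)  # a plain NumPy datetime64 dtype

# Subtracting the tz-naive array from the tz-aware index is rejected,
# which mirrors the TypeError seen in the tolerance filter.
try:
    target - target_values
    mixed_subtraction_raises = False
except TypeError as exc:
    mixed_subtraction_raises = True
    print(exc)
```

The exact wording of the error message differs between pandas versions, so the sketch only relies on the exception type.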

I found a working solution, but I am unsure whether this change affects any other functionality:

- indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)
+ indexer = self._filter_indexer_tolerance(target, indexer, tolerance)
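To show why keeping the tz-aware `target` makes the tolerance check well defined, here is a rough sketch of the same idea built from public API. The helper `pad_indexer_with_tolerance` is hypothetical and is not the pandas implementation; it assumes that `Index.get_indexer(..., method="pad")` without a tolerance works on the affected version, as the report implies.

```python
import numpy as np
import pandas as pd


def pad_indexer_with_tolerance(source, target, tolerance):
    """Hypothetical helper: forward-fill indexer, then drop matches beyond tolerance."""
    # Forward-fill positions, computed without a tolerance.
    indexer = source.get_indexer(target, method="pad")
    found = indexer != -1
    # Both operands are tz-aware with the same tz, so the subtraction is valid.
    distance = np.abs((target[found] - source[indexer[found]]).to_numpy())
    result = indexer.copy()
    result[found] = np.where(
        distance <= tolerance.to_timedelta64(), indexer[found], -1
    )
    return result


step = pd.Timedelta(hours=1)
source = pd.date_range("2020-01-01 05:00", periods=4, freq=step, tz="UTC")
target = pd.date_range("2020-01-01 05:30", periods=5, freq=step, tz="UTC")

indexer = pad_indexer_with_tolerance(source, target, pd.Timedelta("1 hour"))
print(indexer)  # the last entry is -1: 09:30 is 1.5 hours after 08:00
```

Positions marked -1 correspond to rows that end up as NaN after reindexing.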

Expected Output

                           value
2020-01-01 05:30:00+00:00    0.0
2020-01-01 06:30:00+00:00    1.0
2020-01-01 07:30:00+00:00    2.0
2020-01-01 08:30:00+00:00    3.0
2020-01-01 09:30:00+00:00    NaN
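A check along the following lines could serve as a regression test for the expected output. It is only a sketch: it builds the indices with `pd.date_range` instead of the explicit timestamps from the report, and it tolerates the `TypeError` so that it also runs on an affected version.

```python
import numpy as np
import pandas as pd

step = pd.Timedelta(hours=1)
index = pd.date_range("2020-01-01 05:00", periods=4, freq=step, tz="UTC")
new_index = pd.date_range("2020-01-01 05:30", periods=5, freq=step, tz="UTC")
df = pd.DataFrame({"value": [0, 1, 2, 3]}, index=index)

try:
    new_df = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))
except TypeError:
    # Affected versions raise here instead of returning a frame.
    new_df = None

if new_df is not None:
    # The first four rows are forward-filled; 09:30 is outside the tolerance.
    assert new_df["value"].iloc[:4].tolist() == [0.0, 1.0, 2.0, 3.0]
    assert np.isnan(new_df["value"].iloc[4])
    # The resulting index keeps its timezone.
    assert str(new_df.index.tz) == "UTC"
```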

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.7.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.0-11-cloud-amd64
Version          : #1 SMP Debian 4.19.146-1 (2020-09-17)
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.5
numpy            : 1.19.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : 6.1.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 2.11.2
IPython          : 7.18.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.4
fastparquet      : 0.4.1
gcsfs            : 0.7.1
matplotlib       : 3.3.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 1.0.1
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.3
sqlalchemy       : 1.3.20
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : 0.51.2