join on tz-aware column and tz-aware index wrongly fails (regression from 0.22) · Issue #23931 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-01', periods=5, tz='America/Chicago'),
    'vals': list('abcde'),
})
df2 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-03', periods=5, tz='America/Chicago'),
    'vals_2': list('tuvwx'),
})
joined = df1.join(df2.set_index('date'), on='date')
Problem description
In 0.23.4, this raises:
ValueError: You are trying to merge on datetime64[ns, America/Chicago] and datetime64[ns] columns. If you wish to proceed you should use pd.concat
This is incorrect: both join keys are tz-aware, and indeed df2.set_index('date').index.dtype == df1.date.dtype returns True.
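The dtype comparison above can be checked in isolation. This is a minimal sketch that rebuilds the frames from the code sample; the names `right` and `same_dtype` are introduced here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-01', periods=5, tz='America/Chicago'),
    'vals': list('abcde'),
})
df2 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-03', periods=5, tz='America/Chicago'),
    'vals_2': list('tuvwx'),
})

# Hypothetical helper names, not part of the original report.
right = df2.set_index('date')
same_dtype = right.index.dtype == df1['date'].dtype

# Both sides should report a tz-aware datetime dtype.
print(df1['date'].dtype)
print(right.index.dtype)
print(same_dtype)
```

If `same_dtype` is True, the "datetime64[ns]" in the error message does not describe either input as the user sees it.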
In 0.22.0, this does not raise and returns the expected output. I haven't run git bisect yet, but that seems like the logical next step.
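Until the regression is tracked down, a possible workaround is to keep `date` as a regular column on both sides and use a left merge instead of joining against the index. This is only a sketch: it assumes that column-on-column merges are unaffected, which has not been verified against 0.23.4 in this report, and the name `merged` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-01', periods=5, tz='America/Chicago'),
    'vals': list('abcde'),
})
df2 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-03', periods=5, tz='America/Chicago'),
    'vals_2': list('tuvwx'),
})

# Assumption: merging column-on-column avoids the index code path
# that triggers the ValueError. how='left' mirrors the default of
# DataFrame.join, keeping every row of df1.
merged = df1.merge(df2, on='date', how='left')
print(merged)
```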
Expected Output
                       date vals vals_2
0 2018-01-01 00:00:00-06:00    a    NaN
1 2018-01-02 00:00:00-06:00    b    NaN
2 2018-01-03 00:00:00-06:00    c      t
3 2018-01-04 00:00:00-06:00    d      u
4 2018-01-05 00:00:00-06:00    e      v
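The expected output can also be written as a small regression check. The following is a sketch only, not an existing pandas test. It should pass on a version where the join works (0.22.0 according to this report) and raise the ValueError above on 0.23.4.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-01', periods=5, tz='America/Chicago'),
    'vals': list('abcde'),
})
df2 = pd.DataFrame({
    'date': pd.date_range(start='2018-01-03', periods=5, tz='America/Chicago'),
    'vals_2': list('tuvwx'),
})

joined = df1.join(df2.set_index('date'), on='date')

# The left frame's rows and tz-aware key column are preserved.
assert list(joined.columns) == ['date', 'vals', 'vals_2']
assert joined['date'].dtype == df1['date'].dtype

# The first two dates have no match in df2; the rest pick up t, u, v.
assert joined['vals_2'].isna().tolist() == [True, True, False, False, False]
assert joined['vals_2'].dropna().tolist() == ['t', 'u', 'v']
```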
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1072-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 39.2.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None