DataFrame.merge error with empty frame and multiple datetime64[ns, UTC] columns · Issue #25014 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

x = pd.DataFrame([
    [pd.Timestamp('2018-01-01', tz='UTC'), 4.0, pd.Timestamp('2019-01-01', tz='UTC')]
], columns=['date', 'value', 'date2'])
y = x[:0]
y.merge(x, on='date')

Traceback (most recent call last):
  File "/scratch.py", line 8, in <module>
    z = y.merge(x, on='date')
  File "/python/lib/python3.6/site-packages/pandas/core/frame.py", line 6877, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/python/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 48, in merge
    return op.get_result()
  File "/python/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 560, in get_result
    concat_axis=0, copy=self.copy)
  File "/python/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 2061, in concatenate_block_managers
    concatenate_join_units(join_units, concat_axis, copy=copy),
  File "/python/lib/python3.6/site-packages/pandas/core/internals/concat.py", line 240, in concatenate_join_units
    for ju in join_units]
  File "/python/lib/python3.6/site-packages/pandas/core/internals/concat.py", line 240, in <listcomp>
    for ju in join_units]
  File "/python/lib/python3.6/site-packages/pandas/core/internals/concat.py", line 223, in get_reindexed_values
    fill_value=fill_value)
  File "/python/lib/python3.6/site-packages/pandas/core/algorithms.py", line 1579, in take_nd
    return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
  File "/python/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py", line 589, in take
    fill_value = self._validate_fill_value(fill_value)
  File "/python/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py", line 656, in _validate_fill_value
    "Got '{got}'.".format(got=fill_value))
ValueError: 'fill_value' should be a Timestamp. Got '-9223372036854775808'.

If no timezone is specified, it works as expected:

x = pd.DataFrame([
    [pd.Timestamp('2018-01-01'), 4.0, pd.Timestamp('2019-01-01')]
], columns=['date', 'value', 'date2'])
y = x[:0]
y.merge(x, on='date')

Empty DataFrame
Columns: [value_x, date2_x, date, value_y, date2_y]
Index: []

It also works if there is only one date column:

x = pd.DataFrame([
    [pd.Timestamp('2018-01-01', tz='UTC'), 4.0]
], columns=['date', 'value'])
y = x[:0]
y.merge(x, on='date')

Empty DataFrame
Columns: [value_x, date, value_y]
Index: []

Problem description

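The issue seems to be that iNaT (the integer sentinel -9223372036854775808 shown in the error message) is being passed as the fill_value rather than NaT, so the Timestamp check in _validate_fill_value rejects it.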
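As a quick sanity check on that reading, the integer in the error message can be compared against the smallest signed 64-bit value, which is what pandas uses internally as its integer NaT sentinel. The sketch below uses only the standard library; the names `reported` and `INT64_MIN` are introduced here for illustration and are not pandas identifiers.

```python
import struct

# The integer that appears in the ValueError message above.
reported = -9223372036854775808

# Smallest value representable as a signed 64-bit integer.
INT64_MIN = -(2 ** 63)

# The reported fill_value is exactly the int64 minimum.
matches_int64_min = reported == INT64_MIN

# Round-trip through an 8-byte signed integer to confirm it fits in int64.
round_tripped = struct.unpack("<q", struct.pack("<q", reported))[0]

print(matches_int64_min, round_tripped == reported)
```

Both comparisons hold, which is consistent with a raw integer sentinel reaching a code path that expects a Timestamp or NaT.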

Expected Output

Empty DataFrame
Columns: [value_x, date2_x, date, value_y, date2_y]
Index: []
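The expected behaviour could be captured as a small regression check. This is only a sketch, not the project's actual test: the helper name `build_frames` is hypothetical, and the column comparison ignores order because the order may differ between pandas versions. On pandas 0.24.0 the merge call raises the ValueError shown above instead of reaching the assertions.

```python
import pandas as pd


def build_frames():
    """Recreate the frames from the report: x has one row, y is x with no rows."""
    x = pd.DataFrame(
        [[pd.Timestamp("2018-01-01", tz="UTC"), 4.0,
          pd.Timestamp("2019-01-01", tz="UTC")]],
        columns=["date", "value", "date2"],
    )
    y = x[:0]
    return x, y


x, y = build_frames()

# Merging the empty frame with the populated one on the tz-aware key.
result = y.merge(x, on="date")

# Expect no rows, plus the key column and the suffixed columns from both sides.
assert result.empty
assert set(result.columns) == {"date", "value_x", "date2_x", "value_y", "date2_y"}
```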

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.0
bs4: None
html5lib: None
sqlalchemy: 1.2.16
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None