Concatenation of DFs with all NaT columns and TZ-aware ones breaks · Issue #12396 · pandas-dev/pandas
Concatenating DataFrames where one has an all-NaT column and the other has a tz-aware datetime column in the same position raises a TypeError as of 0.17.1:
In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame([[pd.NaT], [pd.NaT]])
In [3]: df1
Out[3]:
     0
0  NaT
1  NaT
In [4]: df2 = pd.DataFrame([[pd.Timestamp('2015/01/01', tz='UTC')], [pd.Timestamp('2016/01/01', tz='UTC')]])
In [5]: df2
Out[5]:
                           0
0  2015-01-01 00:00:00+00:00
1  2016-01-01 00:00:00+00:00
In [6]: pd.concat([df1, df2])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-f61a1ab4009e> in <module>()
----> 1 pd.concat([df1, df2])
/.../env/local/lib/python2.7/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
811 verify_integrity=verify_integrity,
812 copy=copy)
--> 813 return op.get_result()
814
815
/.../env/local/lib/python2.7/site-packages/pandas/tools/merge.py in get_result(self)
993
994 new_data = concatenate_block_managers(
--> 995 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
996 if not self.copy:
997 new_data._consolidate_inplace()
/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
4454 copy=copy),
4455 placement=placement)
-> 4456 for placement, join_units in concat_plan]
4457
4458 return BlockManager(blocks, axes)
/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
4551 to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
4552 upcasted_na=upcasted_na)
-> 4553 for ju in join_units]
4554
4555 if len(to_concat) == 1:
/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
4799
4800 if self.is_null and not getattr(self.block,'is_categorical',None):
-> 4801 missing_arr = np.empty(self.shape, dtype=empty_dtype)
4802 if np.prod(self.shape):
4803 # NumPy 1.6 workaround: this statement gets strange if all
TypeError: data type not understood
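The last frame fails in `get_reindexed_values`, where the all-NaT join unit is filled via `np.empty(self.shape, dtype=empty_dtype)`. One plausible reading, inferred from the traceback rather than confirmed, is that `empty_dtype` was resolved to the tz-aware pandas dtype, which NumPy cannot allocate natively. The sketch below only checks that premise on a recent pandas/NumPy install (the error text there differs from 0.17.1's `data type not understood`). It uses `pd.DatetimeTZDtype`, which is exposed at the top level only in later pandas releases than the one reported here; the helper `numpy_can_allocate` is hypothetical.

```python
import numpy as np
import pandas as pd

# df2's column has dtype datetime64[ns, UTC]: a pandas extension dtype,
# not a native NumPy dtype.
tz_dtype = pd.DatetimeTZDtype(tz='UTC')

def numpy_can_allocate(dtype):
    """Hypothetical helper: True if np.empty accepts `dtype`, False if NumPy rejects it."""
    try:
        np.empty((2,), dtype=dtype)
    except TypeError:
        return False
    return True

print(numpy_can_allocate(tz_dtype.base))  # underlying datetime64[ns]: accepted
print(numpy_can_allocate(tz_dtype))       # tz-aware pandas dtype: rejected
```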
Possibly related to #11693, #11705 and the #11456 family. However, this doesn't appear to be caused by the tz-aware vs. tz-naive problems referenced there.
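A possible workaround, sketched here and not verified against 0.17.1 itself, is to give the all-NaT column the same tz-aware dtype as the column it will be stacked with before calling `pd.concat`, so both frames contribute columns of matching dtype. The name `df1_aligned` is illustrative, and `tz='UTC'` is assumed from the example above.

```python
import pandas as pd

df1 = pd.DataFrame([[pd.NaT], [pd.NaT]])
df2 = pd.DataFrame([[pd.Timestamp('2015/01/01', tz='UTC')],
                    [pd.Timestamp('2016/01/01', tz='UTC')]])

# Localize the all-NaT column to UTC: the values stay NaT, but the column's
# dtype becomes tz-aware, matching df2's column (assumed tz: UTC).
df1_aligned = df1.copy()
df1_aligned[0] = pd.to_datetime(df1_aligned[0]).dt.tz_localize('UTC')

result = pd.concat([df1_aligned, df2], ignore_index=True)
print(result[0].dtype)
print(result)
```

`tz_localize` only changes the column's dtype; the NaT entries are left as they are. If the other frame uses a different zone, pass that zone instead of `'UTC'`.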
Versions:
In [4]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.6
pip: 6.0.8
setuptools: 12.0.5
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1.post1
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.4
blosc: 1.2.8
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None