Concatenation of DFs with all NaT columns and TZ-aware ones breaks · Issue #12396 · pandas-dev/pandas

Concatenating a DataFrame whose column is all NaT with a DataFrame whose column is TZ-aware raises a TypeError as of 0.17.1:

In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame([[pd.NaT], [pd.NaT]])
In [3]: df1
Out[3]:
    0
0   NaT
1   NaT
In [4]: df2 = pd.DataFrame([[pd.Timestamp('2015/01/01', tz='UTC')], [pd.Timestamp('2016/01/01', tz='UTC')]])
In [5]: df2
Out[5]:
    0
0   2015-01-01 00:00:00+00:00
1   2016-01-01 00:00:00+00:00
In [6]: pd.concat([df1, df2])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-f61a1ab4009e> in <module>()
----> 1 pd.concat([df1, df2])

/.../env/local/lib/python2.7/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    811                        verify_integrity=verify_integrity,
    812                        copy=copy)
--> 813     return op.get_result()
    814 
    815 

/.../env/local/lib/python2.7/site-packages/pandas/tools/merge.py in get_result(self)
    993 
    994             new_data = concatenate_block_managers(
--> 995                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
    996             if not self.copy:
    997                 new_data._consolidate_inplace()

/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4454                                                 copy=copy),
   4455                          placement=placement)
-> 4456               for placement, join_units in concat_plan]
   4457 
   4458     return BlockManager(blocks, axes)

/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4551     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4552                                          upcasted_na=upcasted_na)
-> 4553                  for ju in join_units]
   4554 
   4555     if len(to_concat) == 1:

/.../env/local/lib/python2.7/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   4799 
   4800             if self.is_null and not getattr(self.block,'is_categorical',None):
-> 4801                 missing_arr = np.empty(self.shape, dtype=empty_dtype)
   4802                 if np.prod(self.shape):
   4803                     # NumPy 1.6 workaround: this statement gets strange if all

TypeError: data type not understood
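
The last frame of the traceback fails inside `np.empty(self.shape, dtype=empty_dtype)`. The traceback does not show the value of `empty_dtype`, so the following is only a sketch under the assumption that it is the TZ-aware pandas dtype at that point. It checks whether NumPy accepts a given dtype for allocation; the helper name `numpy_can_allocate` is made up for this illustration and is not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Assumption: `empty_dtype` in the traceback is a tz-aware pandas dtype
# such as this one. The traceback itself does not confirm the value.
tz_dtype = pd.DatetimeTZDtype(tz="UTC")


def numpy_can_allocate(dtype, shape=(2, 1)):
    """Return True if np.empty accepts `dtype`, False if it raises TypeError.

    Hypothetical helper for illustration only.
    """
    try:
        np.empty(shape, dtype=dtype)
    except TypeError:
        return False
    return True


# A plain (tz-naive) datetime64 dtype is a NumPy dtype.
print("datetime64[ns]:", numpy_can_allocate("datetime64[ns]"))
# A tz-aware dtype is a pandas-only dtype that NumPy cannot interpret.
print(str(tz_dtype) + ":", numpy_can_allocate(tz_dtype))
```

If the assumption holds, this would explain why the error only appears when the all-NaT block has to be filled using the dtype of the TZ-aware block.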

Possibly related to #11693, #11705 and the #11456 family. However, this doesn't appear to be caused by the TZ-aware vs. TZ-naive problems referenced there.
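
For checking other pandas versions, the reproduction above can be wrapped in a small script. This is a minimal sketch: the function name `try_concat_nat_with_tz` is made up for this illustration, and warnings are silenced only because some later pandas releases may emit deprecation warnings for concatenation with all-NA columns.

```python
import warnings

import pandas as pd


def try_concat_nat_with_tz():
    """Run the concat from this report and return (result, error).

    Hypothetical helper for illustration; exactly one of the two
    returned values is None.
    """
    df1 = pd.DataFrame([[pd.NaT], [pd.NaT]])
    df2 = pd.DataFrame([[pd.Timestamp('2015/01/01', tz='UTC')],
                        [pd.Timestamp('2016/01/01', tz='UTC')]])
    with warnings.catch_warnings():
        # Keep the output focused on success or failure of the concat.
        warnings.simplefilter("ignore")
        try:
            return pd.concat([df1, df2]), None
        except TypeError as exc:
            return None, exc


result, error = try_concat_nat_with_tz()
print("pandas:", pd.__version__)
if error is not None:
    print("concat failed:", error)
else:
    print("concat succeeded, dtypes:")
    print(result.dtypes)
```

The resulting dtype is deliberately printed rather than asserted, since how all-NaT columns influence the result dtype may differ between pandas versions.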
Versions:

In [7]: pd.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.6
pip: 6.0.8
setuptools: 12.0.5
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1.post1
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.4
blosc: 1.2.8
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None