Not possible to concatenate DataFrames with sparse data without copying

Code Sample

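import pandas as pd

# Two single-column frames, each stored as sparse floats with fill value 0.0
a = pd.DataFrame([0, 0, 1, 0, 2], columns=['a'])
a = a.astype(pd.SparseDtype('float', 0.0))
b = pd.DataFrame([0, 2, 3, 0, 2], columns=['b'])
b = b.astype(pd.SparseDtype('float', 0.0))

# Column-wise concatenation without copying raises AttributeError (pandas 0.25.3)
pd.concat([a, b], axis=1, copy=False)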

Problem description

Hello! Since SparseArray is no longer a subclass of numpy.ndarray (pandas>=0.24.0), pandas.concat cannot concatenate DataFrames with a SparseArray dtype when copy=False.
The problem is that SparseArray has no view attribute, which the function concatenate_block_managers in pandas/core/internals/managers.py calls when the copy parameter is False.
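To make the failure mode concrete, here is a minimal sketch that does not touch pandas internals. `NoViewArray`, `strict_branch`, and `tolerant_branch` are hypothetical names invented for illustration: the first stands in for any array type that offers copy() but not view(), the second mirrors the two branches visible in the traceback, and the third is one possible way to guard the call. It is not presented as the fix pandas adopted.

```python
import numpy as np


class NoViewArray:
    """Hypothetical stand-in for an array type that has copy() but no view()."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def copy(self):
        return NoViewArray(self.data.copy())


def strict_branch(values, copy):
    # Mirrors the two branches visible in the traceback: copy() or view().
    if copy:
        return values.copy()
    return values.view()


def tolerant_branch(values, copy):
    # Hypothetical variant: only call view() when the object provides it,
    # otherwise hand back the same object without copying.
    if copy:
        return values.copy()
    view = getattr(values, "view", None)
    return view() if view is not None else values


dense = np.array([0.0, 0.0, 1.0])
stand_in = NoViewArray([0.0, 0.0, 1.0])

# ndarray supports view(), so the strict branch works and shares memory.
assert np.shares_memory(strict_branch(dense, copy=False), dense)

# The stand-in fails with an AttributeError, as in the report.
try:
    strict_branch(stand_in, copy=False)
except AttributeError as exc:
    print("strict branch failed:", exc)

# The tolerant variant reuses the object instead of raising.
assert tolerant_branch(stand_in, copy=False) is stand_in
```

The point of the sketch is only that the copy=False branch assumes an ndarray-like interface; any array type without view() will hit the same error.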

Actual Output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-d61b58c595ad> in <module>
----> 1 pd.concat([a, b], axis=1, copy=False)

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    256     )
    257 
--> 258     return op.get_result()
    259 
    260 

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
    471 
    472             new_data = concatenate_block_managers(
--> 473                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
    474             )
    475             if not self.copy:

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2044                 values = values.copy()
   2045             elif not copy:
-> 2046                 values = values.view()
   2047             b = b.make_block_same_class(values, placement=placement)
   2048         elif is_uniform_join_units(join_units):

AttributeError: 'SparseArray' object has no attribute 'view'
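Judging by the traceback, the error is raised only in the branch taken when copy is False; the other branch calls values.copy(). Under that assumption, a possible workaround is to leave the copy argument at its default, at the cost of the copy this report wants to avoid. The sketch below reuses the frames from the code sample and has not been checked against every pandas version.

```python
import pandas as pd

# Same frames as in the code sample: sparse floats with fill value 0.0.
a = pd.DataFrame([0, 0, 1, 0, 2], columns=["a"]).astype(pd.SparseDtype("float", 0.0))
b = pd.DataFrame([0, 2, 3, 0, 2], columns=["b"]).astype(pd.SparseDtype("float", 0.0))

# The copy keyword is not passed, so concat uses its default behaviour
# rather than the copy=False branch shown in the traceback.
result = pd.concat([a, b], axis=1)

# Both columns should still carry a sparse dtype.
print(result.dtypes)
```

This avoids the exception but does not solve the underlying request, which is to concatenate sparse columns without duplicating their data.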

Output of pd.show_versions()

INSTALLED VERSIONS


commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.1.3.el7.x86_64
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.10.2
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None