BUG: SparseDataFrame constructor issues
I've noticed a bunch of issues with SparseDataFrame's constructor, particularly around support for non-float64 datatypes:
- [ ] when `data` is not provided, the `default_fill_value` is not used to fill the data (noted in passing in #5470, "BUG: SparseDataFrame does not allow single value data"):

      pd.SparseDataFrame(columns=list("ab"), index=range(4), default_fill_value=0.0)
          a   b
      0 NaN NaN
      1 NaN NaN
      2 NaN NaN
      3 NaN NaN

- [ ] when `data` is not provided and the `dtype` is set to `int64`, the constructor call fails:

      pd.SparseDataFrame(columns=list("ab"), index=range(4), dtype=np.int64)
      ValueError: cannot convert float NaN to integer

- [ ] when `data` is an empty sparse matrix with `int64` entries and no `dtype` is specified, the type of the SparseDataFrame is still `float64`. Worse, even when `dtype` is specified, it still doesn't work:

      pd.SparseDataFrame(coo_matrix((4,3), dtype=np.int64))
          0   1   2
      0 NaN NaN NaN
      1 NaN NaN NaN
      2 NaN NaN NaN
      3 NaN NaN NaN

      pd.SparseDataFrame(coo_matrix((4,3), dtype=np.int64), dtype=np.int64)
          0   1   2
      0 NaN NaN NaN
      1 NaN NaN NaN
      2 NaN NaN NaN
      3 NaN NaN NaN

- [ ] probably due to the same issue, given a sparse matrix with underlying elements of type `int64`, if any of the values in a column is `0` then that column will be treated as `float64`:

      pd.SparseDataFrame(coo_matrix(np.arange(12).reshape(4,3), dtype=np.int64))
           0   1   2
      0  NaN   1   2
      1  3.0   4   5
      2  6.0   7   8
      3  9.0  10  11

  (As long as at least one value in the column is non-zero, this can be worked around by setting `default_fill_value`; however, if all of the values in a column are zero, the workaround no longer works.)
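
The NaN results above are consistent with implicit (unstored) zeros being replaced by a NaN fill value. The following is a minimal pure-Python sketch of that idea, not pandas' actual implementation: `densify_column` is a hypothetical helper that models a sparse column as a dict of explicitly stored entries plus a fill value. It also shows where the `ValueError` text comes from, since NaN has no integer representation.

```python
import math


def densify_column(stored, length, fill_value):
    """Expand {row: value} into a dense list.

    Rows that are not explicitly stored take fill_value.
    (Hypothetical helper for illustration only.)
    """
    return [stored.get(row, fill_value) for row in range(length)]


# Column 0 of np.arange(12).reshape(4, 3) is [0, 3, 6, 9].
# A COO matrix built from that dense array stores only the non-zero entries,
# so row 0 is absent from the stored data.
stored = {1: 3, 2: 6, 3: 9}

# With a NaN fill value, the missing zero comes back as NaN,
# which an integer column cannot hold.
with_nan = densify_column(stored, 4, float("nan"))

# With a fill value of 0, the column stays all-integer.
with_zero = densify_column(stored, 4, 0)

# Converting NaN to an integer raises the same message as in the report.
try:
    int(float("nan"))
    nan_to_int_error = None
except ValueError as exc:
    nan_to_int_error = str(exc)

print(with_nan)
print(with_zero)
print(nan_to_int_error)
```

In this model, passing a fill value of `0` corresponds to the `default_fill_value` workaround mentioned above; whether the constructor follows the same steps internally is an assumption.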
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.2
pytest: 2.9.2
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0