SparseArray not in arrays module - inconsistent with IntegerArray, StringArray, etc. · Issue #30642 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.26.0.dev0+1563.g1feefc692'

In [3]: pd.SparseArray
Out[3]: pandas.core.arrays.sparse.array.SparseArray

In [4]: pd.arrays.SparseArray
Out[4]: pandas.core.arrays.sparse.array.SparseArray

In [5]: pd.arrays.IntegerArray
Out[5]: pandas.core.arrays.integer.IntegerArray

In [6]: pd.IntegerArray
AttributeError                            Traceback (most recent call last)
----> 1 pd.IntegerArray

C:\Code\pandas_dev\pandas\pandas\__init__.py in __getattr__(name)
    246         return type(name, (), {})
    247
--> 248     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    249
    250

AttributeError: module 'pandas' has no attribute 'IntegerArray'

In [7]: pd.arrays.StringArray
Out[7]: pandas.core.arrays.string_.StringArray

In [8]: pd.StringArray
AttributeError                            Traceback (most recent call last)
----> 1 pd.StringArray

C:\Code\pandas_dev\pandas\pandas\__init__.py in __getattr__(name)
    246         return type(name, (), {})
    247
--> 248     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    249
    250

AttributeError: module 'pandas' has no attribute 'StringArray'
Problem description
I discovered this while working on #30628. The docs for SparseArray are at the top level (https://dev.pandas.io/docs/reference/api/pandas.SparseArray.html), while the docs for IntegerArray (https://dev.pandas.io/docs/reference/api/pandas.arrays.IntegerArray.html), StringArray (https://dev.pandas.io/docs/reference/api/pandas.arrays.StringArray.html), etc. are at the pandas.arrays level.
In the code, SparseArray is available at both levels, but IntegerArray, StringArray, etc. are available only at the pandas.arrays level.
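The check done by hand in the session above can be scripted. The following is only a sketch: `exposure_report` is a hypothetical helper name, not part of pandas, and it simply uses `hasattr` on the package and on one of its submodules.

```python
import importlib
import warnings


def exposure_report(package_name, submodule_name, class_names):
    """Map each name to a (at_top_level, in_submodule) pair of booleans."""
    top = importlib.import_module(package_name)
    sub = importlib.import_module(f"{package_name}.{submodule_name}")
    report = {}
    with warnings.catch_warnings():
        # Looking up a deprecated alias may emit a warning; silence it here.
        warnings.simplefilter("ignore")
        for name in class_names:
            report[name] = (hasattr(top, name), hasattr(sub, name))
    return report
```

On the build used in the session above, calling `exposure_report("pandas", "arrays", ["SparseArray", "IntegerArray", "StringArray"])` would be expected to report SparseArray at both levels and the other two only under pandas.arrays. Other pandas versions may give different results.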
Expected Output
Unsure.
It seems that this should be consistent. Options are:
- Put all *Array classes at the top level, and document them that way (i.e., use the pattern currently used for SparseArray). That would involve code and documentation changes for all of the arrays except SparseArray.
- Put all *Array classes at both levels (like SparseArray), but document them at the pandas.arrays level (like IntegerArray and StringArray). That would involve code changes for all of the arrays, and doc changes for SparseArray.
- Put all *Array classes only at the pandas.arrays level and document them all that way. That would involve only changing code and docs for SparseArray and leaving the others alone.
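If the last option were chosen, one possible way to stage the removal of the top-level name would be a module-level `__getattr__` (PEP 562) that forwards the old name with a warning. The sketch below is an assumption about how that could look, not a description of what pandas does. `toypkg`, `_MOVED` and `_module_getattr` are hypothetical names, and the `SparseArray` class here is an empty placeholder.

```python
import types
import warnings

# Hypothetical stand-in for a package's `arrays` submodule.
arrays = types.ModuleType("toypkg.arrays")


class SparseArray:
    """Placeholder class; not the real pandas SparseArray."""


arrays.SparseArray = SparseArray

# Hypothetical stand-in for the package's top-level module.
toypkg = types.ModuleType("toypkg")
toypkg.arrays = arrays

# Names that used to live at the top level and now live only in `arrays`.
_MOVED = {"SparseArray"}


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name in _MOVED:
        warnings.warn(
            f"toypkg.{name} is deprecated; use toypkg.arrays.{name} instead",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(arrays, name)
    raise AttributeError(f"module 'toypkg' has no attribute '{name}'")


# PEP 562: a `__getattr__` stored in the module namespace handles missing names.
toypkg.__getattr__ = _module_getattr
```

With this in place, `toypkg.SparseArray` still resolves but emits a FutureWarning, while `toypkg.IntegerArray` raises AttributeError, the same behaviour the traceback above shows for `pd.IntegerArray`.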
It's not clear to me which is preferred.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 1feefc6
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.26.0.dev0+1563.g1feefc692
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.2
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.11
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6
numba : 0.46.0