SparseArray not in arrays module - inconsistent with IntegerArray, StringArray, etc. · Issue #30642 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.26.0.dev0+1563.g1feefc692'

In [3]: pd.SparseArray
Out[3]: pandas.core.arrays.sparse.array.SparseArray

In [4]: pd.arrays.SparseArray
Out[4]: pandas.core.arrays.sparse.array.SparseArray

In [5]: pd.arrays.IntegerArray
Out[5]: pandas.core.arrays.integer.IntegerArray

In [6]: pd.IntegerArray

AttributeError                            Traceback (most recent call last)
----> 1 pd.IntegerArray

C:\Code\pandas_dev\pandas\pandas\__init__.py in __getattr__(name)
    246         return type(name, (), {})
    247
--> 248     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    249
    250

AttributeError: module 'pandas' has no attribute 'IntegerArray'

In [7]: pd.arrays.StringArray
Out[7]: pandas.core.arrays.string_.StringArray

In [8]: pd.StringArray

AttributeError                            Traceback (most recent call last)
----> 1 pd.StringArray

C:\Code\pandas_dev\pandas\pandas\__init__.py in __getattr__(name)
    246         return type(name, (), {})
    247
--> 248     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    249
    250

AttributeError: module 'pandas' has no attribute 'StringArray'

Problem description

I discovered this while working on #30628. The docs for SparseArray are at the top level (https://dev.pandas.io/docs/reference/api/pandas.SparseArray.html), while the docs for IntegerArray (https://dev.pandas.io/docs/reference/api/pandas.arrays.IntegerArray.html), StringArray (https://dev.pandas.io/docs/reference/api/pandas.arrays.StringArray.html), etc. are at the pandas.arrays level.

In the code, SparseArray is available at both levels, but IntegerArray, StringArray, etc. are only available at the pandas.arrays level.
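The mismatch can be checked mechanically by comparing the public `*Array` names that each namespace exposes. The helper below is a hypothetical sketch (`compare_array_exports` is not a pandas function). To stay self-contained, it runs against toy `SimpleNamespace` stand-ins that mirror the session above rather than against pandas itself.

```python
from types import SimpleNamespace


def compare_array_exports(top_level, arrays_namespace, suffix="Array"):
    """Hypothetical helper: report which *Array names each namespace exposes.

    Works on any object with attributes (modules, namespaces, ...).
    """
    def array_names(ns):
        # dir() lists attribute names; keep public names ending in the suffix
        return {n for n in dir(ns) if n.endswith(suffix) and not n.startswith("_")}

    top = array_names(top_level)
    arrays = array_names(arrays_namespace)
    return {
        "top_level_only": sorted(top - arrays),
        "arrays_only": sorted(arrays - top),
        "both": sorted(top & arrays),
    }


# Toy stand-ins mirroring the session above (not real pandas classes).
class SparseArray: ...
class IntegerArray: ...
class StringArray: ...


toy_arrays = SimpleNamespace(
    SparseArray=SparseArray, IntegerArray=IntegerArray, StringArray=StringArray
)
# Only SparseArray is re-exported at the top level, as in the report.
toy_pd = SimpleNamespace(SparseArray=SparseArray, arrays=toy_arrays)

report = compare_array_exports(toy_pd, toy_arrays)
print(report)
# {'top_level_only': [], 'arrays_only': ['IntegerArray', 'StringArray'], 'both': ['SparseArray']}
```

Passing the real `pandas` and `pandas.arrays` modules instead of the stand-ins should produce the equivalent report for whichever pandas version is installed. The result will differ between versions.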

Expected Output

Unsure.

It seems that this should be consistent. The options are:

1. Put all *Array classes at the top level and document them that way (i.e., follow the pattern currently used for SparseArray). This would require code and documentation changes for every array except SparseArray.
2. Put all *Array classes at both levels (like SparseArray), but document them at the pandas.arrays level (like IntegerArray and StringArray). This would require code changes for every array except SparseArray, and documentation changes only for SparseArray.
3. Put all *Array classes only at the pandas.arrays level and document them all that way. This would require changing only the code and docs for SparseArray, leaving the others alone.
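Option 3 would remove `pd.SparseArray` from the top level. One common way to do that without breaking existing code immediately is a module-level `__getattr__` (PEP 562, the same hook that appears in the traceback above). The hook forwards the old name to its new location and emits a warning. The sketch below is hypothetical: `toy_pkg`, `_MOVED_TO_ARRAYS` and `_module_getattr` are illustrative names, not pandas code, and the use of `FutureWarning` is an assumption.

```python
import types
import warnings


# Toy package whose array classes live only in an `arrays` submodule
# (hypothetical stand-in, not pandas' actual code).
class SparseArray:
    pass


arrays = types.ModuleType("toy_pkg.arrays")
arrays.SparseArray = SparseArray

toy_pkg = types.ModuleType("toy_pkg")
toy_pkg.arrays = arrays

# Names that used to live at the top level and now only live in `arrays`.
_MOVED_TO_ARRAYS = {"SparseArray"}


def _module_getattr(name):
    """PEP 562 module-level __getattr__: called only when normal lookup fails."""
    if name in _MOVED_TO_ARRAYS:
        warnings.warn(
            f"toy_pkg.{name} is deprecated; use toy_pkg.arrays.{name} instead",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(arrays, name)
    raise AttributeError(f"module 'toy_pkg' has no attribute '{name}'")


# Storing the hook in the module's namespace activates it (Python 3.7+).
toy_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = toy_pkg.SparseArray  # not in the module dict, so the hook runs

print(cls is arrays.SparseArray, caught[0].category.__name__)
```

Unknown names still raise `AttributeError`, matching the behaviour shown for `pd.IntegerArray` in the session. Once the deprecation period ends, the name can be dropped from the forwarding set.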

It's not clear to me which is preferred.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 1feefc6
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.26.0.dev0+1563.g1feefc692
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.2
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.11
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6
numba : 0.46.0