BUG: rank raises error with read-only data · Issue #37290 · pandas-dev/pandas


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
arr = np.arange(10)
arr.setflags(write=False)
pd.Series(arr).rank()

Output:

ValueError                                Traceback (most recent call last)
<ipython-input-5-afa6b4ecf509> in <module>
      3 arr = np.arange(10)
      4 arr.setflags(write=False)
----> 5 pd.Series(arr).rank()

~/anaconda/envs/xfactor/lib/python3.8/site-packages/pandas/core/generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   8334         if numeric_only is None:
   8335             try:
-> 8336                 return ranker(self)
   8337             except TypeError:
   8338                 numeric_only = True

~/anaconda/envs/xfactor/lib/python3.8/site-packages/pandas/core/generic.py in ranker(data)
   8319 
   8320         def ranker(data):
-> 8321             ranks = algos.rank(
   8322                 data.values,
   8323                 axis=axis,

~/anaconda/envs/xfactor/lib/python3.8/site-packages/pandas/core/algorithms.py in rank(values, axis, method, na_option, ascending, pct)
    934     if values.ndim == 1:
    935         values = _get_values_for_rank(values)
--> 936         ranks = algos.rank_1d(
    937             values,
    938             ties_method=method,

pandas/_libs/algos.pyx in pandas._libs.algos.rank_1d()

~/anaconda/envs/xfactor/lib/python3.8/site-packages/pandas/_libs/algos.cpython-38-darwin.so in View.MemoryView.memoryview_cwrapper()

~/anaconda/envs/xfactor/lib/python3.8/site-packages/pandas/_libs/algos.cpython-38-darwin.so in View.MemoryView.memoryview.__cinit__()

ValueError: buffer source array is read-only
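The last frames of the traceback are in Cython's memoryview machinery, which suggests that rank_1d asks for a writable buffer and the read-only flag on the array is what it rejects. I have not confirmed this in the pandas source, so treat it as an assumption. A minimal sketch, using only NumPy and the built-in memoryview, of how to check that an array is in this state:

```python
import numpy as np

arr = np.arange(10)
arr.setflags(write=False)

# NumPy's own view of the flag.
is_writeable = arr.flags.writeable

# The buffer protocol reports the same thing to any consumer of the array.
buffer_is_readonly = memoryview(arr).readonly

# Writing through NumPy fails as well, confirming the array is locked.
try:
    arr[0] = 1
    write_failed = False
except ValueError:
    write_failed = True

print(is_writeable, buffer_is_readonly, write_failed)
```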

Problem description

rank should work with read-only data, since ranking does not need to modify its input.

I noticed the problem when using check_estimator from sklearn.utils.estimator_checks on an estimator that calls pandas rank. I haven't investigated fully, but I assume check_estimator passes read-only data in some of its tests, which triggers this error.
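As a possible workaround until this is fixed, copying the array before ranking gives pandas a writable buffer. This is only a sketch: rank_values is a hypothetical helper name, not part of pandas, and the extra copy costs memory on large inputs.

```python
import numpy as np
import pandas as pd


def rank_values(values: np.ndarray) -> pd.Series:
    """Rank a NumPy array, copying it first if its buffer is read-only.

    Hypothetical helper for illustration; not a pandas API.
    """
    if not values.flags.writeable:
        # ndarray.copy() always returns a new, writable array.
        values = values.copy()
    return pd.Series(values).rank()


arr = np.arange(10)
arr.setflags(write=False)

ranks = rank_values(arr)
print(ranks.tolist())
```

The original array stays read-only; only the temporary copy is handed to rank.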

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.6.0.post20200814
Cython : None
pytest : 6.0.2
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None