BUG: get_indexer methods return int64 instead of intp arrays


Code Sample, a copy-pastable example

    import pandas as pd

    ax1 = pd.Index([1, 2, 3])
    ax2 = pd.Index([1, 1, 2])

    ans1 = ax1.get_indexer([1])
    ans2 = ax2.get_indexer_non_unique([1])

    print(ans1, ans1.dtype)
    # [0] int64
    print(ans2[0], ans2[0].dtype, ans2[1], ans2[1].dtype)
    # [0 1] int64 [] int64

Problem description

Found in #35498. While looking at the implementation of get_indexer and get_indexer_non_unique in pandas/_libs/index.pyx, I noticed that the returned arrays always have dtype int64. Since these methods return arrays of indices, I believe intp is the more appropriate dtype: its size follows ssize_t, which is guaranteed to be large enough to represent every possible index into an array.
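As a rough illustration of the reasoning above, the sketch below checks that np.intp has the same size as the C ssize_t on the running platform, and that NumPy's own index-producing functions (np.argsort here) already return intp. The helper as_platform_indexer is a hypothetical name introduced only for this example; it is not part of pandas or NumPy.

```python
import ctypes

import numpy as np

# np.intp is NumPy's platform index type. On mainstream platforms its size
# matches the C ssize_t, which is what the issue argues for.
intp_size = np.dtype(np.intp).itemsize
ssize_t_size = ctypes.sizeof(ctypes.c_ssize_t)

# NumPy functions that produce indices already use intp.
order = np.argsort(np.array([30, 10, 20]))


def as_platform_indexer(indexer):
    """Hypothetical helper: cast an integer indexer array to np.intp."""
    return np.asarray(indexer, dtype=np.intp)


# An explicitly int64 indexer, standing in for what get_indexer returned here.
int64_indexer = np.array([0, 1], dtype=np.int64)
normalized = as_platform_indexer(int64_indexer)

print(intp_size, ssize_t_size, order.dtype, normalized.dtype)
```

On a 64-bit build, int64 and intp are the same type, so the difference only becomes visible on platforms where ssize_t is 32 bits wide. A caller could apply the same cast to the result of ax1.get_indexer([1]) as a stopgap.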

Expected Output

    [0] intp
    [0 1] intp [] intp

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : 7b14cf6b0b9dbcddce7b9bb22a81c73bdebc1be8
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
Version : #1 SMP Tue May 26 11:42:35 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0rc0+406.g7b14cf6b0
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 45.2.0.post20200210
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.20.2
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.4.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.1
gcsfs : 0.6.2
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1