Series.isin fails (errors) for categoricals · Issue #16639 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

print(pd.__version__)

vals = np.array([0, 1, 2, 0])
cats = ['a', 'b', 'c']

# Two frames whose 'id' columns are categoricals sharing the same categories
DFtrades = pd.DataFrame({'id': pd.Series(pd.Categorical.from_codes(vals, cats))})
DFscores = pd.DataFrame({'id': pd.Series(pd.Categorical.from_codes(np.array([0, 1]), cats))})

print(DFtrades)
print(DFscores)

# Raises ValueError in 0.20.1
select_ids = DFtrades['id'].isin(DFscores['id'])
Problem description
I get the following error in pandas 0.20.1:
  File "", line 12, in <module>
    select_ids = DFtrades['id'].isin(DFscores['id']);
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 421, in isin
    return f(comps, values)
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 399, in <lambda>
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas\_libs\hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas\_libs\hashtable.c:29677)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Expected Output
A boolean Series (True, True, False, True) indicating that the third row of DFtrades is not in DFscores, while the other three are.
For reference, this worked without raising an error in 0.19.(something).
Also, passing the underlying values instead of the Series works as expected:
select_ids = DFtrades['id'].isin(DFscores['id'].values)
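For anyone hitting the same error, here is a minimal sketch of the workaround above, checked against the expected output. The names trades, scores, mask_values and mask_list are only illustrative. The tolist() variant is an assumption of mine rather than something tested on 0.20.1; it simply passes plain Python objects to isin.

```python
import numpy as np
import pandas as pd

cats = ['a', 'b', 'c']

# Same data as the report: categorical Series built from integer codes.
trades = pd.Series(pd.Categorical.from_codes(np.array([0, 1, 2, 0]), cats))
scores = pd.Series(pd.Categorical.from_codes(np.array([0, 1]), cats))

# Workaround from the report: pass the underlying values, not the Series.
mask_values = trades.isin(scores.values)

# Alternative (assumption, not from the report): pass a plain Python list.
mask_list = trades.isin(scores.tolist())

print(mask_values.tolist())
print(mask_list.tolist())
```

Either mask can then be used to filter the frame, e.g. DFtrades[select_ids].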
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None