Incorrect match for pd.Term with Categorical at read_hdf · Issue #11304 · pandas-dev/pandas (original) (raw)

In below screenshot, I am scanning the database for a categorical called classification_id with the value 50ef44b795e6e42cd2000001 but I am getting a data-row where the categorical has the value 50ef44b795e6e42cd6000001`.

How is this possible? Note that my list of categorical is huge, more than 4 million entries, with 12 million total rows. (Yes, on average, each classification_id appears 3 times.)

On a side note: The display of this one row of numpy in the line with .values at the end takes a lot of time, possibly due to large size of the Categorical, can that be avoided somehow?

Here's my required meta-data for the bug report:
pandas Version: 0.17.0

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.3.2
Cython: None
numpy: 1.10.0
scipy: 0.16.0
statsmodels: None
IPython: 4.1.0-dev
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None