qcut raising TypeError for boolean Series · Issue #20303 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.qcut(pd.Series([True, False, False, False, False, False, True]), 6, duplicates="drop", precision=2)

Problem description

Pandas throws a TypeError:

Traceback (most recent call last):
  File "/tmp/pandas/env/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 210, in qcut
    dtype=dtype, duplicates=duplicates)
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 254, in _bins_to_cuts
    dtype=dtype)
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 351, in _format_labels
    precision = _infer_precision(precision, bins)
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 429, in _infer_precision
    levels = [_round_frac(b, precision) for b in bins]
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 429, in <listcomp>
    levels = [_round_frac(b, precision) for b in bins]
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 422, in _round_frac
    return np.around(x, digits)
  File "/tmp/pandas/env/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 2837, in around
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/tmp/pandas/env/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 62, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/tmp/pandas/env/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 42, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

If the second parameter for qcut is changed from 6 to 7, a different TypeError is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pandas/env/src/pandas/pandas/core/reshape/tile.py", line 207, in qcut
    bins = algos.quantile(x, quantiles)
  File "/tmp/pandas/env/src/pandas/pandas/core/algorithms.py", line 903, in quantile
    return algos.arrmap_float64(q, _get_score)
  File "pandas/_libs/algos_common_helper.pxi", line 416, in pandas._libs.algos.arrmap_float64
  File "/tmp/pandas/env/src/pandas/pandas/core/algorithms.py", line 888, in _get_score
    idx % 1)
  File "/tmp/pandas/env/src/pandas/pandas/core/algorithms.py", line 876, in _interpolate
    return a + (b - a) * fraction
TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
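
The last frame of this traceback can be reproduced outside pandas. The sketch below assumes that the values reaching `_interpolate` are NumPy booleans; the variable names and the `fraction` value are illustrative only and are not taken from the pandas source.

```python
import numpy as np

# Stand-ins for two neighbouring sorted values, kept as booleans.
a = np.array([False])
b = np.array([True])
fraction = 0.25  # arbitrary illustrative interpolation weight

# NumPy rejects `-` on boolean arrays, which is what `b - a` hits.
try:
    a + (b - a) * fraction
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# The same formula is well defined once the operands are floats.
a_f = a.astype(float)
b_f = b.astype(float)
interpolated = a_f + (b_f - a_f) * fraction
print(interpolated)
```

This suggests that the failure comes from the boolean dtype being carried into the arithmetic, rather than from the quantile logic itself.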

Expected Output

Something like

0      (0.29, 1.0]
1    (-0.01, 0.29]
2    (-0.01, 0.29]
3    (-0.01, 0.29]
4    (-0.01, 0.29]
5    (-0.01, 0.29]
6      (0.29, 1.0]
dtype: category
Categories (2, interval[float64]): [(-0.01, 0.29] < (0.29, 1.0]]
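
A possible workaround, offered only as a sketch: cast the boolean Series to float before calling `qcut`. This avoids boolean arithmetic entirely. It is not a fix for the underlying issue, and the exact interval labels may differ between pandas versions.

```python
import pandas as pd

s = pd.Series([True, False, False, False, False, False, True])

# Cast to float so that the quantile interpolation and label rounding
# operate on numeric values instead of booleans.
s_float = s.astype(float)

result_6 = pd.qcut(s_float, 6, duplicates="drop", precision=2)
result_7 = pd.qcut(s_float, 7, duplicates="drop", precision=2)

print(result_6)
print(result_7)
```

With 7 quantiles, the True and False entries should land in two different bins, as in the expected output above.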

Output of pd.show_versions()

Using pandas 0.23.0.dev0+516.g74e6c78; also reproducible with 0.22.0. In pandas 0.20.3, the first TypeError is also reproducible, but the second command (with 7 instead of 6) works.

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.23.0.dev0+516.g74e6c78
pytest: None
pip: 9.0.1
setuptools: 38.5.2
Cython: 0.27.3
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None