BUG: pandas.cut incorrectly raises a ValueError due to an overflow · Issue #26045 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+389.g6d9b702a66'
In [2]: bins = [pd.Timestamp.min, pd.Timestamp('2018-01-01'), pd.Timestamp.max]
In [3]: values = pd.date_range('2017-12-31', periods=3)
In [4]: pd.cut(values, bins=bins)
ValueError: bins must increase monotonically.
The issue appears to be due to an overflow in np.diff, which in turn makes it appear that the bins are not monotonically increasing. Essentially the following:
In [5]: import numpy as np; bins_numeric = [b.value for b in bins]
In [6]: bins_numeric
Out[6]: [-9223372036854775000, 1514764800000000000, 9223372036854775807]
In [7]: np.diff(bins_numeric)
Out[7]: array([-7708607236854776616, 7708607236854775807], dtype=int64)
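To make the overflow concrete, here is a minimal sketch that uses the integer values from Out[6] and compares a subtraction-based monotonicity check with a pairwise comparison. The variable names are illustrative only, and the comparison-based check is just one way to avoid the wraparound; it is not necessarily how pandas does or should implement the check.

```python
import numpy as np

# Integer nanosecond values copied from Out[6] above.
bins_numeric = np.array(
    [-9223372036854775000, 1514764800000000000, 9223372036854775807],
    dtype=np.int64,
)

# Subtraction-based check: the first difference exceeds the int64 range
# and silently wraps around to a negative number.
diffs = np.diff(bins_numeric)
looks_monotonic_via_diff = bool((diffs > 0).all())

# Comparison-based check: each element is compared with its predecessor,
# so no subtraction happens and nothing can overflow.
is_monotonic = bool((bins_numeric[1:] > bins_numeric[:-1]).all())

print(looks_monotonic_via_diff)  # False
print(is_monotonic)              # True
```

The true first difference is about 1.07e19, which is larger than the int64 maximum of about 9.22e18, so the wrapped result is negative and the bins look non-monotonic.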
Problem description
The bins are monotonically increasing, but a ValueError is raised indicating that they aren't.
Expected Output
I'd expect [4] to not raise a ValueError.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 6d9b702
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.25.0.dev0+389.g6d9b702a66
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.6.3
Cython: 0.28.2
numpy: 1.14.6
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml.etree: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: None