Inconsistent behavior of *_range functions · Issue #17471 · pandas-dev/pandas
Currently there are some inconsistencies in behavior between the various *_range functions: date_range, period_range, timedelta_range, interval_range.
Note: bdate_range and cdate_range largely use the same code as date_range, so I'm lumping the behavior of all three together in the examples below.
End Inclusion
The end parameter of interval_range is not included in the resulting IntervalIndex:
In [2]: pd.interval_range(start=0, end=4)
Out[2]: IntervalIndex([(0, 1], (1, 2], (2, 3]] closed='right', dtype='interval[int64]')
However, end is included in the output of the other *_range functions:
In [3]: pd.period_range(start='2017Q1', end='2017Q4', freq='Q')
Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4'], dtype='period[Q-DEC]', freq='Q-DEC')
In [4]: pd.date_range(start='2017-01-01', end='2017-01-04')
Out[4]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
In [5]: pd.timedelta_range(start='1 day', end='4 days')
Out[5]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
Proposal: interval_range should include end as the endpoint of the last interval in the resulting IntervalIndex.
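As a rough illustration of the proposed behavior (not pandas code), here is a minimal pure-Python sketch. The helper name closed_right_intervals is hypothetical, and it assumes integer inputs where end - start is a multiple of freq. It only shows that end becomes the right endpoint of the last interval.

```python
def closed_right_intervals(start, end, freq=1):
    """Hypothetical helper, not part of pandas.

    Returns (left, right) pairs for right-closed intervals covering
    start..end, with `end` as the right endpoint of the last interval.
    Assumes integers and that (end - start) is a multiple of freq.
    """
    if freq <= 0:
        raise ValueError("freq must be positive")
    # Use end + 1 as the stop value so that `end` itself is one of the breaks.
    breaks = list(range(start, end + 1, freq))
    # Pair each break with the next one to form the intervals.
    return list(zip(breaks[:-1], breaks[1:]))


intervals = closed_right_intervals(0, 4)
print(intervals)
```

Under the proposal, the analogous pd.interval_range(start=0, end=4) call would end with the interval (3, 4] instead of stopping at (2, 3].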
Behavior when too many parameters are passed
When the start, end, and periods parameters are all passed, interval_range ignores the periods parameter:
In [6]: pd.interval_range(start=0, end=4, periods=6)
Out[6]: IntervalIndex([(0, 1], (1, 2], (2, 3]] closed='right', dtype='interval[int64]')
However, period_range ignores the end parameter:
In [7]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
Out[7]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC')
Both date_range and timedelta_range raise:
In [8]: pd.date_range(start='2017-01-01', end='2017-01-04', periods=6)
ValueError: Must specify two of start, end, or periods
In [9]: pd.timedelta_range(start='1 day', end='4 days', periods=6)
ValueError: Must specify two of start, end, or periods
Proposal: interval_range and period_range should raise. Add the word "exactly" to the error message of all the *_range functions for additional clarity: "Must specify exactly two of start, end, or periods".
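The check being proposed could look something like the sketch below. The function name validate_range_args is hypothetical and this is not the actual pandas implementation. It only shows one way to enforce "exactly two" with the suggested message.

```python
def validate_range_args(start=None, end=None, periods=None):
    """Hypothetical validation helper, not part of pandas.

    Raises ValueError unless exactly two of start, end, periods are given.
    """
    # Count how many of the three arguments were actually supplied.
    given = sum(arg is not None for arg in (start, end, periods))
    if given != 2:
        raise ValueError("Must specify exactly two of start, end, or periods")


# Two arguments: accepted silently.
validate_range_args(start=0, end=4)

# Three arguments: rejected with the proposed message.
try:
    validate_range_args(start=0, end=4, periods=6)
except ValueError as err:
    print(err)
```

Counting the non-None arguments rejects both the over-specified case shown above and the under-specified case (only one argument, or none) with a single message.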
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None