BUG: DataFrame.shift shows different behavior for axis=1 when freq is specified · Issue #47039 · pandas-dev/pandas

Pandas version checks

Reproducible Example

import numpy as np, pandas as pd

rs = np.random.RandomState(0)

raw = pd.DataFrame(
    rs.randint(1000, size=(10, 8)),
    columns=["col" + str(i + 1) for i in range(8)],
)
raw.index = pd.date_range("2020-1-1", periods=10)
raw.columns = pd.date_range("2020-3-1", periods=8)

print(raw)
"""result is
            2020-03-01  2020-03-02  2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08
2020-01-01         684         559         629         192         835         763         707         359
2020-01-02           9         723         277         754         804         599          70         472
2020-01-03         600         396         314         705         486         551          87         174
2020-01-04         600         849         677         537         845          72         777         916
2020-01-05         115         976         755         709         847         431         448         850
2020-01-06          99         984         177         755         797         659         147         910
2020-01-07         423         288         961         265         697         639         544         543
2020-01-08         714         244         151         675         510         459         882         183
2020-01-09          28         802         128         128         932          53         901         550
2020-01-10         488         756         273         335         388         617          42         442
"""

print(raw.shift(periods=2, freq="D", axis=1))
"""result is
            2020-03-01  2020-03-02  2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08
2020-01-01         NaN         NaN         684         559         629         192         835         763
2020-01-02         NaN         NaN           9         723         277         754         804         599
2020-01-03         NaN         NaN         600         396         314         705         486         551
2020-01-04         NaN         NaN         600         849         677         537         845          72
2020-01-05         NaN         NaN         115         976         755         709         847         431
2020-01-06         NaN         NaN          99         984         177         755         797         659
2020-01-07         NaN         NaN         423         288         961         265         697         639
2020-01-08         NaN         NaN         714         244         151         675         510         459
2020-01-09         NaN         NaN          28         802         128         128         932          53
2020-01-10         NaN         NaN         488         756         273         335         388         617
"""

print(raw.shift(periods=2, freq="D", axis=1, fill_value=0))
"""result is
            2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08  2020-03-09  2020-03-10
2020-01-01         684         559         629         192         835         763         707         359
2020-01-02           9         723         277         754         804         599          70         472
2020-01-03         600         396         314         705         486         551          87         174
2020-01-04         600         849         677         537         845          72         777         916
2020-01-05         115         976         755         709         847         431         448         850
2020-01-06          99         984         177         755         797         659         147         910
2020-01-07         423         288         961         265         697         639         544         543
2020-01-08         714         244         151         675         510         459         882         183
2020-01-09          28         802         128         128         932          53         901         550
2020-01-10         488         756         273         335         388         617          42         442
"""

Issue Description

As documented, when freq is specified only the axis labels are shifted and the data are kept in place. With axis=0 this works as expected. With axis=1, however, the data are shifted as if freq were not specified (second output above). Strangely, when a fill_value is supplied, the freq argument does take effect (third output above). This difference is not documented in https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.shift.html.
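The documented freq semantics can be seen on the axis that works, and the same path can be reused for columns. The sketch below assumes only public pandas APIs (DataFrame.shift, DataFrame.T); shift_columns_by_freq is a hypothetical workaround, not a pandas function. It transposes so the columns become the index, applies the axis=0 shift (which honors freq), and transposes back, which should give the result axis=1 is expected to produce.

```python
import numpy as np
import pandas as pd

# Same data as the reproducible example in this issue
rs = np.random.RandomState(0)
raw = pd.DataFrame(
    rs.randint(1000, size=(10, 8)),
    index=pd.date_range("2020-1-1", periods=10),
    columns=pd.date_range("2020-3-1", periods=8),
)

# axis=0 honors freq: the row labels move by 2 days, the values stay put
by_rows = raw.shift(periods=2, freq="D", axis=0)
print(by_rows.index[0])
print((by_rows.to_numpy() == raw.to_numpy()).all())


def shift_columns_by_freq(df, periods, freq):
    """Hypothetical workaround (not a pandas API): shift only the column labels.

    Transposes so the columns become the index, applies the axis=0 shift,
    which honors `freq`, then transposes back.
    """
    return df.T.shift(periods=periods, freq=freq).T


by_cols = shift_columns_by_freq(raw, 2, "D")
print(by_cols.columns[0])
print((by_cols.to_numpy() == raw.to_numpy()).all())
```

The transpose keeps the sketch close to the working axis=0 call. For frames with mixed column dtypes, transposing can upcast to a common dtype, so relabeling directly with raw.set_axis(raw.columns.shift(2, freq="D"), axis=1) may be the safer variant.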

Expected Behavior

When freq is specified, shift should behave the same for axis=0 and axis=1, regardless of whether fill_value is supplied: only the labels move and the data stay in place.

Since the fix might break existing code, it would be best to make this change in 1.5.0.
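The expected behavior can be phrased as a check that should hold on both axes. This is a minimal sketch assuming only public pandas APIs (DataFrame.shift, DatetimeIndex.shift, Index.equals); freq_shift_is_label_only is a hypothetical helper, not part of pandas. It omits fill_value, so it exercises the default path where the report says axis=1 goes wrong; going by the outputs above, the axis=1 check would be expected to return False on pandas 1.4.2.

```python
import pandas as pd


def freq_shift_is_label_only(df, periods, freq, axis):
    """Hypothetical helper: True if shift(..., freq=freq) moved only the labels.

    Checks that the values are unchanged and that the labels along `axis`
    equal the original labels shifted by `periods` steps of `freq`.
    """
    shifted = df.shift(periods=periods, freq=freq, axis=axis)
    old_labels = df.index if axis == 0 else df.columns
    new_labels = shifted.index if axis == 0 else shifted.columns
    same_values = shifted.shape == df.shape and bool(
        (shifted.to_numpy() == df.to_numpy()).all()
    )
    return same_values and new_labels.equals(old_labels.shift(periods, freq=freq))


# Small frame with datetime labels on both axes, like the example above
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=pd.date_range("2020-01-01", periods=2),
    columns=pd.date_range("2020-03-01", periods=3),
)

for axis in (0, 1):
    print(f"axis={axis}:", freq_shift_is_label_only(df, 2, "D", axis))
```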

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : zh_CN.UTF-8
pandas : 1.4.2
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 0.29.24
pytest : 6.2.4
hypothesis : 6.46.5
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.0
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
snappy : None
sqlalchemy : 1.4.27
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None