BUG: Shift on DataFrame which has more than 1 block creates wrong result · Issue #35488 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df1 = pd.DataFrame(np.random.randint(1000, size=(5, 3)))
In [4]: df2 = pd.DataFrame(np.random.randint(1000, size=(5, 2)))
In [5]: df3 = pd.concat([df1, df2], axis=1)
In [6]: df3
Out[6]:
0 1 2 0 1
0 61 536 154 766 179
1 484 18 15 787 766
2 391 171 715 836 654
3 914 969 765 824 950
4 169 414 759 16 666
In [7]: len(df3._data.blocks)
Out[7]: 2
In [9]: df3.shift(2, axis=1)
Out[9]:
0 1 2 0 1
0 NaN NaN 61.0 NaN NaN
1 NaN NaN 484.0 NaN NaN
2 NaN NaN 391.0 NaN NaN
3 NaN NaN 914.0 NaN NaN
4 NaN NaN 169.0 NaN NaN
Problem description
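I suspect that `shift` is applied to each internal block separately rather than across the columns of the whole frame, so values never move from one block into the next. This matches the output above: within the first block (3 columns), column 0 moves to column 2, while the second block (2 columns) is shifted by 2 and becomes all NaN.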
Expected Output
If I force consolidation into a single block, the result is correct:
In [12]: df3._data._consolidate_inplace()
In [13]: df3.shift(2, axis=1)
Out[13]:
0 1 2 0 1
0 NaN NaN 61.0 536.0 154.0
1 NaN NaN 484.0 18.0 15.0
2 NaN NaN 391.0 171.0 715.0
3 NaN NaN 914.0 969.0 765.0
4 NaN NaN 169.0 414.0 759.0
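For anyone who wants to check this without touching private attributes, here is a minimal sketch of a reference result computed on a single 2-D array, so the internal block layout cannot influence it. The helper name `shift_columns_reference` is hypothetical (it is not part of pandas), and it assumes an all-numeric frame with NaN as the fill value.

```python
import numpy as np
import pandas as pd


def shift_columns_reference(df, periods):
    """Hypothetical helper: positional column shift done on one 2-D array.

    Assumes all columns are numeric; vacated positions are filled with NaN.
    """
    values = df.to_numpy(dtype=float)
    out = np.full(values.shape, np.nan)
    if periods > 0:
        # Move columns to the right by `periods` positions.
        out[:, periods:] = values[:, :-periods]
    elif periods < 0:
        # Move columns to the left by `-periods` positions.
        out[:, :periods] = values[:, -periods:]
    else:
        out[:] = values
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Same construction as in the report: concat yields a frame built from two pieces.
df1 = pd.DataFrame(np.random.randint(1000, size=(5, 3)))
df2 = pd.DataFrame(np.random.randint(1000, size=(5, 2)))
df3 = pd.concat([df1, df2], axis=1)

expected = shift_columns_reference(df3, 2)
actual = df3.shift(2, axis=1)

# Prints False on a pandas version affected by this bug.
print(np.allclose(actual.to_numpy(dtype=float), expected.to_numpy(), equal_nan=True))
```

Comparing against a plain NumPy computation keeps the check independent of how pandas stores the columns internally.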
Output of pd.show_versions()
In [14]: pd.show_versions()
/Users/qinxuye/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
"Distutils was imported before Setuptools. This usage is discouraged "
ImportError Traceback (most recent call last)
in
----> 1 pd.show_versions()
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in show_versions(as_json)
104 """
105 sys_info = _get_sys_info()
--> 106 deps = _get_dependency_info()
107
108 if as_json:
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in _get_dependency_info()
82 for modname in deps:
83 mod = import_optional_dependency(
---> 84 modname, raise_on_missing=False, on_version="ignore"
85 )
86 result[modname] = _get_version(mod) if mod else None
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
97 minimum_version = VERSIONS.get(name)
98 if minimum_version:
---> 99 version = _get_version(module)
100 if distutils.version.LooseVersion(version) < minimum_version:
101 assert on_version in {"warn", "raise", "ignore"}
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in _get_version(module)
42
43 if version is None:
---> 44 raise ImportError(f"Can't determine version for {module.__name__}")
45 return version
46
ImportError: Can't determine version for numba