Index is being materialized in pd.concat when axis=1 · Issue #46675 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this issue exists on the latest version of pandas.
- I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
```python
>>> import pandas as pd
>>> s1 = pd.Series(["a", "b", "c"])
>>> s2 = pd.Series(["a", "b"])
>>> s3 = pd.Series(["a", "b", "c", "d"])
>>> s4 = pd.Series([])
<stdin>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
>>> sort = False
>>> join = 'outer'
>>> ignore_index = False
>>> axis = 1
>>> result = pd.concat(
...     [s1, s2, s3, s4],
...     sort=sort,
...     join=join,
...     ignore_index=ignore_index,
...     axis=axis,
... )
>>> print(result)
     0    1  2    3
0    a    a  a  NaN
1    b    b  b  NaN
2    c  NaN  c  NaN
3  NaN  NaN  d  NaN
>>> print(result.index)
Int64Index([0, 1, 2, 3], dtype='int64')
>>> print(pd.__version__)
1.4.2
```
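A version-independent way to see which index type `pd.concat` produces for these inputs is sketched below. It rebuilds the same four Series, but passes `dtype=object` to the empty one so the FutureWarning above does not fire (that explicit dtype is an assumption; the report used the default). `describe_concat_index` is a hypothetical helper name, not part of pandas. Per the report, 1.3.5 gives a `RangeIndex` here and 1.4.2 gives an `Int64Index`; the index values are the same in both.

```python
import pandas as pd


def describe_concat_index() -> tuple:
    """Hypothetical helper: rebuild the report's inputs and return
    (name of the result's index type, list of index values)."""
    s1 = pd.Series(["a", "b", "c"])
    s2 = pd.Series(["a", "b"])
    s3 = pd.Series(["a", "b", "c", "d"])
    # Explicit dtype avoids the "default dtype for empty Series" warning.
    s4 = pd.Series([], dtype=object)

    result = pd.concat(
        [s1, s2, s3, s4],
        sort=False,
        join="outer",
        ignore_index=False,
        axis=1,
    )
    # Every input has a default RangeIndex, so the outer-join union is 0..3;
    # only the *type* of the resulting index differs between versions.
    return type(result.index).__name__, list(result.index)


if __name__ == "__main__":
    name, values = describe_concat_index()
    print(name, values)
```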
Installed Versions
```python
>>> pd.show_versions()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /nvme/0/pgali/envs/cudfdev/lib/python3.8/distutils/core.py
```
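The traceback shows `pd.show_versions()` failing while `import_optional_dependency` imports setuptools, whose `_distutils_hack` asserts on where `distutils` was loaded from. When that happens, one hedged fallback is to report just the core versions directly, without importing every optional dependency. `minimal_versions` is a hypothetical helper, and the key names only loosely mirror the `show_versions()` output.

```python
import platform
import sys

import numpy as np
import pandas as pd


def minimal_versions() -> dict:
    """Hypothetical fallback: collect a few key versions without the
    optional-dependency imports that pd.show_versions() performs."""
    return {
        "python": platform.python_version(),
        # sys.maxsize exceeds 2**32 only on 64-bit builds.
        "python-bits": 64 if sys.maxsize > 2**32 else 32,
        "OS": platform.platform(),
        "pandas": pd.__version__,
        "numpy": np.__version__,
    }


if __name__ == "__main__":
    for key, value in minimal_versions().items():
        print(f"{key:<12}: {value}")
```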
Prior Performance
```python
>>> import pandas as pd
... s1 = pd.Series(["a", "b", "c"])
... s2 = pd.Series(["a", "b"])
... s3 = pd.Series(["a", "b", "c", "d"])
... s4 = pd.Series([])
...
... sort = False
... join = 'outer'
... ignore_index = False
... axis = 1
...
... result = pd.concat(
...     [s1, s2, s3, s4],
...     sort=sort,
...     join=join,
...     ignore_index=ignore_index,
...     axis=axis,
... )
...
... print(result)
... print(result.index)
... print(pd.__version__)
...
<stdin>:5: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
     0    1  2    3
0    a    a  a  NaN
1    b    b  b  NaN
2    c  NaN  c  NaN
3  NaN  NaN  d  NaN
RangeIndex(start=0, stop=4, step=1)
1.3.5
```
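Since 1.3.5 returned a lazy `RangeIndex` where 1.4.2 returns a materialized `Int64Index` with the same values, one possible user-side workaround (a sketch, not something proposed in this issue) is to swap the index back to a `RangeIndex` whenever it holds exactly `0..n-1`. `restore_range_index` is a hypothetical name; it relies on `Index.equals` comparing values rather than index types.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def restore_range_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: if df's row index is an integer index equal to
    0..n-1, return a shallow copy whose index is a RangeIndex instead."""
    n = len(df)
    candidate = pd.RangeIndex(n)
    if isinstance(df.index, pd.RangeIndex):
        return df  # already lazy, nothing to do
    # Only integer indexes qualify; a float index 0.0..n-1 would also compare
    # equal, but swapping it would silently change the index dtype.
    if not is_integer_dtype(df.index.dtype) or not df.index.equals(candidate):
        return df
    out = df.copy(deep=False)  # leave the caller's frame untouched
    out.index = candidate
    return out


if __name__ == "__main__":
    parts = [
        pd.Series(["a", "b", "c"]),
        pd.Series(["a", "b"]),
        pd.Series(["a", "b", "c", "d"]),
        pd.Series([], dtype=object),
    ]
    result = restore_range_index(pd.concat(parts, join="outer", sort=False, axis=1))
    print(result.index)
```

The check deliberately leaves any index that is not exactly `0..n-1` alone, so labels such as `[5, 6]` from non-default inputs are never rewritten.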