BUG: Using pd.concat(axis="columns") on differently sized MultiIndexed DataFrames with a datetime index level containing exclusively NaT values causes the level in the returned DataFrame to be a float instead of a datetime · Issue #44900 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
```python
import pandas as pd

print(pd.__version__)

df_a = pd.DataFrame({"a": range(5), "idx1": range(5), "idx2": [pd.NaT] * 5}).set_index(["idx1", "idx2"])
df_b = pd.DataFrame({"b": range(6), "idx1": range(6), "idx2": [pd.NaT] * 6}).set_index(["idx1", "idx2"])

df = pd.concat([df_a, df_b], axis="columns")
df.reset_index().info()
```
On pandas 1.3.5:
```
1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   idx1    6 non-null      int64
 1   idx2    0 non-null      float64
 2   a       5 non-null      float64
 3   b       6 non-null      int64
dtypes: float64(2), int64(2)
memory usage: 320.0 bytes
```
On pandas 1.2.5:
```
1.2.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   idx1    6 non-null      int64
 1   idx2    0 non-null      datetime64[ns]
 2   a       5 non-null      float64
 3   b       6 non-null      int64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 320.0 bytes
```
1.2.5 produces the correct, expected output. In 1.3.5, the `idx2` index level, which holds only `NaT` values, comes back as `float64`: the datetime `NaT`s have been converted to float `NaN`s. The bug only occurs when the DataFrames have different lengths and the index level in question contains only `NaT` values.
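As a stopgap on affected versions, the level can be cast back to a datetime dtype after the concat. The sketch below is a minimal illustration, not a fix from the pandas maintainers: `restore_datetime_level` is a hypothetical helper, and it assumes every index level is named (as `idx1`/`idx2` are here). It round-trips the index through ordinary columns so that `pd.to_datetime` can turn the all-`NaN` float level back into all-`NaT` datetimes. A level that is already a datetime passes through unchanged.

```python
import pandas as pd


def restore_datetime_level(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Hypothetical workaround helper (not a pandas API).

    Returns a copy of ``df`` whose index level ``level`` has a datetime dtype.
    Assumes all index levels are named, so reset_index/set_index round-trip.
    """
    names = list(df.index.names)
    out = df.reset_index()
    # An all-NaN float column converts to all-NaT datetimes here.
    out[level] = pd.to_datetime(out[level])
    return out.set_index(names)


# Reproducer from the report above.
df_a = pd.DataFrame({"a": range(5), "idx1": range(5), "idx2": [pd.NaT] * 5}).set_index(["idx1", "idx2"])
df_b = pd.DataFrame({"b": range(6), "idx1": range(6), "idx2": [pd.NaT] * 6}).set_index(["idx1", "idx2"])
df = pd.concat([df_a, df_b], axis="columns")

# float64 on affected versions such as 1.3.5, per this report.
print(df.index.get_level_values("idx2").dtype)

fixed = restore_datetime_level(df, "idx2")
print(fixed.index.get_level_values("idx2").dtype)
```

Going through `reset_index` costs a copy, but it avoids depending on how a given pandas version stores an all-`NaT` level internally.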