BUG/API: concat with empty DataFrames or all-NA columns by jbrockmendel · Pull Request #43507 · pandas-dev/pandas

  1. If you don't do the reindex and just do pd.concat([df1, df2]) then you still get dt64 on master.

There can be good reasons to reindex, for example to fix the exact set of output columns. In the past, concat had a join_axes keyword for this, but it was deprecated, and the deprecation pointed users to reindex instead.
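A minimal sketch of the three variants under discussion, assuming frames shaped like the ones in this thread (df1 has a datetime64 column "b", df2 has no "b"; the concrete values are made up). Reindexing df2 before concat adds "b" as an all-NaN float64 column, which is what hits the special case. The resulting dtype of "b" in that variant is exactly what this PR changes, so it is printed rather than asserted, and it may differ between pandas versions.

```python
import pandas as pd

# Hypothetical frames reproducing the scenario in this thread:
# df1 has a datetime64 column "b"; df2 has no "b" column at all.
df1 = pd.DataFrame({"a": [1], "b": pd.to_datetime(["2021-09-10"])})
df2 = pd.DataFrame({"a": [2]})

# 1) Plain concat: df2 simply lacks "b", so its row gets a missing value
#    in the combined "b" column.
direct = pd.concat([df1, df2], ignore_index=True)

# 2) Reindex df2 first to pin down the exact output columns (the suggested
#    replacement for join_axes). This adds "b" to df2 as an all-NaN column
#    with float64 dtype, which is the input the special case applies to.
df2_reindexed = df2.reindex(columns=["a", "b"])
reindexed_first = pd.concat([df1, df2_reindexed], ignore_index=True)

# 3) Concat first, then reindex the result: the column selection happens
#    after concatenation, so no all-NaN float64 input column is involved.
reindexed_after = pd.concat([df1, df2], ignore_index=True).reindex(columns=["a", "b"])

# The dtype of "b" in `reindexed_first` is what this PR debates,
# so it is version-dependent; inspect it rather than rely on it.
for name, frame in [("direct", direct),
                    ("reindexed_first", reindexed_first),
                    ("reindexed_after", reindexed_after)]:
    print(name, frame["b"].dtype)
```

Variant 3 sidesteps the question for this particular case, since the output columns are chosen only after the inputs have been combined.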

2. If instead of reindexing you did df2["b"] = pd.Series([np.nan], dtype=np.float64, name="really specifically float64"), wouldn't you want pd.concat([df1, df2]) to not preserve dt64?

Yes, but that is an unfortunate consequence of using float64 as the "empty" dtype. We are actually deprecating that, so once it is gone, I think we can also restrict the special case to object dtype.
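The value-dependent rule being debated can be illustrated with a toy model. This is not pandas' internal code: `Column`, `resolve_concat_dtype` and its `ignorable_na_dtypes` parameter are hypothetical names, and any mix of distinct dtypes simply falls back to object (real pandas has richer promotion rules, e.g. int64 with float64 gives float64). It contrasts the current special case (all-NA float64 or object inputs are ignored), the restricted version suggested here (object only), and a purely dtype-based rule with no special case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one input column of a concat."""
    dtype: str    # e.g. "datetime64[ns]", "float64", "object"
    all_na: bool  # True if every value in the column is missing


def resolve_concat_dtype(columns, ignorable_na_dtypes=frozenset({"float64", "object"})):
    """Toy model of a value-dependent result-dtype rule (not pandas' code).

    All-NA inputs whose dtype is in `ignorable_na_dtypes` are skipped when
    choosing the result dtype; pass an empty set for a purely dtype-based rule.
    """
    relevant = [c.dtype for c in columns
                if not (c.all_na and c.dtype in ignorable_na_dtypes)]
    if not relevant:  # every input was ignorable: fall back to all of them
        relevant = [c.dtype for c in columns]
    distinct = set(relevant)
    if len(distinct) == 1:
        return distinct.pop()
    # Simplification: any mix of different dtypes becomes object here.
    return "object"


dt64 = Column("datetime64[ns]", all_na=False)
nan_float = Column("float64", all_na=True)   # e.g. a column added by reindex
nan_object = Column("object", all_na=True)

print(resolve_concat_dtype([dt64, nan_float]))                          # -> datetime64[ns]
print(resolve_concat_dtype([dt64, nan_float], frozenset({"object"})))   # -> object
print(resolve_concat_dtype([dt64, nan_object], frozenset({"object"})))  # -> datetime64[ns]
print(resolve_concat_dtype([dt64, nan_float], frozenset()))             # -> object
```

In this model, restricting the special case to object means an all-NaN float64 column no longer preserves datetime64, while an all-NA object column still does.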

3. If you don't like this behavior, why the frak did you implement it for ArrayManager?

You mean that I did not implement the special case for ArrayManager?

First, I never said that I like this behaviour. I have said repeatedly that in the long term we should try to get rid of this value-dependent behaviour. My point above was only that I disagree with making this breaking change now.

Personally, I think it could be fine for ArrayManager to behave differently (though I know we disagree on this). But the ArrayManager (AM) implementation was also not necessarily intended as final; note the "# TODO(ArrayManager) decide on exact casting rules in concat" comment that was in the code. For example, if we don't find a solution for the empty dtype, IMO we might need to special-case at least object dtype in the AM code path as well.