DEPR: join_axes-kwarg in pd.concat by h-vetinari · Pull Request #22318 · pandas-dev/pandas
@jorisvandenbossche
I'm answering your comment from #21951 here, as this is further ahead:
"but this will be fixed by 21160 anyway." -> that is long off to be the default integer type, so I don't think we should use that as an argument now
Fair enough, that was a misunderstanding on my part. But the dtype-question is a side-issue here, IMO.
"with reindex and reindex_like, it is redundant as well" -> I don't know the implementation, but I would assume that reindexing after the fact can be less performant? (assuming that with join_axes it reindexes each input before concatenating)
I posed this question in the OP ("Only question is if performance would be much worse, if concatenating huge Series/DFs before selecting small index-subset."), and it's a valid point. The user could apply a reindex directly to the arguments of concat (I originally had something to that effect in the FutureWarning in this PR), but I see that it would be good to keep having a way to do this easily.
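To make the two options concrete, here is a minimal sketch (the frames `left` and `right` and the label set `target` are made up for illustration, and float data is used to sidestep the dtype side-issue). It only shows that both routes give the same result on a toy example; it says nothing about their relative performance on large inputs.

```python
import pandas as pd

# Illustrative inputs (not from this PR): two frames with partially overlapping indexes.
left = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [10.0, 20.0, 30.0]}, index=["y", "z", "w"])
target = pd.Index(["y", "z"])  # the small index subset we actually want

# Option 1: concatenate on the full outer index, then select the subset afterwards.
after = pd.concat([left, right], axis=1, sort=False).reindex(target)

# Option 2: reindex each input first, so concat only ever sees the subset.
before = pd.concat([obj.reindex(target) for obj in (left, right)], axis=1)

print(after.equals(before))
```

Option 2 is the variant the user can already write by hand today; the open question is whether concat should keep offering a shorthand for it.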
It is certainly true the way it is explained and spelled (eg the fact that you need to pass a list) is certainly outdated now Panel is removed. But we could also consider improving it.
In my opinion, the join_axes-keyword (as a name, not necessarily as functionality) should definitely be deprecated. The question of what to replace it with is very similar to the discussion in #21855 (comment).
There might be a good solution for replacing the join_axes-functionality with something like the following (conceptually):
concat(objs, axis=0, join='outer', index=None, columns=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
    [...]
    if (axis == 0 and index is not None) or (axis == 1 and columns is not None):
        raise ValueError('Cannot set index of concatenation axis; '
                         'can only set index of non-concatenation axis')
    index = join if index is None else index
    columns = join if columns is None else columns
    [...]
    objs = [x.reindex(index/columns, axis=other_axis) for x in objs]
    [...]
    # actual concatenation
Then the index of the non-concatenation axis of the result could be set as desired (essentially replacing join_axes with a cleaner API, now that Panel will be deprecated).
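As a rough, runnable illustration of that idea, the sketch below wraps pd.concat in a helper. The name `concat_reindexed` is hypothetical and not part of pandas, and the sketch assumes DataFrame inputs only. It just reindexes the non-concatenation axis of each input before delegating to pd.concat, and leaves the `join` handling to pd.concat itself when no labels are given.

```python
import pandas as pd


def concat_reindexed(objs, axis=0, join="outer", index=None, columns=None, **kwargs):
    """Hypothetical helper (not pandas API): fix the labels of the
    non-concatenation axis, then concatenate. Assumes DataFrame inputs."""
    if (axis == 0 and index is not None) or (axis == 1 and columns is not None):
        raise ValueError("Cannot set index of concatenation axis; "
                         "can only set index of non-concatenation axis")
    # Labels requested for the axis we are NOT concatenating along.
    labels = columns if axis == 0 else index
    if labels is not None:
        other_axis = 1 - axis
        objs = [obj.reindex(labels, axis=other_axis) for obj in objs]
    return pd.concat(objs, axis=axis, join=join, **kwargs)


# Illustrative data: stack rows, but keep only columns "b" and "c".
df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
df2 = pd.DataFrame({"b": [5.0], "c": [6.0]})
result = concat_reindexed([df1, df2], axis=0, columns=["b", "c"], ignore_index=True)
print(result)
```

Because every input is reindexed before concatenation, the full outer union of labels is never built, which is the performance concern raised above.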
(note that I am not married to the keyword, I have never used it myself, but just think we should have a bit more discussion about it. It would be interesting to search for usage on github/SO)
There's not much on SO - https://www.google.com/search?q=stackoverflow+pandas+concat+join_axes yields only one (sorta) relevant hit: https://stackoverflow.com/q/27391081
Neither did I find much of substance here: https://github.com/search?p=2&q=join_axes&type=Issues