pandas.concat doesn't accept a deque · Issue #8645 · pandas-dev/pandas
The pandas.concat routine's behaviour is slightly at odds with its documentation (and error messages). The documentation states that the first parameter (objs) accepts "list or dict of Series, DataFrame, ..." but the routine is rather more forgiving and appears to accept lists, tuples, dicts, and generator objects (very useful!); hence my impression (during usage) was that it accepted "iterables" generally. Unfortunately it turns out this isn't the case; attempting to concatenate a deque of DataFrame objects results in the following:
    from collections import deque
    import pandas as pd

    df = pd.DataFrame.from_dict({'a': [1, 2, 3], 'b': [4, 5, 6]})
    d = deque((df, df, df))
    pd.concat(d)
    ----> 1 pd.concat(d)

    /home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
        720 keys=keys, levels=levels, names=names,
        721 verify_integrity=verify_integrity,
    --> 722 copy=copy)
        723 return op.get_result()
        724

    /home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
        735 raise TypeError('first argument must be a list-like of pandas '
        736 'objects, you passed an object of type '
    --> 737 '"{0}"'.format(type(objs).__name__))
        738
        739 if join == 'outer':

    TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"
The error message states that the first argument "must be a list-like of pandas objects" (which is slightly different to the documentation, and closer to the actual implementation's behaviour). Given that a deque is an iterable, sized container (it supports len() and integer indexing, though not slicing), it seems to fulfil the criteria of being "list-like" at least as well as a generator does.
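As a rough illustration of why a deque looks "list-like", the following standard-library-only sketch checks which protocols it supports. Plain strings stand in for the DataFrame objects here, which is an assumption made purely so that the snippet runs without pandas.

```python
from collections import deque
from collections.abc import Iterable, Sized

# Strings are stand-ins for DataFrame objects (assumption for illustration).
d = deque(("first", "second", "third"))

is_iterable = isinstance(d, Iterable)  # a deque can be looped over
is_sized = isinstance(d, Sized)        # a deque supports len()
first_item = d[0]                      # integer indexing is supported

# Slicing is the one list operation a deque does not support.
try:
    d[0:2]
    supports_slicing = True
except TypeError:
    supports_slicing = False

# Materialising the deque yields an ordinary list with the same items.
as_list = list(d)
```

Since lists are accepted, passing list(d) to pd.concat should sidestep the error, at the cost of one shallow copy of the container.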
Digging into the implementation, the test that's failing appears to be the first line of _Concatenator.__init__ in pandas.tools.merge, which reads as follows (in my installation):
    if not isinstance(objs, (list,tuple,types.GeneratorType,dict,TextFileReader)):
        raise TypeError('first argument must be a list-like of pandas '
                        'objects, you passed an object of type '
                        '"{0}"'.format(type(objs).__name__))
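To see which common iterables a type whitelist like this lets through, here is a minimal sketch that mirrors the isinstance test outside pandas. The helper name passes_strict_check and the ACCEPTED_TYPES tuple are hypothetical, and TextFileReader is deliberately left out so the snippet does not need pandas installed.

```python
import types
from collections import deque

# Whitelist mirroring the check quoted above. TextFileReader is omitted
# (assumption: kept out so this sketch runs without pandas).
ACCEPTED_TYPES = (list, tuple, types.GeneratorType, dict)


def passes_strict_check(objs):
    """Hypothetical helper: True if objs would get past the isinstance test."""
    return isinstance(objs, ACCEPTED_TYPES)


items = ["a", "b", "c"]  # stand-ins for pandas objects

results = {
    "list": passes_strict_check(items),
    "tuple": passes_strict_check(tuple(items)),
    "generator": passes_strict_check(x for x in items),
    "dict": passes_strict_check({"key": "a"}),
    "deque": passes_strict_check(deque(items)),
    # Other perfectly good iterables are rejected as well:
    "list_iterator": passes_strict_check(iter(items)),
    "map": passes_strict_check(map(str.upper, items)),
}
```

In this sketch the deque is not the only casualty: a list iterator and a map object are rejected by the same whitelist, even though both can be iterated just like a generator.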
So it appears the actual set of iterables accepted by pandas.concat is lists, tuples, generator objects, dicts, and instances of TextFileReader. I suggest that it might be better to check for (and act upon) special cases and otherwise assume that objs is a suitable iterable of DataFrame objects. In other words, get rid of that check entirely, and add a couple of checks for "expected" special cases (such as a user mistakenly passing a DataFrame as objs; there's already a check in place for dicts a bit further on).
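One possible shape for this suggestion is sketched below. It is not the pandas implementation: coerce_objs, single_object_types, and the FakeFrame stand-in class are all hypothetical names introduced only so the idea can be run without pandas.

```python
from collections import deque


class FakeFrame:
    """Hypothetical stand-in for a pandas object (not a pandas class)."""

    def __iter__(self):
        # Assumption: like a DataFrame, iterating yields column labels,
        # so passing a single frame as objs would otherwise go unnoticed.
        return iter(["a", "b"])


def coerce_objs(objs, single_object_types=(FakeFrame, str)):
    """Hypothetical helper: reject known mistakes, then trust iteration."""
    # Special case: the caller passed one object instead of a collection.
    if isinstance(objs, single_object_types):
        raise TypeError(
            "first argument must be an iterable of pandas objects, "
            'you passed an object of type "{0}"'.format(type(objs).__name__)
        )
    # Special case: dicts are handled separately in the real code;
    # this sketch simply takes the values.
    if isinstance(objs, dict):
        return list(objs.values())
    # Everything else is assumed to be a suitable iterable.
    return list(objs)


f1, f2 = FakeFrame(), FakeFrame()
from_deque = coerce_objs(deque([f1, f2]))
from_generator = coerce_objs(f for f in (f1, f2))
from_dict = coerce_objs({"x": f1, "y": f2})

try:
    coerce_objs(f1)
    single_frame_rejected = False
except TypeError:
    single_frame_rejected = True
```

The design choice is to list the inputs that are known to be mistakes rather than the inputs that are known to be fine, so any new iterable type works without touching the check.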
The conversion of objs to a list via a list comprehension later on (below if keys is None) should raise a TypeError if objs isn't iterable, so the change shouldn't have much impact (i.e. when a non-iterable is passed as objs, it'll raise an exception of the same class as the existing code).
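The claim about the exception class can be checked in isolation. In this small sketch, materialise is a hypothetical stand-in for the list comprehension in the pandas source, and the integer 42 plays the part of a non-iterable argument.

```python
def materialise(objs):
    """Hypothetical stand-in for the list comprehension over objs."""
    return [obj for obj in objs]


error_class = None
message = None

try:
    materialise(42)  # an int is not iterable
except TypeError as exc:
    error_class = type(exc).__name__
    message = str(exc)
```

The exception class matches the existing behaviour, but the message is Python's generic "not iterable" wording rather than the current "list-like of pandas objects" text, which may be worth keeping in mind if the friendlier message matters.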
If this sounds reasonable, I'm happy to provide a pull request.