pandas.concat doesn't accept a deque · Issue #8645 · pandas-dev/pandas

The pandas.concat routine's behaviour is slightly at odds with its documentation (and error messages). The documentation states that the first parameter (objs) accepts "list or dict of Series, DataFrame, ..." but the routine is rather more forgiving and appears to accept lists, tuples, dicts, and generator objects (very useful!); hence my impression (during usage) was that it accepted "iterables" generally. Unfortunately it turns out this isn't the case; attempting to concatenate a deque of DataFrame objects results in the following:

from collections import deque
import pandas as pd

df = pd.DataFrame.from_dict({'a': [1, 2, 3], 'b': [4, 5, 6]})
d = deque((df, df, df))
pd.concat(d)

----> 1 pd.concat(d)

/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    720                        keys=keys, levels=levels, names=names,
    721                        verify_integrity=verify_integrity,
--> 722                        copy=copy)
    723     return op.get_result()
    724 

/home/dave/arcticenv/lib/python3.4/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    735             raise TypeError('first argument must be a list-like of pandas '
    736                             'objects, you passed an object of type '
--> 737                             '"{0}"'.format(type(objs).__name__))
    738 
    739         if join == 'outer':

TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"

The error message states that the first argument "must be a list-like of pandas objects" (which is slightly different to the documentation, and closer to the actual implementation's behaviour). Given that a deque is iterable, has a length, and supports integer indexing (though not slicing), it seems to fulfil the criteria of being "list-like".
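To make the "list-like" claim concrete, here is a minimal standard-library sketch of what a deque supports. The string items are placeholders standing in for DataFrame objects, and the variable names are mine, not anything from pandas.

```python
from collections import deque
from collections.abc import Iterable, Sized

# Placeholder strings stand in for DataFrame objects.
d = deque(("df1", "df2", "df3"))

is_iterable = isinstance(d, Iterable)   # supports iteration
is_sized = isinstance(d, Sized)         # supports len()
first_item = d[0]                       # integer indexing works

try:
    d[0:2]                              # slicing is not supported
    supports_slicing = True
except TypeError:
    supports_slicing = False

# Workaround: materialise the deque as a list before handing it on.
as_list = list(d)
```

In the meantime, wrapping the deque as pd.concat(list(d)) sidesteps the error, since lists are accepted.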

Digging into the implementation, the test that's failing appears to be the first line of _Concatenator.__init__ in pandas.tools.merge which reads as follows (in my installation):

    if not isinstance(objs, (list,tuple,types.GeneratorType,dict,TextFileReader)):
        raise TypeError('first argument must be a list-like of pandas '
                        'objects, you passed an object of type '
                        '"{0}"'.format(type(objs).__name__))

So it appears the actual set of iterables accepted by pandas.concat is lists, tuples, generators, dicts, and instances of TextFileReader. I suggest that it might be better to check for (and act upon) special cases and otherwise assume that objs is a suitable iterable of DataFrame objects. In other words, get rid of that check entirely, and add a couple of checks for "expected" mistakes (such as a user passing a single DataFrame as objs; there's already a check in place for dicts a bit further on).
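A rough sketch of the difference between the two approaches, runnable without pandas. FakeFrame, current_check and proposed_collect are hypothetical names invented for illustration; TextFileReader and the dict handling are left out, so this is not the actual pandas code.

```python
import types
from collections import deque


class FakeFrame:
    """Hypothetical stand-in for a pandas object such as a DataFrame."""


def current_check(objs):
    # Mirrors the existing whitelist, minus TextFileReader.
    return isinstance(objs, (list, tuple, types.GeneratorType, dict))


def proposed_collect(objs):
    """Reject an expected mistake, otherwise trust objs to be iterable."""
    if isinstance(objs, FakeFrame):
        raise TypeError(
            'first argument must be an iterable of pandas objects, '
            'you passed an object of type "{0}"'.format(type(objs).__name__)
        )
    # No whitelist: any iterable is accepted (dict handling omitted here).
    return [obj for obj in objs]


frames = deque((FakeFrame(), FakeFrame(), FakeFrame()))
rejected_today = not current_check(frames)   # the deque fails the whitelist
collected = proposed_collect(frames)         # but iterates without trouble
```

The explicit special case keeps the helpful error for a lone frame, while deques and other custom containers pass straight through.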

The list comprehension that later converts objs to a list (below if keys is None) should raise a TypeError if objs isn't iterable, so the change shouldn't have much impact: a non-iterable passed as objs will still raise an exception of the same class as the existing code.
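The claim about the exception class is easy to check in isolation. The collect function below is a hypothetical stand-in for the list comprehension, not the pandas source.

```python
def collect(objs):
    # Same shape as a list comprehension over objs, with no up-front type check.
    return [obj for obj in objs]


try:
    collect(42)                      # an int is not iterable
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__  # same class the existing check raises

from_iterator = collect(iter("ab"))  # arbitrary iterators are fine
```

One caveat: a single DataFrame is itself iterable (iterating it yields its column labels), so it would not trip this TypeError, which is why it still needs its own explicit check.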

If this sounds reasonable, I'm happy to provide a pull request.