Appending Pandas dataframes in for loop results in ValueError · Issue #13524 · pandas-dev/pandas

I recently posted this on Stack Overflow. It seems to be a bug, so I am posting it here as well.

I want to generate a dataframe by appending several separate dataframes that are generated in a for loop. Each individual dataframe consists of a name column, a range of integers, and a column identifying the category to which each integer belongs (e.g. quintiles 1 to 5). If I generate each dataframe individually and then append one to the other to create a 'master' dataframe, there are no problems. However, when I use a loop to create each individual dataframe, trying to append a dataframe to the master dataframe results in:

ValueError: incompatible categories in categorical concat

A workaround (suggested by jezrael) is to append each dataframe to a list of dataframes and then concatenate the whole list once with pd.concat.
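A minimal sketch of that workaround, rebuilding each group's dataframe the same way as the loop in this report. The names `frames` and `master` are illustrative only. It uses `pd.concat` exclusively, since `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. Because every group's `c` column is built with the same bins and labels, the categories match and the concatenated column should stay categorical.

```python
import numpy as np
import pandas as pd

col_names = ['a', 'b', 'c']
names = ['Group1', 'Group2', 'Group3']

frames = []  # collect one dataframe per group instead of growing a master frame
for name in names:
    temp = pd.DataFrame(
        {
            'a': np.arange(1, 11, 1),   # integers 1..10 for every group
            'b': name,                  # scalar group name broadcast to all rows
            'c': pd.cut(np.arange(1, 11, 1),
                        bins=np.linspace(0, 10, 6),
                        labels=[1, 2, 3, 4, 5]),  # quintile label per integer
        },
        columns=col_names,
    )
    frames.append(temp)

# Concatenate once at the end; ignore_index gives a fresh 0..29 index.
master = pd.concat(frames, ignore_index=True)
print(master)
```

Concatenating once at the end also avoids repeatedly copying the growing master frame on every iteration.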

I've written a simplified loop to illustrate:

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import pandas as pd

# Define column names
colNames = ('a', 'b', 'c')

# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns=colNames)

# A list of the group names
names = ['Group1', 'Group2', 'Group3']

# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns=colNames)
    tempDF['a'] = np.arange(1, 11, 1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1, 11, 1),
                         bins=np.linspace(0, 10, 6),
                         labels=[1, 2, 3, 4, 5])
    print(tempDF)
    print('\n')

    # Try to append temporary DF to master DF
    masterDF = masterDF.append(tempDF, ignore_index=True)

print(masterDF)
```

Expected Output

```
     a       b  c
 0   1  Group1  1
 1   2  Group1  1
 2   3  Group1  2
 3   4  Group1  2
 4   5  Group1  3
 5   6  Group1  3
 6   7  Group1  4
 7   8  Group1  4
 8   9  Group1  5
 9  10  Group1  5
10   1  Group2  1
11   2  Group2  1
12   3  Group2  2
13   4  Group2  2
...
28   9  Group3  5
29  10  Group3  5
```
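A small diagnostic sketch for comparing the categorical columns involved. The names `per_group`, `master`, `left` and `right` are illustrative. It checks that every group's `c` column carries identical categories, and shows that the empty master frame's `c` column starts out as object dtype rather than category. This does not by itself establish the cause of the 0.18.1 error; it only shows where the column types differ. For inputs whose categories genuinely differ, `pandas.api.types.union_categoricals` can merge them. It is shown here with unordered categoricals, because it refuses to union ordered categoricals whose categories differ.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals

# Rebuild the 'c' column the way the loop does, once per group.
per_group = [
    pd.cut(np.arange(1, 11, 1), bins=np.linspace(0, 10, 6), labels=[1, 2, 3, 4, 5])
    for _ in ['Group1', 'Group2', 'Group3']
]

# All per-group columns share identical categories, so they are mutually compatible.
print(all(c.categories.equals(per_group[0].categories) for c in per_group))

# The empty master frame's 'c' column is object dtype, not category.
master = pd.DataFrame(columns=['a', 'b', 'c'])
print(master['c'].dtype)

# When categories genuinely differ, union_categoricals merges them into one set.
left = pd.Categorical([1, 2], categories=[1, 2])
right = pd.Categorical([3, 4], categories=[3, 4])
combined = union_categoricals([left, right])
print(list(combined.categories))
```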

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.4.None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None