Appending Pandas dataframes in for loop results in ValueError · Issue #13524 · pandas-dev/pandas
I recently posted this on Stack Overflow. It seems to be a bug, so I am posting it here as well.
I want to generate a dataframe by appending several separate dataframes generated in a for loop. Each individual dataframe consists of a name column, a range of integers, and a column identifying the category to which each integer belongs (e.g. quintiles 1 to 5). If I generate each dataframe individually and then append one to the other to create a 'master' dataframe, there are no problems. However, when I use a loop to create each individual dataframe, appending a dataframe to the master dataframe raises:
ValueError: incompatible categories in categorical concat
A workaround (suggested by jezrael) is to append each dataframe to a list and then concatenate them all at once with pd.concat.
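A minimal sketch of that list-then-concat workaround, assuming a current pandas release: each per-group frame is collected in a Python list and `pd.concat` is called once after the loop. The helper `make_group_frame` is hypothetical and exists only for this illustration. Because every group is binned with the same `labels`, all of the `c` columns carry identical categories.

```python
import numpy as np
import pandas as pd

def make_group_frame(name):
    """Hypothetical helper: one group's rows, binned into 5 labelled categories."""
    values = np.arange(1, 11, 1)
    return pd.DataFrame({
        'a': values,
        'b': name,  # scalar is broadcast to every row
        'c': pd.cut(values, bins=np.linspace(0, 10, 6), labels=[1, 2, 3, 4, 5]),
    })

names = ['Group1', 'Group2', 'Group3']

# Collect the frames in a list instead of growing a DataFrame inside the loop
frames = [make_group_frame(name) for name in names]

# Concatenate once; ignore_index=True renumbers the rows 0..29
masterDF = pd.concat(frames, ignore_index=True)
print(masterDF)
```

Besides sidestepping the error, a single `pd.concat` avoids re-copying the growing master frame on every iteration.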
I've written a simplified loop to illustrate:
Code Sample, a copy-pastable example if possible
```python
import numpy as np
import pandas as pd

# Define column names
colNames = ('a', 'b', 'c')
# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns=colNames)
# A list of the group names
names = ['Group1', 'Group2', 'Group3']

# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns=colNames)
    tempDF['a'] = np.arange(1, 11, 1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1, 11, 1), bins=np.linspace(0, 10, 6), labels=[1, 2, 3, 4, 5])
    print(tempDF)
    print('\n')
    # Try to append temporary DF to master DF
    masterDF = masterDF.append(tempDF, ignore_index=True)

print(masterDF)
```
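`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the reproduction above only runs on older releases. As a hedged sketch for newer versions, the variant below declares the category set once with `pd.CategoricalDtype` and casts each group's `c` column to it, so every chunk is guaranteed to carry identical categories before `pd.concat` combines them. The name `quintile_dtype` is hypothetical, and pinning the dtype explicitly is an assumption of this sketch, not part of the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical shared dtype: the five labels, in order, declared once for every group
quintile_dtype = pd.CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)

names = ['Group1', 'Group2', 'Group3']
frames = []
for name in names:
    values = np.arange(1, 11, 1)
    # labels=False makes pd.cut return integer bin indices 0..4; shift them to 1..5
    codes = pd.cut(values, bins=np.linspace(0, 10, 6), labels=False) + 1
    frames.append(pd.DataFrame({
        'a': values,
        'b': name,
        'c': pd.Series(codes).astype(quintile_dtype),
    }))

# Every 'c' column has the same CategoricalDtype, so the result stays categorical
masterDF = pd.concat(frames, ignore_index=True)
print(masterDF.dtypes)
```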
Expected Output
```
     a       b  c
0    1  Group1  1
1    2  Group1  1
2    3  Group1  2
3    4  Group1  2
4    5  Group1  3
5    6  Group1  3
6    7  Group1  4
7    8  Group1  4
8    9  Group1  5
9   10  Group1  5
10   1  Group2  1
11   2  Group2  1
12   3  Group2  2
13   4  Group2  2
...
28   9  Group3  5
29  10  Group3  5
```
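The `c` values in the expected output follow from how `pd.cut` treats the edges produced by `np.linspace(0, 10, 6)`: by default (`right=True`) the bins are the right-closed intervals (0, 2], (2, 4], (4, 6], (6, 8], (8, 10], so each pair of consecutive integers shares one label. A short check of that mapping for a single group, as a sketch:

```python
import numpy as np
import pandas as pd

edges = np.linspace(0, 10, 6)  # edges 0, 2, 4, 6, 8, 10
values = np.arange(1, 11, 1)

# right=True (the default): each bin includes its right edge, e.g. 2 falls in (0, 2]
labels = pd.cut(values, bins=edges, labels=[1, 2, 3, 4, 5])
print(list(zip(values.tolist(), list(labels))))
```

Values outside (0, 10], such as 0 or 11, fall into no bin and are labelled NaN.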
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.4.None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None