Suggested improvement: allow ExcelWriter to automatically convert lists and dict to strings · Issue #8188 · pandas-dev/pandas (original) (raw)
Problem description:
I have pandas dataframes with diverse data types and structures, including lists. The csv-writer saves the lists without trouble, but I prefer to also save in excel format for collaboration with my colleagues.
Currently the ExcelWriter returns a TypeError if the dataframe contains a list.TypeError: float() argument must be a string or a number, not 'list'
It would be really great if ExcelWriter accepted lists and dictionaries and tuples.
Would it cause any problem to simply convert these data formats to strings before saving?
Sorry I’m not an experienced programmer, so I haven‘t made a pull request and attempted to improve the module myself.
Here is some code (Python 3.4, pandas 0.14.0) to describe the current situation.
import pandas as pd
#create a new dataframe that includes a list and a dict
title = pd.Series(['Toy Story (1995)',
'Jumanji (1995)',
'Grumpier Old Men (1995)',
'Waiting to Exhale (1995)',
'Father of the Bride Part II (1995)'], index = [0 ,1, 2, 3, 4], name='titles')
genre_as_list = pd.Series([['Animation', "Children's", 'Comedy'],
['Adventure', "Children's", 'Fantasy'],
['Comedy', 'Romance'],
['Comedy', 'Drama'],
['Comedy']], index = [0 ,1, 2, 3, 4], name='genre_as_list')
rating_as_dict = pd.Series([{'F':0.5,'M':0.6},
{'F':0.1,'M':0.2},
{'F':0.3,'M':0.3},
{'F':0.8,'M':0.8},
{'F':0.8,'M':0.6}], index = [0 ,1, 2, 3, 4], name='rating_as_dict')
df = pd.concat([title, genre_as_list, rating_as_dict], axis=1)
attempt to save the dataframe in excel
writer = pd.ExcelWriter('saved_df.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
which will result inTypeError: float() argument must be a string or a number, not 'list'
to save the data, it is necessary to convert the python lists or dicts to strings, separated by ", " or pipes "|"
df['genre_as_string'] = [str(i) for i in df['genre_as_list']]
df['rating_as_string'] = [str(i) for i in df['rating_as_dict']]
df['genre_pipes_string'] = ['|'.join(i) for i in df['genre_as_list']]
#in my ipython output, the difference is not actually visible between the lists and the stringlists
print('first item from list: %s\nfirst item from stringlist: %s' % (df['genre_as_list'][0][0],
df['genre_as_string'][0][0]))
After deleting any columns containing a python list or dict, I can now save without any problem.
df = df.drop(['genre_as_list','rating_as_dict'], axis=1)
writer = pd.ExcelWriter('saved_df.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
If I load from the saved excel sheet, I can confirm that the data is intact
df_from_excel = pd.read_excel('saved_df.xlsx', sheetname='Sheet1')
#convert back to original python lists and dict, if desired
df_from_excel['genre_as_list_again'] = [eval(i) for i in df_from_excel['genre_as_string']]
df_from_excel['rating_as_dict_again'] = [eval(i) for i in df_from_excel['rating_as_string']]
df_from_excel.loc[0,'genre_as_list_again'][0]