BUG/API: can't pass parameters to csv module via df.to_csv · Issue #4528 · pandas-dev/pandas
I am trying to print a data frame as plain, strict tsv (i.e., no quoting and no escaping, because I know none of the fields will contain tabs). For that I wanted to use the "quoting" option, which is documented in pandas and is passed through to csv, together with the "quotechar" option, which is not documented in pandas but is also a csv option. But it doesn't work:
In [1]: import sys, csv
In [2]: from pandas import DataFrame
In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }
In [4]: df = DataFrame(data)
In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
	col1	col2
0	contents of col1 row1	contents of col2 row1
Error                                     Traceback (most recent call last)
 in ()
----> 1 df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, nanRep, encoding, quoting, line_terminator, chunksize, tupleize_cols, **kwds)
   1409                                      tupleize_cols=tupleize_cols,
   1410                                      )
-> 1411         formatter.save()
   1412
   1413     def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in save(self)
    974
    975             else:
--> 976                 self._save()
    977
    978

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save(self)
   1080                 break
   1081
-> 1082             self._save_chunk(start_i, end_i)
   1083
   1084     def _save_chunk(self, start_i, end_i):

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save_chunk(self, start_i, end_i)
   1098         ix = data_index.to_native_types(slicer=slicer, na_rep=self.na_rep, float_format=self.float_format)
   1099
-> 1100         lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
   1101
   1102 # from collections import namedtuple
/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/lib.so in pandas.lib.write_csv_rows (pandas/lib.c:13871)()
Error: need to escape, but no escapechar set
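The error comes from the csv module itself, so it can probably be reproduced without pandas. Below is a minimal sketch using only the standard library (written for Python 3; `write_tsv`, `ROW`, `default_result` and `unquoted_result` are names made up for this illustration). It assumes the failure is triggered by the default quotechar `"` still being in effect when only `quoting=csv.QUOTE_NONE` reaches the writer; on the Python versions I am aware of, the first call reports the "need to escape" error, but that part may vary between versions.

```python
import csv
import io

# The row from the report that contains a double quote in each field.
ROW = ['contents " of col1 row2', 'contents " of col2 row2']


def write_tsv(row, **fmtparams):
    """Write one row as unquoted TSV; return the text or the csv.Error message."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n",
                        quoting=csv.QUOTE_NONE, **fmtparams)
    try:
        writer.writerow(row)
    except csv.Error as exc:
        return "csv.Error: %s" % exc
    return buf.getvalue()


# quotechar left at its default ('"'), as when pandas drops the argument.
default_result = write_tsv(ROW)

# quotechar explicitly disabled, which is what the report wants to pass through.
unquoted_result = write_tsv(ROW, quotechar=None)

print(repr(default_result))
print(repr(unquoted_result))
```

With `quotechar=None` the writer has no quote character to protect, so the double quotes are written through unchanged.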
Adding the parameter
quotechar=kwds.get("quotechar")
to the
formatter = fmt.CSVFormatter(...
call in to_csv(), and making the corresponding changes to format.CSVFormatter's __init__() and save(), produces the expected output:
In [1]: import sys, csv
In [2]: from pandas import DataFrame
In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }
In [4]: df = DataFrame(data)
In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
	col1	col2
0	contents of col1 row1	contents of col2 row1
1	contents " of col1 row2	contents " of col2 row2
i.e., unescaped, unquoted tsv.
More generally, there could be many reasons to want more control over the underlying csv writer, so a generic mechanism (as opposed to adding each parameter one by one) might be called for, e.g., allowing for a csv dialect object or at least a dictionary holding dialect attributes.
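To make the suggestion concrete, here is a rough sketch of what such a generic pass-through could look like, using only the standard library csv module (Python 3). `write_rows` and `StrictTSV` are hypothetical names for this illustration; this is not the pandas API and not a proposed patch, just one way the two options (a dialect object, or a dictionary of dialect attributes) could be forwarded to csv.writer.

```python
import csv
import io


def write_rows(rows, out, dialect="excel", **fmtparams):
    """Hypothetical helper: forward a dialect and/or format params to csv.writer."""
    writer = csv.writer(out, dialect, **fmtparams)
    writer.writerows(rows)


class StrictTSV(csv.Dialect):
    """Tab-separated, no quoting, no escaping."""
    delimiter = "\t"
    quotechar = None
    escapechar = None
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_NONE


rows = [["col1", "col2"],
        ['contents " of col1 row2', 'contents " of col2 row2']]

# Option 1: pass a dialect object.
buf_dialect = io.StringIO()
write_rows(rows, buf_dialect, dialect=StrictTSV)

# Option 2: pass a dictionary holding dialect attributes.
opts = {"delimiter": "\t", "quotechar": None,
        "quoting": csv.QUOTE_NONE, "lineterminator": "\n"}
buf_dict = io.StringIO()
write_rows(rows, buf_dict, **opts)

print(buf_dialect.getvalue(), end="")
```

Either form keeps the set of supported options defined by the csv module rather than by the wrapper, which is the point of the generic mechanism.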