BUG/API: can't pass parameters to csv module via df.to_csv

I am trying to print a data frame as plain, strict tsv (i.e., no quoting and no escaping, because I know none of the fields will contain tabs). I wanted to use the "quoting" option, which is documented in pandas and is passed through to csv, together with the "quotechar" option, which is not documented in pandas but is also a csv option. But it doesn't work:

In [1]: import sys, csv

In [2]: from pandas import DataFrame

In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }

In [4]: df = DataFrame(data)

In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
    col1    col2
0    contents of col1 row1    contents of col2 row1

Error                                     Traceback (most recent call last)
in ()
----> 1 df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, nanRep, encoding, quoting, line_terminator, chunksize, tupleize_cols, **kwds)
   1409                 tupleize_cols=tupleize_cols,
   1410                 )
-> 1411         formatter.save()
   1412
   1413     def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in save(self)
    974
    975         else:
--> 976             self._save()
    977
    978

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save(self)
   1080                 break
   1081
-> 1082             self._save_chunk(start_i, end_i)
   1083
   1084     def _save_chunk(self, start_i, end_i):

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save_chunk(self, start_i, end_i)
   1098         ix = data_index.to_native_types(slicer=slicer, na_rep=self.na_rep, float_format=self.float_format)
   1099
-> 1100         lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
   1101
   1102 # from collections import namedtuple

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/lib.so in pandas.lib.write_csv_rows (pandas/lib.c:13871)()

Error: need to escape, but no escapechar set
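The same error can be reproduced with the csv module alone, which suggests that the quotechar=None argument never reaches the underlying writer. The following is a minimal sketch in Python 3 syntax, not the pandas code path; write_tsv_row is a hypothetical helper introduced only for this illustration.

```python
import csv
import io

row = ['contents " of col1 row2', 'contents " of col2 row2']


def write_tsv_row(fields, **fmtparams):
    """Write one row as unquoted TSV and return the text produced.

    Hypothetical helper for illustration; extra keyword arguments are
    forwarded to csv.writer as formatting parameters.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_NONE, **fmtparams)
    writer.writerow(fields)
    return buf.getvalue()


# The default quotechar is '"', so with QUOTE_NONE the embedded quote
# would have to be escaped, and no escapechar is set.
try:
    write_tsv_row(row)
    default_error = None
except csv.Error as exc:
    default_error = str(exc)

# With quotechar=None, nothing in the row needs escaping.
strict_line = write_tsv_row(row, quotechar=None)

print(default_error)
print(repr(strict_line))
```

If the first call fails and the second succeeds, the failure in to_csv is consistent with pandas building its writer with the default quotechar regardless of what the caller passed.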

Adding the parameter

quotechar=kwds.get("quotechar")

to the

formatter = fmt.CSVFormatter(...

call in to_csv(), and making the corresponding changes to format.CSVFormatter()'s __init__() and save(), produces the expected output:

In [1]: import sys, csv

In [2]: from pandas import DataFrame

In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }

In [4]: df = DataFrame(data)

In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
    col1    col2
0    contents of col1 row1    contents of col2 row1
1    contents " of col1 row2    contents " of col2 row2

i.e., unescaped, unquoted tsv.

More generally, there could be many reasons to want more control over the underlying csv writer, so a generic mechanism (as opposed to adding each parameter one by one) might be called for (e.g., allowing a csv dialect object, or at least a dictionary holding dialect attributes).
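To make the suggestion concrete, here is a rough sketch of what such a pass-through could look like, written against the csv module only (Python 3 syntax). The names write_rows and StrictTSV are hypothetical and are not part of pandas; the row lists stand in for whatever the DataFrame formatter would hand to the writer.

```python
import csv
import io


class StrictTSV(csv.Dialect):
    """Hypothetical dialect: tab-separated, never quoted, never escaped."""
    delimiter = "\t"
    quoting = csv.QUOTE_NONE
    quotechar = None
    escapechar = None
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"


def write_rows(rows, buf, dialect="excel", **fmtparams):
    """Write rows to buf, forwarding dialect and formatting parameters.

    Hypothetical helper: 'dialect' may be a registered dialect name or a
    csv.Dialect subclass; 'fmtparams' override individual attributes.
    """
    writer = csv.writer(buf, dialect, **fmtparams)
    writer.writerows(rows)


rows = [
    ["col1", "col2"],
    ['contents " of col1 row2', 'contents " of col2 row2'],
]

# Option 1: pass a dialect object.
out_dialect = io.StringIO()
write_rows(rows, out_dialect, dialect=StrictTSV)

# Option 2: pass a dictionary of dialect attributes.
attrs = {
    "delimiter": "\t",
    "quoting": csv.QUOTE_NONE,
    "quotechar": None,
    "lineterminator": "\n",
}
out_dict = io.StringIO()
write_rows(rows, out_dict, **attrs)

print(out_dialect.getvalue(), end="")
```

Either form would keep to_csv from having to mirror every csv option in its own signature.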