read_csv fails for UTF-16 with BOM (maybe also other encodings with BOM) and skiprows · Issue #2298 · pandas-dev/pandas
��Name Ad performance report
Type Ad
Frequency One time
Date range Custom date range
Dates Sep 19, 2012-Nov 19, 2012
Account Day Campaign Ad group Ad ID Client name Destination URL Impressions Clicks Cost Avg. position Status Conv. (1-per-click)
Categories 2 15.11.2012 something: ��;�C�7�:�8� [somethinglse]{test}: ��;�C�7�:�8� 16902484818 Categories 2 http://www.someurl?ad=291012 333 2 4.7 5.5 approved 0
I suspect that the bytes at the very beginning of the file are the byte order mark (BOM) and that this causes the problem when rows are skipped. Without skiprows, everything is read into a single row whose first column contains the BOM:
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns:
��Name\tAd performance report\t\t\t\t\t\t\t\t\t\t\t
...
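One way to check the guess about the BOM is to look at the first few bytes of the file. The following is a minimal sketch using only the standard library; `sniff_bom` is a hypothetical helper name, not part of pandas, and it is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
import codecs

# Known BOM signatures. The UTF-32 marks come first because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw):
    """Return (encoding_name, bom_length) for the leading bytes of `raw`.

    Returns (None, 0) when no known BOM is present.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0
```

Reading the first four bytes of the file in binary mode and passing them to `sniff_bom` should be enough to tell whether the file starts with a UTF-16 BOM.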
pd.read_csv('/home/arthur/Desktop/client 139 - ads report/test_pandas.csv', sep='\t', skiprows=5)
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, header, index_col, names, skiprows, skipfooter, skip_footer, na_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
361 buffer_lines=buffer_lines)
362
--> 363 return _read(filepath_or_buffer, kwds)
364
365 parser_f.__name__ = name
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
185
186 # Create the parser.
--> 187 parser = TextFileReader(filepath_or_buffer, **kwds)
188
189 if nrows is not None:
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
465 self.options, self.engine = self._clean_options(options, engine)
466
--> 467 self._make_engine(self.engine)
468
469 def _get_options_with_defaults(self, engine):
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in _make_engine(self, engine)
567 def _make_engine(self, engine='c'):
568 if engine == 'c':
--> 569 self._engine = CParserWrapper(self.f, **self.options)
570 else:
571 if engine == 'python':
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in __init__(self, src, **kwds)
787 ParserBase.__init__(self, kwds)
788
--> 789 self._reader = _parser.TextReader(src, **kwds)
790
791 # XXX
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3579)()
/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev_b8dae94-py2.7-linux-x86_64.egg/pandas/_parser.so in pandas._parser.TextReader._get_header (pandas/src/parser.c:4590)()
CParserError: Passed header=0 but only 0 lines in file
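A possible workaround, assuming the file really is UTF-16 with a BOM, is to decode the bytes first and skip the leading lines on the decoded text, so the parser never sees the BOM or the raw UTF-16 bytes. The sketch below uses only the standard library (`csv` and `io`) in Python 3; `read_rows_skipping` is a hypothetical helper, and it has not been tested against the pandas version in the traceback.

```python
import csv
import io


def read_rows_skipping(raw, skiprows, encoding="utf-16", sep="\t"):
    """Decode `raw` bytes, drop the first `skiprows` lines, parse the rest.

    The generic "utf-16" codec reads the BOM to pick the byte order and
    removes it from the decoded text.
    """
    text = raw.decode(encoding)
    # newline="" keeps line endings untranslated, as the csv module expects.
    buf = io.StringIO(text, newline="")
    for _ in range(skiprows):
        buf.readline()  # discard one report-header line
    return list(csv.reader(buf, delimiter=sep))
```

The same decoded buffer could instead be handed to `pd.read_csv(buf, sep='\t')`, since `read_csv` accepts file-like objects. Passing `encoding='utf-16'` to `read_csv` directly may also be worth trying; whether either avoids the error in this build is an assumption.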