Read CSV using c engine silently swallows useful exceptions · Issue #13652 · pandas-dev/pandas


INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-55-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.1
nose: 1.3.7
Cython: 0.24.0a0
numpy: 1.9.2
scipy: 0.16.0
statsmodels: None
IPython: 4.1.2
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: None
numexpr: 2.4.3
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Pandas version: 0.16.1

Python version: sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)

Showing stream error on read

Traceback (most recent call last):
  File "pandas_bug.py", line 34, in <module>
    data = stream.read()
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 798, in read
    data = self.reader.read(size)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 497, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 12: invalid start byte
Showing stream error on read_csv (python engine)

Traceback (most recent call last):
  File "pandas_bug.py", line 41, in <module>
    stream = test_pandas(True)
  File "pandas_bug.py", line 28, in test_pandas
    df = pandas.read_csv(stream, engine=engine)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
    self._make_engine(self.engine)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 711, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 1427, in __init__
    self.columns, self.num_original_columns = self._infer_columns()
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 1642, in _infer_columns
    line = self._buffered_line()
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 1769, in _buffered_line
    return self._next_line()
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 1800, in _next_line
    orig_line = next(self.data)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 820, in __next__
    data = next(self.reader)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 638, in __next__
    line = self.readline()
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 551, in readline
    data = self.read(readsize, firstline=True)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/codecs.py", line 497, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte
Showing missing stream error on read_csv (python c)

Traceback (most recent call last):
  File "pandas_bug.py", line 48, in <module>
    stream = test_pandas(False)
  File "pandas_bug.py", line 28, in test_pandas
    df = pandas.read_csv(stream, engine=engine)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
    self._make_engine(self.engine)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 705, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/brett/.virtualenvs/datasets-service/lib/python3.4/site-packages/pandas/io/parsers.py", line 1072, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 509, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)
  File "pandas/parser.pyx", line 635, in pandas.parser.TextReader._get_header (pandas/parser.c:6244)
  File "pandas/parser.pyx", line 831, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8275)
  File "pandas/parser.pyx", line 1742, in pandas.parser.raise_parser_error (pandas/parser.c:20691)
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

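The C engine should behave like the Python engine and surface the underlying exception (here, the UnicodeDecodeError raised by the stream) instead of replacing it with a generic CParserError. This should be possible by checking PyErr_Occurred.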
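
To make the difference concrete, here is a minimal standard-library sketch that does not call pandas at all. It only imitates the two behaviours seen in the tracebacks above. The names `ParserError`, `make_stream`, `read_swallowing` and `read_propagating` are hypothetical and exist only for this illustration, and the sample bytes are an assumption chosen so that the invalid byte 0x83 triggers the same kind of UnicodeDecodeError.

```python
import codecs
import io


class ParserError(Exception):
    """Hypothetical stand-in for pandas.parser.CParserError."""


def make_stream():
    # 0x83 is not a valid UTF-8 start byte, so decoding fails on read.
    raw = io.BytesIO(b"a,b,c\n1,2,3\n\x83,5,6\n")
    return codecs.getreader("utf-8")(raw)


def read_swallowing(stream):
    # Imitates the reported C engine behaviour: the original exception
    # is discarded and only a generic message reaches the caller.
    try:
        return stream.read()
    except Exception:
        raise ParserError("Calling read(nbytes) on source failed.") from None


def read_propagating(stream):
    # Imitates the Python engine behaviour: the stream's own exception
    # is allowed to propagate unchanged.
    return stream.read()


if __name__ == "__main__":
    for reader in (read_swallowing, read_propagating):
        try:
            reader(make_stream())
        except Exception as exc:
            print(reader.__name__, "->", type(exc).__name__, exc)
```

With the swallowing variant the caller only learns that a read failed. With the propagating variant the caller sees the codec, the offending byte and its position, which is the information the issue asks the C engine to preserve.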