SUB-character in a csv causes read_csv() with C-Engine to detect EOF · Issue #16893 · pandas-dev/pandas

Problem description

If a string field in a CSV file contains a SUB character (ASCII 0x1A), read_csv() with the default C engine raises:

ParserError: Error tokenizing data. C error: EOF inside string starting at line 0

The Python engine (engine='python') reads the same file without error.

It seems I can't include example data containing a SUB character in this post, so I put an example line on pastebin instead:
https://pastebin.com/x6QPY4Hf
Paste the line into a CSV file and try to read it with read_csv().
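For anyone who cannot reach the pastebin, the reproduction can be sketched as a short script. This is only an approximation of the report: the sample row, the column names, and the helper names `write_sample` and `read_with_engines` are made up for illustration and are not the pastebin contents. The result of the C engine may differ by pandas version and platform, so the script reports the outcome instead of assuming it.

```python
import os
import tempfile

import pandas as pd

SUB = "\x1a"  # ASCII 26 (Ctrl-Z), the SUB control character


def write_sample(path):
    """Write a tiny CSV whose quoted string field contains a SUB character.

    The row is invented for illustration; it is not the pastebin line.
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        fh.write("id,text\n")
        fh.write('1,"foo' + SUB + 'bar"\n')


def read_with_engines(path):
    """Read the file with both parser engines and record what each one does."""
    results = {}
    for engine in ("c", "python"):
        try:
            df = pd.read_csv(path, engine=engine)
            results[engine] = str(df["text"].iloc[0])
        except pd.errors.ParserError as exc:
            # The report above describes this branch for engine="c" on Windows.
            results[engine] = "ParserError: {}".format(exc)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sub_char.csv")
        write_sample(sample_path)
        for engine, outcome in read_with_engines(sample_path).items():
            print(engine, "->", repr(outcome))
```

If the C engine fails on your setup, passing engine="python" to read_csv() is the workaround already noted above.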

I don't know whether this behaviour is expected, since this character is indeed used as an EOF marker in certain cases. However, I see little sense in interpreting a SUB character as EOF in the middle of a CSV file.
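To check whether a given file is affected, it can help to locate the SUB bytes before parsing. The sketch below assumes the file fits in memory and uses a hypothetical helper name, `find_sub_offsets`. It opens the file in binary mode so that no text-mode handling of Ctrl-Z can get in the way.

```python
SUB_BYTE = b"\x1a"  # ASCII 26 (Ctrl-Z)


def find_sub_offsets(path):
    """Return the byte offsets of every SUB character in the file."""
    with open(path, "rb") as fh:  # binary mode: bytes are read unchanged
        data = fh.read()
    offsets = []
    pos = data.find(SUB_BYTE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(SUB_BYTE, pos + 1)
    return offsets
```

An empty list means the file contains no SUB characters, in which case the error presumably has a different cause.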

commit: None

python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64

pandas: 0.20.2