read_csv does not parse in header with BOM utf-8 · Issue #4793 · pandas-dev/pandas

I am using Pandas version 0.12.0 on a Mac.

I noticed that when a CSV file is UTF-8 encoded with a BOM and the header row is on the first line, read_csv() leaves a leading quotation mark in the first column's name. However, if the header row is further down the file and I use the "header=" option, the whole header row is parsed correctly.

Here is an example:

bing_kw = pd.read_csv('../../data/sem/Bing-Keyword_daily.csv', header=9, thousands=',', encoding='utf-8')

Parses the header correctly.

bing_kw = pd.read_csv('../../data/sem/Bing-Keyword_daily.csv', thousands=',', encoding='utf-8')

Parses the first header column's name incorrectly, leaving a leading quotation mark in it.
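To make the report easier to reproduce without my data file, here is a minimal sketch using only the Python standard library. The sample bytes and the helper name `first_header_cell` are made up for illustration, and the standard `csv` module stands in for pandas' own parser, so this only shows the likely mechanism: when the BOM is not stripped during decoding, the opening quotation mark is no longer the first character of the line, so the field is not treated as quoted. Decoding with Python's `utf-8-sig` codec strips the BOM; passing `encoding='utf-8-sig'` to read_csv() may therefore work around the problem, though I have not confirmed that on 0.12.0.

```python
import codecs
import csv
import io

# Hypothetical sample data standing in for the real report file:
# a UTF-8 BOM followed by a quoted header row and one data row.
raw = codecs.BOM_UTF8 + '"Keyword","Clicks"\r\n"shoes","1,234"\r\n'.encode("utf-8")


def first_header_cell(data: bytes, encoding: str) -> str:
    """Decode the bytes and return the first cell of the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))[0]


# Plain 'utf-8' keeps the BOM as the first character of the decoded text.
with_bom = first_header_cell(raw, "utf-8")

# 'utf-8-sig' drops a leading BOM before the parser sees the text.
without_bom = first_header_cell(raw, "utf-8-sig")

print(repr(with_bom))
print(repr(without_bom))
```

Comparing the two printed values shows whether the BOM and the quotation marks survive in the first column name under each encoding.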