BUG: read_html does not correctly parse the header of non-string columns

I presume the problem is that the data is parsed first and the header row is selected out afterwards. When the dtype of a column is numeric, the cell that should become the column name is not a valid number, so it becomes NaN.
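The presumed order of operations can be sketched without pandas. This is only an illustration of the hypothesis above, not the actual `read_html` implementation: `to_number` and `convert_column` are hypothetical helpers, and the rule "a column is numeric if any cell parses as a number" is an assumption made for the sketch.

```python
import math

def to_number(cell):
    """Convert a cell to float, or NaN if it is not a valid number."""
    try:
        return float(cell)
    except ValueError:
        return math.nan

def convert_column(values):
    """Hypothetical type inference: treat a column as numeric
    if at least one of its cells parses as a number."""
    converted = [to_number(v) for v in values]
    if any(not math.isnan(c) for c in converted):
        return converted
    return list(values)

# Rows as they would come out of the table, header row included.
rows = [
    ["Country", "Municipality", "Year"],
    ["Ukraine", "Odessa", "1944"],
]

# Presumed order: convert every column first, select the header afterwards.
columns = [convert_column(col) for col in zip(*rows)]
header_after = [col[0] for col in columns]

# Alternative order: take the header first, convert only the body rows.
header_before = rows[0]
body = [convert_column(col) for col in zip(*rows[1:])]

print(header_after)
print(header_before)
```

In this sketch the "Year" cell is lost only when conversion runs before header selection, which matches the NaN seen in the report.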

Sample data:

data1 = io.StringIO(u'''

Country	Municipality	Year
Ukraine	Odessa	1944

''')
data2 = io.StringIO(u'''

Country	Municipality	Year
Ukraine	Odessa	1944

''')

Output:

pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944

pd.read_html(data2, header=0)[0]
0  Country  Municipality   NaN
1  Ukraine        Odessa  1944