BUG: read_html does not parse correctly the header of non-string columns · Issue #5048 · pandas-dev/pandas

@alefnula

Description

I presume the problem is that the data is parsed first and the header is selected out afterwards. When a column has a numeric dtype, the item that should become the column name is not a valid number, so it becomes NaN.
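To make the presumed order of operations concrete, here is a toy model in plain Python. It is not pandas' actual code: `to_number`, `convert_column`, `parse_then_select_header` and `select_header_then_parse` are hypothetical helpers, and the rule "a column becomes numeric if any cell parses as a number" is an assumption made only for illustration.

```python
import math


def to_number(cell):
    """Return float(cell), or NaN when the cell is not a valid number."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def convert_column(cells):
    """Toy rule (an assumption): a column becomes numeric if any cell parses."""
    converted = [to_number(c) for c in cells]
    if all(math.isnan(v) for v in converted):
        return list(cells)  # no numeric cell: keep the strings untouched
    return converted


def parse_then_select_header(rows):
    """Presumed buggy order: convert whole columns, then take row 0 as header."""
    columns = [convert_column(col) for col in zip(*rows)]
    header = [col[0] for col in columns]
    body = [list(r) for r in zip(*(col[1:] for col in columns))]
    return header, body


def select_header_then_parse(rows):
    """Alternative order: take row 0 as header, then convert only the body."""
    header = list(rows[0])
    columns = [convert_column(col) for col in zip(*rows[1:])]
    body = [list(r) for r in zip(*columns)]
    return header, body


rows = [
    ["Country", "Municipality", "Year"],
    ["Ukraine", "Odessa", "1944"],
]

buggy_header, buggy_body = parse_then_select_header(rows)
fixed_header, fixed_body = select_header_then_parse(rows)

print(buggy_header)  # the "Year" label has been coerced to NaN
print(fixed_header)  # all three labels survive as strings
```

In this model the only difference between the two functions is whether the header row is still part of the column when the numeric conversion runs, which matches the symptom reported here.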

Sample data:

import io
import pandas as pd

data1 = io.StringIO(u'''
Country Municipality Year
Ukraine Odessa 1944
''')
data2 = io.StringIO(u'''
Country Municipality Year
Ukraine Odessa 1944
''')

Output:

pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944

pd.read_html(data2, header=0)[0]
0  Country Municipality   NaN
1  Ukraine       Odessa  1944