BUG: read_html does not parse correctly the header of non-string columns · Issue #5048 · pandas-dev/pandas
Description
I presume the problem is that the data is parsed first and the header row is selected out only afterwards. When a column has a numeric dtype, the cell that should become its column name is not a valid number, so it is converted to NaN.
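To make the suspected order of operations concrete, here is a minimal pure-Python sketch. It is not pandas' actual parser: the helpers `to_number`, `convert_columns`, `parse_then_header` and `header_then_parse` are hypothetical, and the sketch assumes a column is treated as numeric as soon as any of its cells parses as a number, with unparseable cells coerced to NaN.

```python
import math

# Raw cell text from the sample table, header row still included.
ROWS = [
    ["Country", "Municipality", "Year"],
    ["Ukraine", "Odessa", "1944"],
]


def to_number(cell):
    """Coerce a cell to float; text that is not a number becomes NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def convert_columns(rows):
    """Hypothetical type inference: a column whose cells include at least
    one number is converted entirely, coercing the other cells to NaN."""
    converted = []
    for col in zip(*rows):
        numbers = [to_number(c) for c in col]
        if any(not math.isnan(n) for n in numbers):
            converted.append(numbers)
        else:
            converted.append(list(col))
    return [list(row) for row in zip(*converted)]


def parse_then_header(rows):
    """Suspected buggy order: infer types on every row, then take row 0."""
    table = convert_columns(rows)
    return table[0], table[1:]


def header_then_parse(rows):
    """Alternative order: split off the header first, then infer types."""
    header, body = rows[0], rows[1:]
    return header, convert_columns(body)


if __name__ == "__main__":
    print(parse_then_header(ROWS))  # 'Year' has been coerced to nan
    print(header_then_parse(ROWS))  # 'Year' survives as a string label
```

With the header still present during type inference, `"Year"` sits in a column that is otherwise numeric and turns into NaN, matching the symptom above; splitting the header off first keeps it as a label.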
Sample data:
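`data1 = io.StringIO(u'''…''')` wraps an HTML table that renders as:

Country | Municipality | Year |
---|---|---|
Ukraine | Odessa | 1944 |

`data2` holds a second HTML table that renders the same way:

Country | Municipality | Year |
---|---|---|
Ukraine | Odessa | 1944 |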
Output:

```
>>> pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944
>>> pd.read_html(data2, header=0)[0]
0  Country  Municipality   NaN
1  Ukraine        Odessa  1944
```