Bug with read_table, skiprows, and C engine · Issue #8679 · pandas-dev/pandas
I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.
If I use the default C engine with read_table, I have to specify skiprows=85 to load the table properly:
pd.read_table('co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c', names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
But if I use the Python engine, the expected skiprows=72 works:
pd.read_table('co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python', names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
With the C engine and skiprows=72, the resulting DataFrame has 691 rows and includes data from the header; it is expected to have 679 rows.
I've confirmed this behavior on Mac OS X Yosemite with pandas 0.15.0 and with a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.
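
A smaller, self-contained comparison might help narrow this down. The sketch below is only an illustration: it builds a synthetic whitespace-delimited text (not the NOAA file) with a few `#` header lines, then loads it with both engines using the same skiprows value and reports the row counts. The helper names (make_sample, load, compare_engines) and the sample values are hypothetical, and this simple header is not guaranteed to trigger the discrepancy described above; the real header may contain something (for example blank lines) that this sample lacks.

```python
import io

import pandas as pd

NAMES = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']


def make_sample(n_header=5, n_rows=3):
    """Return synthetic text: n_header '#' lines followed by n_rows data lines.

    The values are made up for illustration; they are not NOAA data.
    """
    header = ['# header line %d' % i for i in range(n_header)]
    rows = [
        '%d %d %.3f %.2f %.2f %.2f %d'
        % (1958, m, 1958 + m / 12.0, 315.0 + m, 315.0 + m, 314.0 + m, -1)
        for m in range(1, n_rows + 1)
    ]
    return '\n'.join(header + rows) + '\n'


def load(text, skiprows, engine):
    """Parse the text with the same arguments as in the report, for one engine."""
    return pd.read_table(
        io.StringIO(text),
        sep=r'\s+',
        header=None,
        skiprows=skiprows,
        engine=engine,
        names=NAMES,
    )


def compare_engines(text, skiprows):
    """Return (rows from the C engine, rows from the Python engine)."""
    return len(load(text, skiprows, 'c')), len(load(text, skiprows, 'python'))


if __name__ == '__main__':
    sample = make_sample(n_header=5, n_rows=3)
    # If the two engines agree, both counts equal the number of data lines.
    print(compare_engines(sample, skiprows=5))
```

Replacing the synthetic header with progressively larger slices of the real file's first 72 lines would be one way to find which header line makes the two counts diverge.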