Bug with read_table, skiprows, and C engine

I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.
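To double-check which line the data start on, something like the following could be used. It is only a rough sketch: `first_data_line` is a hypothetical helper, not part of pandas, and it assumes that a data row is any line with exactly seven whitespace-separated numeric fields. The sample text is synthetic and merely stands in for the real file.

```python
import io


def first_data_line(lines, n_fields=7):
    """Return the 1-based number of the first line that looks like a data row.

    Assumption: a data row has exactly `n_fields` whitespace-separated
    tokens, and every token parses as a number.
    """
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if len(tokens) != n_fields:
            continue
        try:
            for tok in tokens:
                float(tok)
        except ValueError:
            continue
        return lineno
    return None


# Synthetic stand-in for the real file: two header lines, a blank line, then data.
sample = io.StringIO(
    "header line one\n"
    "header line two\n"
    "\n"
    "2000 1 2000.042 1.0 1.0 1.0 30\n"
    "2000 2 2000.125 2.0 2.0 2.0 28\n"
)
start = first_data_line(sample)
print(start, start - 1)  # prints: 4 3 (first data line, matching skiprows value)
```

For the real file, the same function can be called on an open file handle; if the data start on line 73, it should return 73, which corresponds to skiprows=72.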

If I use the default C engine with read_table, I have to specify skiprows=85 to load the table properly:

pd.read_table('co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c', names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

But if I use the Python engine, the expected skiprows=72 works:

pd.read_table('co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python', names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

The resulting DataFrame is expected to have 679 rows. With skiprows=72 and the C engine, it instead has 691 rows and includes data from the header.
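A side-by-side comparison of the two engines can be scripted roughly as follows. This is a minimal sketch, not a definitive reproduction: `rows_per_engine` is a hypothetical helper, and the sample text is synthetic, so it does not by itself trigger the discrepancy described above. It only shows how the row counts from both engines can be collected for the same input and the same skiprows value.

```python
import io

import pandas as pd

NAMES = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']


def rows_per_engine(text, skiprows):
    """Parse the same text with both engines and return {engine: row count}."""
    counts = {}
    for engine in ('c', 'python'):
        frame = pd.read_table(
            io.StringIO(text),
            sep=r'\s+',
            header=None,
            skiprows=skiprows,
            engine=engine,
            names=NAMES,
        )
        counts[engine] = len(frame)
    return counts


# Synthetic stand-in for the real file: three header lines, then four data rows.
sample_text = (
    "header line one\n"
    "header line two\n"
    "header line three\n"
    "2000 1 2000.042 1.0 1.0 1.0 30\n"
    "2000 2 2000.125 2.0 2.0 2.0 28\n"
    "2000 3 2000.208 3.0 3.0 3.0 31\n"
    "2000 4 2000.292 4.0 4.0 4.0 30\n"
)
counts = rows_per_engine(sample_text, skiprows=3)
print(counts)
```

To check the real file, read its contents into a string and call `rows_per_engine(contents, skiprows=72)`; differing counts for 'c' and 'python' would correspond to the behavior reported here.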

I've confirmed this behavior on Mac OS X Yosemite with pandas 0.15.0 and with a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.