read_csv problem with delim_whitespace, skiprows and trailing spaces in skipped rows

Given this input file, with linefeeds indicated by <lf>:

skip1<lf>
skip2<lf>
0    1    2<lf>
3    4    5<lf>
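To reproduce the cases below without hand-editing files, the contents can be generated in Python. This is a minimal sketch, assuming the data rows are separated by four spaces as shown above; `make_test_text` and `write_test_file` are hypothetical helpers, not part of pandas.

```python
# Data rows from the report: whitespace-delimited fields.
DATA_LINES = ["0    1    2", "3    4    5"]


def make_test_text(trailing_space_on=()):
    """Hypothetical helper: return the test file contents as a string.

    trailing_space_on: names of the skipped lines ("skip1", "skip2") that
    get a single trailing space just before their linefeed.
    """
    skip_lines = []
    for name in ("skip1", "skip2"):
        suffix = " " if name in trailing_space_on else ""
        skip_lines.append(name + suffix)
    # Every line, including the last one, ends with a bare linefeed (<lf>).
    return "".join(line + "\n" for line in skip_lines + DATA_LINES)


def write_test_file(path, trailing_space_on=()):
    """Hypothetical helper: write the generated contents to path."""
    # newline="" stops Python from translating "\n" to "\r\n" on Windows,
    # so the file really contains bare linefeeds.
    with open(path, "w", newline="") as f:
        f.write(make_test_text(trailing_space_on))
```

For example, `write_test_file("test.txt")` writes the file above, and `write_test_file("test.txt", {"skip1"})` writes the variant with a space after skip1.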

Reading it with read_csv() in pandas 0.15.0-42-g20be789 and Python 3.4.2 works:

df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
df
   0  1  2
0  0  1  2
1  3  4  5
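A self-contained way to rerun this working case, assuming a recent pandas: the file contents are held in an `io.StringIO` instead of test.txt, and `sep=r"\s+"` stands in for `delim_whitespace=True` (pandas 2.2 deprecated `delim_whitespace` in favour of this spelling). On the 0.15-era release from this report, `delim_whitespace=True` can be passed instead.

```python
import io

import pandas as pd

# Same contents as test.txt, kept in memory so no file on disk is needed.
text = "skip1\nskip2\n0    1    2\n3    4    5\n"

# sep=r"\s+" requests the same whitespace splitting as delim_whitespace=True.
df = pd.read_csv(io.StringIO(text), skiprows=2, sep=r"\s+", header=None)
print(df)
```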

If I add a space after skip1, so that the skipped lines are

skip1 <lf>
skip2<lf>

then the same read_csv() call throws an error:
CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3

Adding 1 to skiprows,
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not throw an exception and returns the expected DataFrame.

Reading with skiprows=2 but without header=None does not throw an exception; it produces a DataFrame with a MultiIndex.

If there is a space after skip2, so that the skipped lines are

skip1<lf>
skip2 <lf>

then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
does not throw an exception, but the 0 1 2 row is missing from the DataFrame.

If there are spaces after both skip1 and skip2, so that the skipped lines are

skip1 <lf>
skip2 <lf>

then
df = pd.read_csv('test.txt', skiprows=2, delim_whitespace=True, header=None)
throws the CParserError exception, but
df = pd.read_csv('test.txt', skiprows=3, delim_whitespace=True, header=None)
does not, and returns the expected DataFrame.
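The four variants and both skiprows values can be checked in one pass with a small harness. This is a sketch under stated assumptions: `make_text`, `check` and `CASES` are hypothetical names, the input is held in memory rather than in test.txt, and `sep=r"\s+"` is used in place of `delim_whitespace=True` (swap it back to match the report exactly on releases that still accept it). The harness only reports what the installed pandas does; it does not assume the buggy outcomes described above still occur.

```python
import io

import pandas as pd

# The frame the report expects when the two skip lines are dropped.
EXPECTED = [[0, 1, 2], [3, 4, 5]]


def make_text(skip1_space=False, skip2_space=False):
    """Hypothetical helper: rebuild the report's file contents.

    A True flag adds one trailing space to that skipped line before its linefeed.
    """
    return (
        "skip1" + (" " if skip1_space else "") + "\n"
        + "skip2" + (" " if skip2_space else "") + "\n"
        + "0    1    2\n3    4    5\n"
    )


def check(text, skiprows):
    """Hypothetical helper: classify what read_csv does with this input."""
    try:
        df = pd.read_csv(io.StringIO(text), skiprows=skiprows, sep=r"\s+", header=None)
    except Exception as exc:  # CParserError on old pandas, pandas.errors.ParserError on newer ones
        return "error: " + type(exc).__name__
    if df.values.tolist() == EXPECTED:
        return "ok"
    return "unexpected: " + repr(df.values.tolist())


CASES = {
    "no trailing spaces": (False, False),
    "space after skip1": (True, False),
    "space after skip2": (False, True),
    "spaces after both": (True, True),
}

if __name__ == "__main__":
    for label, flags in CASES.items():
        for skiprows in (2, 3):
            print(f"{label:20} skiprows={skiprows}: {check(make_text(*flags), skiprows)}")
```

On a pandas release where skiprows ignores the contents of skipped lines, every case should report "ok" for skiprows=2.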

I would expect skiprows to skip the specified number of lines whether or not those lines contain trailing spaces.
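Until the parser treats skipped lines consistently, one workaround is to drop the leading lines in Python before pandas sees them, so their contents (trailing spaces included) never reach the tokenizer. This is a minimal sketch, not a pandas feature: `read_after_skipping` is a hypothetical helper, it reads the remaining input into memory, and it uses `sep=r"\s+"` as the equivalent of `delim_whitespace=True`.

```python
import io
import itertools

import pandas as pd


def read_after_skipping(lines, n_skip, **read_csv_kwargs):
    """Hypothetical helper: drop the first n_skip lines, then parse the rest.

    lines: any iterable of text lines; an open text file works.
    Extra keyword arguments are passed straight to pd.read_csv.
    """
    # islice discards the first n_skip lines without inspecting them.
    remaining = "".join(itertools.islice(lines, n_skip, None))
    return pd.read_csv(io.StringIO(remaining), **read_csv_kwargs)


# Both skipped lines carry a trailing space, the case reported to fail with skiprows=2.
text = "skip1 \nskip2 \n0    1    2\n3    4    5\n"
df = read_after_skipping(io.StringIO(text), 2, sep=r"\s+", header=None)
print(df)
```

With a real file the call would be `with open('test.txt') as f: df = read_after_skipping(f, 2, sep=r"\s+", header=None)`.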