ERR: Fail-fast with incompatible skipfooter combos by gfyoung · Pull Request #23711 · pandas-dev/pandas

Setup:

from pandas import read_csv
from pandas.compat import StringIO

data = "a\n1\n2\n3\n4\n5"

Case 1:

read_csv(StringIO(data), skipfooter=1, nrows=2)

Case 2:

read_csv(StringIO(data), skipfooter=1, chunksize=2)

Currently, we get:

Case 1:

... ValueError: skipfooter not supported for iteration

Case 2:

... <pandas.io.parsers.TextFileReader object at ...>

In Case 1, the error message is misleading: nothing is being iterated over. It is true that passing in nrows together with skipfooter is not supported (skipfooter should skip lines at the end of the file, NOT at the end of the nrows of data that we read in, which is what happens currently), but we should raise an error message that names the actual conflict.

(BTW, the only way to make this combo work is to read in the entire file before cutting off the last skipfooter rows, but we can look into that subsequently...)
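To make the ordering issue concrete, here is a minimal pure-Python sketch of the two possible semantics. It does not use pandas or reflect its internals; the helper names `footer_then_nrows` and `nrows_then_footer` are hypothetical and exist only for illustration.

```python
data = "a\n1\n2\n3\n4\n5"


def footer_then_nrows(text, nrows, skipfooter):
    """Drop the footer from the whole file first, then keep the first nrows."""
    _header, *rows = text.splitlines()
    if skipfooter:
        rows = rows[:-skipfooter]  # requires having read every line
    return rows[:nrows]


def nrows_then_footer(text, nrows, skipfooter):
    """Keep the first nrows first, then drop the footer from that slice."""
    _header, *rows = text.splitlines()
    rows = rows[:nrows]
    if skipfooter:
        rows = rows[:-skipfooter]  # trims data rows, not the file's footer
    return rows


print(footer_then_nrows(data, nrows=2, skipfooter=1))  # ['1', '2']
print(nrows_then_footer(data, nrows=2, skipfooter=1))  # ['1']
```

Only the first ordering matches what skipfooter is meant to do, and it needs the whole file in memory before any rows can be returned, which is why the combination is rejected for now rather than silently supported.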

In Case 2, creating this reader is deceptive: any attempt to call .read() on it will raise the ValueError seen in Case 1. It is better to alert end-users immediately that this reader doesn't work. After this PR, we get more useful error messages out of the gate:

Case 1:

... ValueError: 'skipfooter' not supported with 'nrows'

Case 2:

... ValueError: 'skipfooter' not supported for iteration
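For reviewers who want the fail-fast idea in isolation, the check can be sketched roughly as follows. This is an illustrative stand-in, not the code in this PR: the function name `check_skipfooter` and its signature are assumptions, and only the two error messages are taken from the output above.

```python
def check_skipfooter(skipfooter=0, nrows=None, chunksize=None, iterator=False):
    """Reject unsupported skipfooter combinations before any parsing starts.

    Hypothetical helper for illustration; the real validation lives inside
    the pandas parser and may be structured differently.
    """
    if not skipfooter:
        return  # nothing to validate when no footer is being skipped

    # A chunked or iterating reader never sees the end of the file up front.
    if iterator or chunksize is not None:
        raise ValueError("'skipfooter' not supported for iteration")

    # The footer must be counted from the end of the file, not from nrows.
    if nrows is not None:
        raise ValueError("'skipfooter' not supported with 'nrows'")


for kwargs in ({"skipfooter": 1, "nrows": 2}, {"skipfooter": 1, "chunksize": 2}):
    try:
        check_skipfooter(**kwargs)
    except ValueError as err:
        print(kwargs, "->", err)
```

Running the check at construction time means Case 2 fails when the reader is created, rather than on the first .read() call.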