API/ENH: read_csv handling of bad lines (too many/few fields) · Issue #15122 · pandas-dev/pandas

Currently read_csv has some ways to deal with "bad lines" (bad in the sense of too many or too few fields compared to the determined number of columns):

Some possibilities are missing in this scheme:

Apart from that, #5686 requests the ability to specify a custom function to process a bad line, for even more control.

In #9549 (comment) (and surrounding comments) there was some discussion about how to integrate this, with some ideas from @jreback and @selasley:

Provide more fine-grained control in a new keyword (and deprecate error_bad_lines):

bad_lines='error'|'warn'|'skip'|'process'

or leave out 'warn' and keep warn_bad_lines to be able to combine a warning with both 'skip' and 'process'.

We should further think about whether we can integrate this with the case of too few fields and not only too many.
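
To make the proposed semantics concrete, here is a minimal sketch in plain Python using only the standard library `csv` module. It is not pandas code: `read_rows`, its `process` argument, and the callback signature are hypothetical names chosen for illustration. It also assumes that 'warn' means "warn and skip", and that lines with too few fields are treated as bad in the same way as lines with too many, which is exactly the open question above.

```python
import csv
import io
import warnings


def read_rows(text, bad_lines="error", process=None):
    """Hypothetical helper (not pandas API) illustrating the proposed policies.

    A row is "bad" when its field count differs from the header's,
    whether it has too many or too few fields.
    """
    if bad_lines not in {"error", "warn", "skip", "process"}:
        raise ValueError(f"unknown bad_lines policy: {bad_lines!r}")
    if bad_lines == "process" and process is None:
        raise ValueError("bad_lines='process' requires a callable")

    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    # Record numbers start at 2 because the header is record 1.
    for recno, fields in enumerate(reader, start=2):
        if len(fields) == len(header):
            rows.append(fields)
            continue
        msg = f"record {recno}: expected {len(header)} fields, saw {len(fields)}"
        if bad_lines == "error":
            raise ValueError(msg)
        if bad_lines == "warn":
            warnings.warn(msg)  # assumption: warn, then drop the row
        elif bad_lines == "process":
            # The callback returns a repaired row, or None to drop it.
            fixed = process(fields, len(header))
            if fixed is not None:
                rows.append(fixed)
        # 'skip' (and 'warn') fall through here and drop the row.
    return header, rows


def pad_or_truncate(fields, n):
    """Example callback: pad short rows with '' and cut long rows to n fields."""
    return (fields + [""] * n)[:n]


text = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"
header, skipped = read_rows(text, bad_lines="skip")
_, processed = read_rows(text, bad_lines="process", process=pad_or_truncate)
```

With this input, 'skip' keeps only the well-formed row, while 'process' with `pad_or_truncate` keeps all three rows, truncating the long one and padding the short one. Keeping the warning as a separate flag, as suggested above, would instead let a warning be combined with either 'skip' or 'process'.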

I think it would be nice to have some better control here, but we should think a bit about the best API for this.