Loading CSV files (using read_csv) with blank lines between header and data rows quits Python interpreter · Issue #28071 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd  # tested with pandas 0.25.0 on Python 3.6.8

# nrows=1: the interpreter exits without any error message
pd.read_csv('my_csv.csv', delimiter='|', header=4, nrows=1, skip_blank_lines=False)

# nrows=2: works and produces the output below
pd.read_csv('my_csv.csv', delimiter='|', header=4, nrows=2, skip_blank_lines=False)
   int_id_1_1_1  date_2019-01-01_2019-12-31_2  ascii_str_8_8_3  double_-1.0_1.0_4  integer_-1000_1000_5
0           NaN                           NaN              NaN                NaN                   NaN
1           NaN                           NaN              NaN                NaN                   NaN

Problem description

I have been trying to load a test CSV file ("my_csv.txt", attached) that has informational text on the second row, the column header on the fifth row, and data starting at the ninth row. As the Python code above shows, read_csv makes the interpreter exit when nrows=1, but not when nrows>1.
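
For anyone without the attachment, here is a minimal sketch of a reproduction. It writes a file with the layout described above and runs each read_csv call in a child process, so a hard crash shows up as a non-zero exit code instead of killing the current session. The column names, the row values, and the helper names write_sample_csv and run_read_csv are made up for illustration; the real contents of my_csv.txt may differ, so this may or may not trigger the same crash.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def write_sample_csv(path):
    """Write a pipe-delimited file with the layout described in the issue.

    The values are hypothetical; only the row positions follow the report.
    """
    rows = [
        "",                                 # row 1: blank
        "Some information text",            # row 2: information text
        "",                                 # row 3: blank
        "",                                 # row 4: blank
        "id|date|name|ratio|count",         # row 5: header (header=4, zero-based)
        "",                                 # rows 6-8: blank lines before the data
        "",
        "",
        "1|2019-01-01|abcdefgh|0.5|10",     # row 9: first data row
        "2|2019-06-15|ijklmnop|-0.25|-20",  # row 10: second data row
    ]
    Path(path).write_text("\n".join(rows) + "\n")


def run_read_csv(path, nrows):
    """Call read_csv in a child process; return (exit code, stdout, stderr)."""
    code = (
        "import pandas as pd\n"
        "df = pd.read_csv({!r}, delimiter='|', header=4, nrows={}, "
        "skip_blank_lines=False)\n"
        "print(df)\n"
    ).format(str(path), int(nrows))
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "my_csv.csv"
        write_sample_csv(csv_path)
        for nrows in (1, 2):
            returncode, out, err = run_read_csv(csv_path, nrows)
            # A non-zero code means the child failed or crashed
            # (on POSIX, a negative code means it was killed by a signal).
            print("nrows={}: exit code {}".format(nrows, returncode))
            print(out or err)
```

If the crash reproduces with this file, I would expect the nrows=1 run to report a non-zero exit code with little or no traceback, while the nrows=2 run prints a DataFrame.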

I suspect there is a bug in pandas' read_csv that is triggered when a CSV file has blank lines between the header and the first data row. Thank you for your hard work maintaining and extending this very useful library.

my_csv.txt