Loading CSV files (using read_csv) with blank lines between header and data rows quits Python interpreter · Issue #28071 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd  # tested with pandas 0.25.0 on Python 3.6.8

# nrows=1: the interpreter exits without any error message
pd.read_csv('my_csv.csv', delimiter='|', header=4, nrows=1, skip_blank_lines=False)

# nrows=2: works and produces the output below
pd.read_csv('my_csv.csv', delimiter='|', header=4, nrows=2, skip_blank_lines=False)

# Output of the nrows=2 call:
   int_id_1_1_1  date_2019-01-01_2019-12-31_2  ascii_str_8_8_3  double_-1.0_1.0_4  integer_-1000_1000_5
0           NaN                           NaN              NaN                NaN                   NaN
1           NaN                           NaN              NaN                NaN                   NaN
Problem description
I have been trying to load a test CSV file ("my_csv.txt", attached) with the following layout: a line of information text on the second row, the column-header line on the fifth row, and data starting on the ninth row. As the code above shows, read_csv fails when nrows=1 but not when nrows>1.
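The layout described above can be reproduced in a self-contained script. The attachment is not available here, so the sketch below assumes a hypothetical file: only the line positions (information text on line 2, header on line 5, data from line 9) and the column names come from the report, while the title on line 1, the blank lines 3-4 and 6-8, and the two data rows are made up. It reruns only the nrows=2 call and shows why that call returns all-NaN rows: with skip_blank_lines=False, the two rows read right after the header are the blank lines 6 and 7.

```python
import io

import pandas as pd

# Hypothetical stand-in for the attached file. Assumed layout:
# line 1: title (made up), line 2: information text (made up),
# lines 3-4: blank, line 5: column header (names from the report),
# lines 6-8: blank, lines 9+: data (made-up values).
CSV_TEXT = (
    "hypothetical title\n"
    "hypothetical information text\n"
    "\n"
    "\n"
    "int_id_1_1_1|date_2019-01-01_2019-12-31_2|ascii_str_8_8_3|"
    "double_-1.0_1.0_4|integer_-1000_1000_5\n"
    "\n"
    "\n"
    "\n"
    "1|2019-03-15|abcdefgh|0.5|42\n"
    "2|2019-07-01|ijklmnop|-0.25|-7\n"
)

# The call from the report that works. With skip_blank_lines=False, blank
# lines count as rows, so header=4 points at the fifth physical line.
df = pd.read_csv(io.StringIO(CSV_TEXT), delimiter="|", header=4, nrows=2,
                 skip_blank_lines=False)

# The two rows read are physical lines 6 and 7, which are blank, so every
# value is NaN, matching the output in the report.
print(df)
```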
I think there is an uncaught bug in pandas' read_csv when a CSV file has blank lines between the header and the first data row. Thank you for your hard work maintaining and extending this very useful library.
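If the blank lines carry no meaning, one possible way to sidestep the problem is to leave skip_blank_lines at its default of True. The read_csv documentation notes that `header` then ignores empty lines, so the header index has to be counted over non-blank lines only. The sketch below assumes the same hypothetical layout (a made-up title on line 1, information text on line 2, blank lines 3-4 and 6-8, made-up data rows) and has not been checked against pandas 0.25.0.

```python
import io

import pandas as pd

# Hypothetical file with the reported line positions; the title, the
# information text and the data values are made up for illustration.
CSV_TEXT = (
    "hypothetical title\n"
    "hypothetical information text\n"
    "\n"
    "\n"
    "int_id_1_1_1|date_2019-01-01_2019-12-31_2|ascii_str_8_8_3|"
    "double_-1.0_1.0_4|integer_-1000_1000_5\n"
    "\n"
    "\n"
    "\n"
    "1|2019-03-15|abcdefgh|0.5|42\n"
    "2|2019-07-01|ijklmnop|-0.25|-7\n"
)

# With skip_blank_lines=True (the default) blank lines are dropped before
# `header` is applied. Non-blank lines under this assumed layout are:
# title (0), information text (1), column header (2), then data.
first_row = pd.read_csv(io.StringIO(CSV_TEXT), delimiter="|", header=2, nrows=1)

# One row: the first data line (line 9 of the file).
print(first_row)
```

Note that the correct `header` value here depends on how many non-blank lines precede the column header in the real file.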