Incorrect skipping of lines with inline comments and printing warnings · Issue #16472 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

from io import StringIO
import numpy as np
import pandas as pd

test_input = u"""
1 2
2 2 3
3 2 3 # 3 fields
4 2 3# 3 fields
5 2 # 2 fields
6 2# 2 fields
7 # 1 field, NaN
8# 1 field, NaN
9 2 3 # skipped line
# comment"""

df = pd.read_table(StringIO(test_input), comment='#', header=None, delimiter='\s+', skiprows=0, error_bad_lines=False)

print df

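# Expected: only lines with <= 2 fields should appear in df; the others should be skipped with a warning

expected = pd.DataFrame([[1, 2], [5, 2], [6, 2], [7, np.nan], [8, np.nan]], index=list(range(5)), columns=[0, 1])
# NaN != NaN under ==, so compare with equals(), which treats NaNs in the same position as equal
assert df.equals(expected)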

Problem description

Only lines with at most 2 fields (once inline comments are stripped) should appear in the df; every other line should be skipped, and a warning for each skipped line should be printed on stderr.
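To pin down the rule described above, here is a minimal pure-Python sketch of the behaviour I would expect. It is not how pandas parses the input: `parse_expected`, `max_fields` and `SAMPLE` are hypothetical names made up for this illustration, the field limit of 2 is passed in rather than inferred, `None` stands in for NaN, and line numbers are counted from the first non-empty line, which is an assumption chosen to match the numbering in the expected warnings.

```python
import sys

# Same rows as test_input in the report, without the leading newline.
SAMPLE = """1 2
2 2 3
3 2 3 # 3 fields
4 2 3# 3 fields
5 2 # 2 fields
6 2# 2 fields
7 # 1 field, NaN
8# 1 field, NaN
9 2 3 # skipped line
# comment"""


def parse_expected(text, max_fields=2, comment="#"):
    """Return (rows, skipped) under the rule the report expects."""
    rows, skipped = [], []
    for lineno, raw in enumerate(text.strip().splitlines(), start=1):
        # Drop everything from the comment character onward, then split on whitespace.
        fields = raw.split(comment, 1)[0].split()
        if not fields:
            continue  # blank or comment-only line: nothing to keep, nothing to warn about
        if len(fields) > max_fields:
            skipped.append((lineno, len(fields)))
            sys.stderr.write(
                "Skipping line %d: expected %d fields, saw %d\n"
                % (lineno, max_fields, len(fields))
            )
            continue
        # Pad short rows with None (standing in for NaN).
        rows.append([int(f) for f in fields] + [None] * (max_fields - len(fields)))
    return rows, skipped


rows, skipped = parse_expected(SAMPLE)
```

With the sample above, `rows` holds the five rows starting with 1, 5, 6, 7 and 8, and `skipped` lists lines 2, 3, 4 and 9, each with 3 fields.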

Output

Skipping line 2: expected 2 fields, saw 3
Skipping line 4: expected 2 fields, saw 6
Skipping line 6: expected 2 fields, saw 4

   0  1
0  1  2
1  7  8

Problems:

Expected Output

Skipping line 2: expected 2 fields, saw 3
Skipping line 3: expected 2 fields, saw 3
Skipping line 4: expected 2 fields, saw 3
Skipping line 9: expected 2 fields, saw 3

   0    1
0  1  2.0
1  5  2.0
2  6  2.0
3  7  NaN
4  8  NaN
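To spell out how the two warning streams differ, the messages can be parsed and compared. This is only an illustrative sketch: `parse_warnings`, `ACTUAL` and `EXPECTED` are hypothetical names, and the two strings are copied verbatim from the Output and Expected Output sections of this report.

```python
import re

ACTUAL = """Skipping line 2: expected 2 fields, saw 3
Skipping line 4: expected 2 fields, saw 6
Skipping line 6: expected 2 fields, saw 4"""

EXPECTED = """Skipping line 2: expected 2 fields, saw 3
Skipping line 3: expected 2 fields, saw 3
Skipping line 4: expected 2 fields, saw 3
Skipping line 9: expected 2 fields, saw 3"""

PATTERN = re.compile(r"Skipping line (\d+): expected (\d+) fields, saw (\d+)")


def parse_warnings(text):
    """Map each skipped line number to the field count reported for it."""
    return {int(m.group(1)): int(m.group(3)) for m in PATTERN.finditer(text)}


actual = parse_warnings(ACTUAL)
expected = parse_warnings(EXPECTED)

# Lines that should have been skipped but produced no warning.
missing = sorted(set(expected) - set(actual))
# Lines that were skipped although they should have been kept.
spurious = sorted(set(actual) - set(expected))
# Lines skipped in both cases, but with a different reported field count.
miscounted = sorted(n for n in set(actual) & set(expected) if actual[n] != expected[n])

print("missing warnings for lines:", missing)
print("spurious warnings for lines:", spurious)
print("wrong field count on lines:", miscounted)
```

So lines 3 and 9 are never reported, line 6 is skipped although it has only 2 fields, and line 4 is reported with 6 fields instead of 3.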

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.14-200.fc25.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.3.3
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.6
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: 0.2.1