BUG: read_csv names argument inconsistent between c and python engine · Issue #38453 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
After #38445 there is another inconsistency left to address.
import io
import pandas as pd

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""

pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="c")
pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python")
Problem description
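The bug is caused by the differing lengths of the header and the names argument: the header row has four fields, while names has five entries (each data row also has five fields because of the trailing comma).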
This returns
   A  B  C  D   E
0  1  2  3  4 NaN
1  5  6  7  8 NaN
for the c engine and raises
Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.3/scratches/scratch_4.py", line 323, in <module>
    print(pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python"))
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2303, in __init__
    ) = self._infer_columns()
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2692, in _infer_columns
    raise ValueError(
ValueError: Number of passed names did not match number of header fields in the file
Process finished with exit code 1
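The mismatch the error message refers to can be made visible without pandas. The following is only a rough sketch using the standard-library csv module, not the parser's own logic; field_counts is a hypothetical helper introduced here for illustration. It counts the fields in the header row and in each data row of the sample and compares them with the length of names.

```python
import csv
import io

SAMPLE = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
NAMES = ["A", "B", "C", "D", "E"]


def field_counts(text):
    """Return (header field count, list of field counts per data row).

    Hypothetical helper for illustration; it only splits on commas via
    csv.reader and does not mirror how pandas infers columns.
    """
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows[0]), [len(row) for row in rows[1:]]


header_count, data_counts = field_counts(SAMPLE)
# The trailing comma gives every data row one more field than the header.
print(header_count, data_counts, len(NAMES))  # prints: 4 [5, 5] 5
```

So names matches the width of the data rows but not the width of the header row, which appears to be the case the two engines treat differently.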
Expected Output
I would expect both engines to return the same result, and the python engine not to raise.
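A quick way to check whether the two engines agree on a given pandas build is sketched below. This is a minimal sketch, assuming only the public pd.read_csv call from the code sample; compare_engines is a hypothetical helper, and the outcome depends on the installed pandas version, so no expected output is shown.

```python
import io

import pandas as pd

SAMPLE = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
NAMES = ["A", "B", "C", "D", "E"]


def compare_engines(text, names):
    """Parse `text` with both engines and record what each one does.

    Hypothetical helper: maps each engine name to ("ok", frame.shape)
    or ("error", message), so the two behaviours can be compared.
    """
    outcome = {}
    for engine in ("c", "python"):
        try:
            frame = pd.read_csv(
                io.StringIO(text), header=0, names=names, engine=engine
            )
        except ValueError as exc:
            outcome[engine] = ("error", str(exc))
        else:
            outcome[engine] = ("ok", frame.shape)
    return outcome


for engine, (status, detail) in compare_engines(SAMPLE, NAMES).items():
    print(engine, status, detail)
```

If the behaviour were consistent, both engines would report the same status and the same shape.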
Output of pd.show_versions()
master