BUG: read_csv names argument inconsistent between c and python engine · Issue #38453 · pandas-dev/pandas



Code Sample, a copy-pastable example

After #38445 there is another inconsistency left to address.

import io
import pandas as pd

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""

pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="c")

pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python")

Problem description

The bug is caused by the differing lengths of the header (four fields) and the names argument (five entries).
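The length mismatch can be checked without pandas. The following is a minimal sketch using only the standard library `csv` module; it assumes the sample string from the report has one record per line and simply counts the fields in each row. The variable names `rows` and `field_counts` are illustrative only.

```python
import csv
import io

# Sample data from the report: the header has no trailing comma,
# but each data row does, so the data rows carry one extra (empty) field.
s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""

names = ["A", "B", "C", "D", "E"]

rows = list(csv.reader(io.StringIO(s)))
field_counts = [len(row) for row in rows]

print(field_counts)  # [4, 5, 5]
print(len(names))    # 5
```

The header row yields four fields while each data row and `names` have five, which is the situation the two engines handle differently.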

This returns

   A  B  C  D   E
0  1  2  3  4 NaN
1  5  6  7  8 NaN

for the c engine and raises

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.3/scratches/scratch_4.py", line 323, in <module>
    print(pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python"))
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2303, in __init__
    ) = self._infer_columns()
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2692, in _infer_columns
    raise ValueError(
ValueError: Number of passed names did not match number of header fields in the file

Process finished with exit code 1

Expected Output

I would expect both engines to return the same result, and the python engine not to raise.
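To see how the two engines compare on a given pandas build, something like the sketch below could be used. It assumes only the reproduction from this report; `try_engine` is a hypothetical helper written for this check and is not part of pandas. No particular outcome is assumed, since the behavior depends on the installed version.

```python
import io

import pandas as pd

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
names = ["A", "B", "C", "D", "E"]


def try_engine(engine):
    """Return the parsed DataFrame, or the ValueError the engine raised."""
    try:
        return pd.read_csv(io.StringIO(s), header=0, names=names, engine=engine)
    except ValueError as exc:
        # Hypothetical helper: hand the error back so both engines can be compared.
        return exc


results = {engine: try_engine(engine) for engine in ("c", "python")}

for engine, outcome in results.items():
    print(f"engine={engine!r}: {type(outcome).__name__}")
    print(outcome)
```

If both entries in `results` are DataFrames with identical contents, the engines agree for this input; if one of them is a `ValueError`, the inconsistency described here is still present.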

Output of pd.show_versions()

master