Raise an error in read_csv when names and prefix both are not None · Issue #39123 · pandas-dev/pandas

Problem description

As a result of the discussion in issue #27394, read_csv was changed so that it raises an error when both header and prefix are different from None. A user had misunderstood how to (not) use header and prefix together. I think that the usage of names and prefix can be misunderstood in a similar way.

It could also be that a user accidentally provides both arguments and expects prefix to take effect. Right now, it seems like prefix is silently ignored when names is provided.

Describe the solution you'd like

Raise a ValueError when the read_csv arguments prefix and names both differ from None, in accordance with issue #27394 and pull request #31383.
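A minimal sketch of the proposed check, written as a standalone function rather than as the actual pandas change. The function name validate_prefix_usage and the wording of the second error message are assumptions for illustration; only the first message is taken from the traceback quoted in this issue. For simplicity the sketch treats header as defaulting to None, which is not how read_csv resolves its default.

```python
def validate_prefix_usage(header=None, names=None, prefix=None):
    """Hypothetical helper mirroring the proposed read_csv argument check."""
    if prefix is None:
        # Nothing to validate when no prefix is requested.
        return
    if header is not None:
        # Existing behavior; message copied from the traceback in this issue.
        raise ValueError(
            "Argument prefix must be None if argument header is not None"
        )
    if names is not None:
        # Proposed behavior; the exact wording here is an assumption.
        raise ValueError(
            "Argument prefix must be None if argument names is not None"
        )


# prefix alone and names alone are both accepted.
validate_prefix_usage(prefix="XZ")
validate_prefix_usage(names=["a", "b", "c"])

# Passing both would raise instead of silently ignoring prefix.
try:
    validate_prefix_usage(prefix="XZ", names=["a", "b", "c"])
except ValueError as err:
    print(err)
```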

API breaking implications

This will "break" code that passes non-None values for both prefix and names, but since it was an accepted solution for issue #27394, I think it could be used here as well.

Describe alternatives you've considered

Another possibility is to issue a warning instead of an error, but that would be inconsistent with the behavior when prefix and header are both not None.

Additional context

Examples below are run in pandas version 1.3.0.dev0+210.g9f1a41dee.

When I run

pandas.read_csv("my_data.csv", prefix="XZ", header=0)

I get the following output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1853, in __init__
    ParserBase.__init__(self, kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1334, in __init__
    raise ValueError(
ValueError: Argument prefix must be None if argument header is not None

but running

pandas.read_csv("my_data.csv", prefix="XZ", names=["a", "b", "c"])

works fine, except that prefix is silently ignored.
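To illustrate why the two arguments conflict: as I understand the documentation, prefix builds column labels by appending the column number to the prefix when there is no header, while names supplies the labels explicitly. The helper prefixed_columns below is hypothetical and does not call pandas; it only mimics that naming scheme to show that both arguments compete for the same set of labels.

```python
def prefixed_columns(prefix, n_columns):
    """Hypothetical stand-in for the labels prefix would generate."""
    return [f"{prefix}{i}" for i in range(n_columns)]


names = ["a", "b", "c"]
from_prefix = prefixed_columns("XZ", len(names))

print(from_prefix)  # ['XZ0', 'XZ1', 'XZ2']
print(names)        # ['a', 'b', 'c']

# Only one of the two label sets can be applied to the columns, so a
# call that supplies both has to either pick one silently or raise.
assert from_prefix != names
```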