Raise an error in read_csv when names and prefix both are not None · Issue #39123 · pandas-dev/pandas
Problem description
As a result of the discussion in issue #27394, `read_csv` was changed so that it raises an error when both `header` and `prefix` are not `None`. In that issue, a user had misunderstood how (not) to use `header` and `prefix` together. I think the combination of `names` and `prefix` can be misunderstood in a similar way.
A user could also pass both arguments by accident and expect `prefix` to have an effect. Currently, `prefix` appears to be silently ignored whenever `names` is provided.
Describe the solution you'd like
Raise a `ValueError` when the `read_csv` arguments `prefix` and `names` are both not `None`, consistent with issue #27394 and pull request #31383.
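A minimal sketch of the proposed check, assuming it would sit next to the existing `header`/`prefix` validation. The helper name `validate_prefix` and the wording of the new `names` message are hypothetical; only the `header` message is copied from the traceback in this issue.

```python
from typing import Any, Optional


def validate_prefix(
    prefix: Optional[str], header: Any = None, names: Any = None
) -> None:
    """Hypothetical argument check for read_csv's prefix (not pandas' actual code)."""
    if prefix is None:
        # prefix unused: nothing can conflict with it
        return
    if header is not None:
        # existing behavior: same message read_csv already raises
        raise ValueError("Argument prefix must be None if argument header is not None")
    if names is not None:
        # proposed behavior: treat names the same way (message wording is hypothetical)
        raise ValueError("Argument prefix must be None if argument names is not None")


try:
    validate_prefix("XZ", names=["a", "b", "c"])
except ValueError as err:
    print(err)  # Argument prefix must be None if argument names is not None
```

Putting both conditions in one helper keeps the two error messages parallel, which is the consistency argument made above.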
API breaking implications
This will "break" code that passes non-`None` values for both `prefix` and `names`, but since this approach was accepted as the solution for issue #27394, I think it could be used here as well.
Describe alternatives you've considered
Another option is to issue a warning instead of an error, but that would be inconsistent with the behavior when `prefix` and `header` are both not `None`.
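For comparison, the warning-based alternative might look like the sketch below. The helper `warn_if_prefix_ignored` and its message text are hypothetical. The call still succeeds and `names` still wins, which is why this would diverge from the `header` case, where the same kind of conflict raises a `ValueError`.

```python
import warnings
from typing import Any, Optional


def warn_if_prefix_ignored(prefix: Optional[str], names: Any) -> None:
    """Hypothetical alternative to raising: warn that prefix will be ignored."""
    if prefix is not None and names is not None:
        # parsing would continue afterwards, with names taking precedence
        warnings.warn(
            "Argument prefix is ignored because argument names is not None",
            UserWarning,
            stacklevel=2,
        )


# Usage: record the warning instead of letting it print to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_prefix_ignored("XZ", ["a", "b", "c"])
print([str(w.message) for w in caught])
```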
Additional context
The examples below were run with pandas version 1.3.0.dev0+210.g9f1a41dee.
When I run

```python
pandas.read_csv("my_data.csv", prefix="XZ", header=0)
```

I get the following output:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1853, in __init__
    ParserBase.__init__(self, kwds)
  File "/home/malin/workspace/py/pandas/pandas/io/parsers.py", line 1334, in __init__
    raise ValueError(
ValueError: Argument prefix must be None if argument header is not None
```
but running

```python
pandas.read_csv("my_data.csv", prefix="XZ", names=["a", "b", "c"])
```

runs without error, except that `prefix` is silently ignored.