Raise an error on redundant definition of separator in read_csv · Issue #39823 · pandas-dev/pandas
Is your feature request related to a problem?
When calling read_csv and specifying both sep and delim_whitespace, or both delimiter and delim_whitespace, I get a ValueError in pandas 1.3.0.dev0+713.g9f792cd903. For example, both of the following calls raise an error:

    df = pd.read_csv("my_data.csv", sep=' ', delim_whitespace=True)
    df = pd.read_csv("my_data.csv", delimiter=' ', delim_whitespace=True)

However, when I specify both sep and delimiter, for example:

    df = pd.read_csv("my_data.csv", sep=' ', delimiter='.')

sep is silently ignored. I think it would make sense to raise a ValueError in this case as well.
Moreover, the error raised today always gives the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one.", regardless of whether I combine sep or delimiter with delim_whitespace. I think that when delimiter is used, the message should instead read "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one."
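To make the two problems concrete, here is a simplified model of the behavior described above, written in plain Python. It assumes that delimiter acts as an alias that simply replaces sep when given, and that the delim_whitespace check only sees the merged value. resolve_separator_reported is a hypothetical name, and this is not the actual pandas source.

```python
# Hypothetical, simplified model of the reported behavior (not pandas source).


def resolve_separator_reported(sep=",", delimiter=None, delim_whitespace=False):
    """Return the effective separator the way the issue describes it today."""
    # delimiter is treated as an alias: when given, it replaces sep outright,
    # so a conflicting sep is dropped without any error.
    if delimiter is None:
        delimiter = sep

    # The whitespace check only sees the merged value, so it cannot tell
    # which keyword the caller used and always names sep in the message.
    if delim_whitespace and delimiter != ",":
        raise ValueError(
            "Specified a delimiter with both sep and "
            "delim_whitespace=True; you can only specify one."
        )

    # delim_whitespace=True is documented as equivalent to sep=r"\s+".
    return r"\s+" if delim_whitespace else delimiter


# sep=' ' is silently discarded in favour of delimiter='.'
print(resolve_separator_reported(sep=" ", delimiter="."))
```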
Describe the solution you'd like
Raise a ValueError when both sep and delimiter are used to specify the separator for read_csv.

Change the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one." to "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one." when both delimiter and delim_whitespace are specified.
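A minimal sketch of the two proposed checks, again in plain Python rather than inside pandas. The function name resolve_separator_proposed, the _NO_DEFAULT sentinel, and the wording of the sep/delimiter message are illustrative assumptions. The sentinel lets the check distinguish "sep was not passed" from "sep=',' was passed explicitly".

```python
# Hypothetical sketch of the proposed checks; names and the sep/delimiter
# message text are assumptions, not the pandas API.

_NO_DEFAULT = object()  # marks "sep not passed", so an explicit sep=',' still counts


def resolve_separator_proposed(sep=_NO_DEFAULT, delimiter=None, delim_whitespace=False):
    """Return the effective separator, rejecting redundant specifications."""
    # Proposal 1: passing both aliases is ambiguous, so refuse it outright.
    if sep is not _NO_DEFAULT and delimiter is not None:
        raise ValueError("Specified both sep and delimiter; you can only specify one.")

    # Proposal 2: name the keyword the caller actually used in the message.
    if delim_whitespace:
        if delimiter is not None:
            used = "delimiter"
        elif sep is not _NO_DEFAULT:
            used = "sep"
        else:
            return r"\s+"
        raise ValueError(
            f"Specified a delimiter with both {used} and "
            "delim_whitespace=True; you can only specify one."
        )

    if delimiter is not None:
        return delimiter
    return "," if sep is _NO_DEFAULT else sep
```

Because the conflict checks run before the two aliases are merged, each error message can name the keyword the caller actually passed.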
API breaking implications
This will "break" code that specifies both sep and delimiter. However, it is consistent with the behavior when either of those parameters is combined with delim_whitespace. Moreover, a similar change was made at some point between pandas 0.25.3 (the latest version provided by aptitude) and the development version. In the former,

    pd.read_csv("my_data.csv", delim_whitespace=True, sep=',')

doesn't raise a ValueError, but in the latter it does.
Describe alternatives you've considered
An alternative could be to issue a warning instead of an error, but an error is more consistent with the current behavior when delim_whitespace is combined with either sep or delimiter.
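For comparison, the warning-based alternative could look roughly like this. It is a hypothetical sketch (resolve_separator_warn and the warning text are assumptions) that keeps delimiter taking precedence over sep, but makes the dropped argument visible through Python's standard warnings module.

```python
import warnings

_NO_DEFAULT = object()  # marks "sep not passed"


def resolve_separator_warn(sep=_NO_DEFAULT, delimiter=None):
    """Hypothetical variant: warn instead of raising when both aliases are given."""
    if sep is not _NO_DEFAULT and delimiter is not None:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            "Both sep and delimiter were specified; sep is ignored.",
            UserWarning,
            stacklevel=2,
        )
    if delimiter is not None:
        return delimiter
    return "," if sep is _NO_DEFAULT else sep
```

Existing calls keep working, which is the appeal of this option, but the result would then differ from the delim_whitespace combinations, which already raise.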