Raise an error on redundant definition of separator in read_csv · Issue #39823 · pandas-dev/pandas

When calling read_csv and specifying both sep and delim_whitespace, or both delimiter and delim_whitespace I get a ValueError in pandas 1.3.0.dev0+713.g9f792cd903. For example:
df = pd.read_csv("my_data.csv", sep=' ', delim_whitespace=True)
and
df = pd.read_csv("my_data.csv", delimiter=' ', delim_whitespace=True)
give an error. However, when I specify both sep and delimiter, for example:
df = pd.read_csv("my_data.csv", sep=' ', delimiter='.')
sep is just silently ignored. I think it would make sense to raise a ValueError in this case as well.

Moreover, the error that is raised today gives the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one." regardless of whether I specify sep or delimiter together with delim_whitespace. I think it should be changed to "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one." when delimiter is used.

Describe the solution you'd like

Raise a ValueError when both sep and delimiter are used to specify the separator for read_csv.

Change the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one." to "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one." when both delimiter and delim_whitespace are specified.
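
To make the request concrete, here is a minimal sketch of the checks I have in mind. It is not the pandas implementation: the helper name check_separator_args is hypothetical, None is assumed to mean "argument not given", and the wording of the sep/delimiter message is only a suggestion. The delim_whitespace messages follow the text quoted above.

```python
def check_separator_args(sep=None, delimiter=None, delim_whitespace=False):
    """Hypothetical helper (not pandas code) that validates separator arguments.

    Assumption: None means the caller did not pass the argument.
    """
    # Proposed new check: sep and delimiter are aliases, so passing both is redundant.
    if sep is not None and delimiter is not None:
        # Suggested wording only; the final message is up to the maintainers.
        raise ValueError(
            "Specified both sep and delimiter; you can only specify one."
        )

    # Existing check, but the message now names the argument actually used.
    if delim_whitespace:
        for name, value in (("sep", sep), ("delimiter", delimiter)):
            if value is not None:
                raise ValueError(
                    f"Specified a delimiter with both {name} and "
                    "delim_whitespace=True; you can only specify one."
                )

    # Whichever alias was given (if any) is the separator to use.
    return delimiter if delimiter is not None else sep
```

With this sketch, check_separator_args(delimiter=' ', delim_whitespace=True) raises an error that mentions delimiter rather than sep, and check_separator_args(sep=' ', delimiter='.') raises instead of silently dropping sep.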

API breaking implications

This will "break" code that specifies both sep and delimiter. However, it is consistent with the behavior when you specify one of those parameters together with delim_whitespace. Moreover, a similar change was made at some point between pandas 0.25.3 (the latest version provided by aptitude) and the development version. In the former
pd.read_csv("my_data.csv", delim_whitespace=True, sep=',')
doesn't cause a ValueError, but in the latter it does.

Describe alternatives you've considered

An alternative could be to issue a warning instead of an error, but an error is more consistent with the current behavior for the combination of delim_whitespace and (sep or delimiter).
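
For comparison, the warning alternative could look roughly like the sketch below. Again this is only an illustration: resolve_separator_with_warning is a hypothetical name, None is assumed to mean "argument not given", and the choice of UserWarning and the message text are assumptions. It keeps today's behavior of letting delimiter win over sep.

```python
import warnings


def resolve_separator_with_warning(sep=None, delimiter=None):
    """Hypothetical helper (not pandas code): warn instead of raising.

    Assumption: None means the caller did not pass the argument.
    """
    if sep is not None and delimiter is not None:
        # Keep the current outcome (sep is ignored) but make it visible.
        warnings.warn(
            "Both sep and delimiter were specified; sep is ignored.",
            UserWarning,
            stacklevel=2,
        )
    return delimiter if delimiter is not None else sep
```

The drawback is visible here: the call still succeeds with one of the two arguments discarded, which is exactly the inconsistency with the delim_whitespace case described above.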