Issue 17829: csv.Sniffer.sniff doesn't set up the dialect properly for a csv created with dialect=csv.excel_tab and containing quote (") char


Created on 2013-04-24 15:06 by GhislainHivon, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description
csv_sniffing_excel_tab.py	GhislainHivon, 2013-04-24 15:06	Example of sniffing a CSV file written with dialect=csv.excel_tab and containing quotes in the data

Messages (3)
msg187709	Author: Ghislain Hivon (GhislainHivon)	Date: 2013-04-24 15:06
When sniffing the dialect of a file created by the csv module with dialect=csv.excel_tab, where one of the rows contains a quote character ("), the delimiter is set to ' ' instead of '\t'.
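A minimal reproduction sketch for Python 3. The attached csv_sniffing_excel_tab.py is not reproduced here: the rows in TEST_DATA and the helpers write_and_read_back and sniffed_delimiter are hypothetical. The file is opened in text mode with newline="", which is how csv writers are meant to be used in Python 3 (they write str, not bytes). On an interpreter affected by this issue the sniffed delimiter comes back as ' ' rather than '\t'; the script simply prints whatever the running version returns.

```python
import csv
import os
import tempfile

# Hypothetical test rows modelled on the line quoted in the discussion;
# the last field of the first row contains embedded double quotes.
TEST_DATA = [
    ["273", "MVREGR1", "ByEuPo", 'Baryton "Euphonium" populaire'],
    ["274", "MVREGR2", "ByEuPo", "Baryton populaire"],
]


def write_and_read_back(rows):
    """Write rows with csv.excel_tab to a temporary file and return its text."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "excel_tab.csv")
        # Python 3: text mode with newline="" (not "wb"), because csv.writer
        # writes str objects and emits its own "\r\n" line terminators.
        with open(path, "w", newline="") as f:
            csv.writer(f, dialect=csv.excel_tab).writerows(rows)
        # newline="" on reading keeps the "\r\n" terminators intact.
        with open(path, newline="") as f:
            return f.read()


def sniffed_delimiter(sample):
    """Return the delimiter chosen by csv.Sniffer, or None if sniffing fails."""
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return None


sample = write_and_read_back(TEST_DATA)
print(repr(sample))
# The data was written with '\t'; affected versions report ' ' here.
print("sniffed delimiter:", repr(sniffed_delimiter(sample)))
```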
msg214800	Author: Antoon Pardon (Antoon.Pardon)	Date: 2014-03-25 09:30
I had a look at this and have the following remarks.

1) The file csv_sniffing_excel_tab.py no longer works with Python 3.3. It now produces the following traceback:

    Traceback (most recent call last):
      File "csv_sniffing_excel_tab.py", line 36, in <module>
        create_file()
      File "csv_sniffing_excel_tab.py", line 23, in create_file
        writer.writerows(test_data)
    TypeError: 'str' does not support the buffer interface

2) The problem seems to be in the _guess_quote_and_delimiter method. If _guess_delimiter is always called, the sniffer gives the correct result.

3) As far as I understand, the problem is the first regular expression:

    (?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)

Now suppose we have a line such as the following:

    273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"

The delim group will match the space after Baryton, the space group will match nothing, and the quote group will match ". The non-group part of the pattern then matches "Euphonium", followed by the quote backreference matching " again and the delim backreference matching the next space. And so we get the wrong delimiter.
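Point 3 can be checked in isolation with the re module. This sketch runs the first pattern from point 3 (with its delim, space and quote groups spelled out) over the example line; compiling it with re.DOTALL | re.MULTILINE is an assumption about how the sniffer compiles it. The only match uses the space after Baryton as the "delimiter".

```python
import re

# First pattern tried by csv.Sniffer._guess_quote_and_delimiter, as quoted
# in msg214800.
PATTERN = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'

# Example line from the discussion; the intended delimiter is ':'.
LINE = '273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"'

# Flags assumed to match those used by the sniffer.
regexp = re.compile(PATTERN, re.DOTALL | re.MULTILINE)

for m in regexp.finditer(LINE):
    # Show which characters the pattern treats as delim, space and quote,
    # and the exact text it consumed.
    print(m.group("delim", "space", "quote"), "matched", repr(m.group(0)))
```

The colon before "Baryton cannot start a match because no `":` sequence follows it, so the first place the pattern succeeds is inside the doubled quotes.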
msg215031	Author: Antoon Pardon (Antoon.Pardon)	Date: 2014-03-28 10:04
I have included a patch (against 2.7) that seems to make the test work. The patch prevents the delim group from matching a space.
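The effect of a change along these lines can be compared directly. PATCHED is a hypothetical rendering of "prevent the delim group from matching a space" (a space added to the negated character class); the attached patch itself is not reproduced here, and delimiters_found is an illustrative helper.

```python
import re

# Flags assumed to match those used by csv.Sniffer.
FLAGS = re.DOTALL | re.MULTILINE

# Pattern quoted in msg214800.
ORIGINAL = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
# Hypothetical fix: a space is added to the excluded characters, so the
# delim group can no longer match it.
PATCHED = r'(?P<delim>[^\w\n"\' ])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'

SAMPLES = {
    "embedded doubled quotes": '273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"',
    "quoted field between tabs": '273\t"x"\tfoo',
}


def delimiters_found(pattern, text):
    """Return the delim group of every non-overlapping match of pattern."""
    return [m.group("delim") for m in re.finditer(pattern, text, FLAGS)]


for name, text in SAMPLES.items():
    print(f"{name}: original={delimiters_found(ORIGINAL, text)!r} "
          f"patched={delimiters_found(PATCHED, text)!r}")
```

With no match on the first sample, sniff() would move on to its fallback, _guess_delimiter, which point 2 of msg214800 reports as giving the right answer. One trade-off: quoted fields in genuinely space-delimited data also stop matching this pattern and would rely on that fallback as well.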

History
Date	User	Action	Args
2022-04-11 14:57:44	admin	set	github: 62029
2014-03-28 10:04:37	Antoon.Pardon	set	messages: +
2014-03-25 09:30:30	Antoon.Pardon	set	nosy: + Antoon.Pardon, messages: +
2013-04-30 21:19:18	dmi.baranov	set	nosy: + dmi.baranov
2013-04-24 15:06:00	GhislainHivon	create