Issue 1720390: Remove backslash escapes from tokenize.c.

Created on 2007-05-16 22:23 by ron_adam, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
norawescape3.diff	ron_adam,2007-06-14 05:10	Removes escape chars from raw strings.
tokenize_cleanup_patch.diff	ron_adam,2007-11-16 00:36
no_raw_escapes_patch.diff	ron_adam,2007-11-16 00:36

Messages (11)
msg52631 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-05-16 22:23
This patch modifies tokenizer.c so that, in raw strings only, it no longer skips the character after a backslash when determining the end of a string. A few strings needed changes in order to compile: two in textwrap.py and one in distutils/util.py. This does not include the changes needed for the tests to pass; I'll include those in a separate patch.
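For readers unfamiliar with the behaviour under discussion, here is a minimal sketch of how current CPython treats a backslash before a quote in a raw string, which is the behaviour the patch proposed to change. The helper `compiles` and the constant `BACKSLASH` are illustrative names only, not part of the patch.

```python
BACKSLASH = chr(92)  # built with chr() so the example source stays easy to read


def compiles(source):
    """Return True if `source` is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Source text: r"\""
# The backslash keeps the tokenizer from ending the literal at the second
# quote, yet the backslash itself remains part of the resulting string.
escaped_quote = 'r"' + BACKSLASH + '""'
value = eval(escaped_quote)
print(len(value))  # 2: a backslash followed by a double quote

# Source text: r"\"
# Rejected: the would-be closing quote is skipped, so the literal never ends.
trailing_backslash = 'r"' + BACKSLASH + '"'
print(compiles(trailing_backslash))  # False
```

With the proposed change, the second literal would have been a valid one-character string, while the first would have ended at the second quote and left a stray quote behind.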
msg52632 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-05-16 22:31
Forgot to specify... This is against the py3k-struni branch, revision 55388.
msg52633 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-05-20 02:14
Here's a more complete patch which modifies the following files (in the py3k_struni branch):
    M Python/ast.c
    M Parser/tokenizer.c
    M Lib/test/tokenize_tests.txt
    M Lib/tokenize.py
The test still doesn't pass, but it fails in the same way as it did before these changes were made. I'll continue to look into this. I think it's more a problem with the test itself than with the modules, or it may be a bug in the struni branch that is yet to be fixed.
The following changes each alter one or two raw strings, in most cases replacing the outermost quotes with triple quotes:
    M Lib/sgmllib.py
    M Lib/markupbase.py
    M Lib/textwrap.py
    M Lib/distutils/util.py
    M Lib/cookielib.py
    M Lib/pydoc.py
    M Lib/doctest.py
    M Lib/xml/etree/ElementTree.py
    M Lib/HTMLParser.py
msg52634 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-05-20 02:15
Here's a more complete patch which modifies the following files (in the py3k_struni branch):
    M Python/ast.c
    M Parser/tokenizer.c
    M Lib/test/tokenize_tests.txt
    M Lib/tokenize.py
The test still doesn't pass, but it fails in the same way as it did before these changes were made. I'll continue to look into this. I think it's more a problem with the test itself than with the modules, or it may be a bug in the struni branch that is yet to be fixed.
The following changes each alter one or two raw strings, in most cases replacing the outermost quotes with triple quotes:
    M Lib/sgmllib.py
    M Lib/markupbase.py
    M Lib/textwrap.py
    M Lib/distutils/util.py
    M Lib/cookielib.py
    M Lib/pydoc.py
    M Lib/doctest.py
    M Lib/xml/etree/ElementTree.py
    M Lib/HTMLParser.py
File Added: norawescape2.diff
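The triple-quote rewrite mentioned in this message can be sketched as follows. This is an assumed illustration of the general pattern, not a line taken from the attached diff: a raw-string regular expression that relies on a backslash to get a quote past the tokenizer is respelled with triple quotes so that no backslash is needed.

```python
import re

# Spelling accepted today: the backslash stops the tokenizer from ending the
# literal at the inner double quote, and the backslash stays in the pattern.
with_backslash = re.compile(r"[\"']")

# Spelling that needs no backslash: inside triple quotes both quote
# characters can appear unescaped in the raw string.
with_triple_quotes = re.compile(r"""["']""")

# The pattern texts differ by one backslash, but they match the same input.
for text in ['say "hi"', "it's", "no quotes"]:
    print(text, bool(with_backslash.search(text)), bool(with_triple_quotes.search(text)))
```

The regular expression engine treats an escaped quote inside a character class as a plain quote, which is why the two patterns behave identically.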
msg52635 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-05-26 04:27
Just FYI, I have downloaded this and will attempt to apply it some time next week.
msg52636 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-06-14 05:10
Updated patch. The error that I had mentioned before has been fixed. Added changes to the tokenize_test output comparison file. The test has random failures because it uses a random sample of other tests as sources for its round-trip tests. If those files have problems in them, then this test fails. Added a filename output line to the test so the problem file can be identified. Patch is against the py3k_struni branch, revision 55970.
File Added: norawescape3.diff
msg57250 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-11-08 14:20
Can you create a new patch and verify that the problem still exists? norawescape3.diff doesn't apply cleanly any more.
msg57262 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-11-08 17:28
Yes, I will update it.
msg57290 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-11-09 00:32
FWIW, I'm +1 on the part of this patch that disables \u in raw strings. I just had a problem with a doctest that couldn't be run in verbose mode because \u was being interpreted in raw mode... But I'm still solidly -1 on allowing trailing \.
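As a small illustration of the part of the change that was accepted, the snippet below shows how Python 3 treats `\u` in raw and non-raw string literals. The variable names are illustrative.

```python
# In a raw string, \u is not interpreted: all six characters are kept.
raw = r"\u0041"

# In a regular string, \u0041 is the escape for the single character "A".
cooked = "\u0041"

print(len(raw), len(cooked))  # 6 1
```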
msg57578 - (view)	Author: Ron Adam (ron_adam) *	Date: 2007-11-16 00:36
It looks like the disabling of \u and \U in raw strings is done. Does tokenize.py need to be fixed to match? While working on this I was able to clean up the string parsing parts of tokenizer.c, and have a separate patch with just that. I have also attached an updated patch with both the cleaned-up tokenizer.c and the no-escapes-in-raw-strings change, in case it is desired after all.
msg57579 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-11-16 00:52
I don't think tokenize.py needs to be changed -- it never interpreted backslashes in string literals anyway (not even in regular, non-raw literals). The tokenizer.c cleanup is submitted as revision 59007. I still am not warming up towards the no-raw-escapes feature, so I'm closing this as rejected. Nevertheless, thanks for your efforts!
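The point about the pure-Python tokenizer can be checked with a short sketch using the standard `tokenize` module as it exists in current Python 3. The sample source line is made up for the example.

```python
import io
import tokenize

# One line of Python source containing a regular (non-raw) string literal
# with a backslash-t escape in it.
source = r'x = "a\tb"' + "\n"

tokens = tokenize.generate_tokens(io.StringIO(source).readline)
string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]

# The STRING token carries the literal's source text verbatim, quotes and
# backslash included; no escape processing has taken place.
print(string_tokens)
```

Because the token text is copied straight from the source, a change to escape handling in the compiler's string parsing does not by itself require a matching change here.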

History
Date	User	Action	Args
2022-04-11 14:56:24	admin	set	github: 44961
2008-01-06 22:29:46	admin	set	keywords: - py3k, versions: Python 3.0
2007-11-16 00:52:49	gvanrossum	set	status: open -> closed, resolution: rejected, messages: +
2007-11-16 00:36:36	ron_adam	set	files: + no_raw_escapes_patch.diff
2007-11-16 00:36:11	ron_adam	set	files: + tokenize_cleanup_patch.diff, messages: +
2007-11-09 00:32:17	gvanrossum	set	messages: +
2007-11-08 17:28:22	ron_adam	set	messages: +
2007-11-08 14:20:18	christian.heimes	set	nosy: + christian.heimes, messages: +
2007-08-30 00:22:41	gvanrossum	set	versions: + Python 3.0
2007-05-16 22:23:54	ron_adam	create