Issue 7089: shlex behaves unexpected if newlines are not whitespace

Created on 2009-10-09 08:11 by jjdmol2, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description
lexertest.py	jjdmol2,2009-10-09 08:11
lexer-newline-tokens.patch	jjdmol2,2009-10-09 08:25
lexer-newline-tokens-patch-2.0.patch	jjdmol2,2009-12-31 08:36	improved patch, includes test cases

Messages (6)
msg93776	Author: Jan David Mol (jjdmol2)	Date: 2009-10-09 08:11
The shlex module does not function as expected in the presence of comments when newlines are not whitespace. An example (attached):

>>> from shlex import shlex
>>>
>>> lexer = shlex("a \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,
,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,b

Now where did my newline go? The comment ate it! Even though the docs seem to indicate the newline is not part of the comment itself:

shlex.commenters: The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.
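For readers on Python 3, the session above can be restated roughly as follows. This is a minimal sketch, not the attached lexertest.py; the helper name `tokens` is made up for illustration. It returns token lists instead of joining them, so the newline token is easier to see.

```python
from shlex import shlex


def tokens(text, whitespace=None):
    """Tokenize text, optionally overriding the lexer's whitespace set.

    `tokens` is a hypothetical helper for this illustration only.
    """
    lexer = shlex(text)
    if whitespace is not None:
        lexer.whitespace = whitespace
    return list(lexer)


# Default whitespace includes '\n', so the newline is never a token.
print(tokens("a \n b"))            # ['a', 'b']
print(tokens("a # comment \n b"))  # ['a', 'b']

# With only ' ' as whitespace, '\n' becomes a token of its own.
print(tokens("a \n b", whitespace=" "))  # ['a', '\n', 'b']

# On affected versions the comment swallows the newline as well,
# so this prints ['a', 'b'] rather than ['a', '\n', 'b'].
print(tokens("a # comment \n b", whitespace=" "))
```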
msg93778	Author: Jan David Mol (jjdmol2)	Date: 2009-10-09 08:25
Attached is a patch which fixes this for me. It basically does a fall-through using '\n' when encountering a comment. So that may be a bit of a hack (who says '\n' is the only newline char in there, and not '\r'?) but I'll leave the more intricate stuff to you experts.
msg93820	Author: Gabriel Genellina (ggenellina)	Date: 2009-10-10 03:15
If you could add some tests to Lib/test/test_shlex.py, the patch would have a better chance of being accepted. Also, consider the case where the comment is on the last line of input and there is no terminating \n character.
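A sketch of what such tests might look like, assuming a standalone test class rather than the actual layout of test_shlex.py. The class name, the helper `split_keeping_newlines`, and the whitespace set `" \t\r"` are all hypothetical. The cases below cover the edge cases mentioned here and should pass with or without a fix.

```python
import unittest
from shlex import shlex


def split_keeping_newlines(text):
    """Tokenize with '\n' removed from the whitespace set (hypothetical helper)."""
    lexer = shlex(text)
    lexer.whitespace = " \t\r"
    return list(lexer)


class CommentNewlineTest(unittest.TestCase):
    def test_newline_is_a_token(self):
        # Without a comment, the newline comes back as its own token.
        self.assertEqual(split_keeping_newlines("a \n b"), ["a", "\n", "b"])

    def test_comment_on_last_line_without_newline(self):
        # No '\n' follows the comment, so none should be produced.
        self.assertEqual(split_keeping_newlines("a # comment"), ["a"])

    def test_comment_only_input(self):
        # Input that is nothing but an unterminated comment yields no tokens.
        self.assertEqual(split_keeping_newlines("# comment"), [])
```

A test for the reported bug itself would assert that `split_keeping_newlines("a # comment \n b")` equals `["a", "\n", "b"]`; that assertion fails on the versions reported in this issue. The class can be run with `python -m unittest`.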
msg97080	Author: Jan David Mol (jjdmol2)	Date: 2009-12-31 08:36
As there seems to be some interest, I've continued working on patching this issue.

Attached is an improved version of the patch, including additions to test_shlex.py. Improved in the sense that newlines after a comment are not considered to be actually part of the comment (according to POSIX), which makes a difference when newlines are tokens. To accomplish this, I had to add an ungetc buffer to shlex, in order to push back any newlines read by the readline() routine used when a comment is encountered.

@Gabriel: the test case of no newline at the end of the file after a comment is addressed.

Relevant POSIX sections are:
Shell & Utilities 2.3(10)
Rationale C.2.3
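The push-back idea can be approximated without modifying shlex. The sketch below is not the attached patch: it assumes the lexer consumes comments by calling readline() on its input stream, and wraps the input in a hypothetical `PushbackNewlineStream` that rewinds by one character whenever readline() has consumed a trailing '\n'. Both names introduced here are made up for illustration.

```python
import io
from shlex import shlex


class PushbackNewlineStream(io.StringIO):
    """StringIO whose readline() leaves the trailing '\n' unread.

    Hypothetical workaround class, not part of shlex or of the patch.
    """

    def readline(self, size=-1):
        line = super().readline(size)
        if line.endswith("\n"):
            # "Unget" the newline so the lexer sees it on its next read(1).
            self.seek(self.tell() - 1)
        return line


def tokens_with_pushback(text, whitespace=" "):
    """Tokenize text through the push-back stream (hypothetical helper)."""
    lexer = shlex(PushbackNewlineStream(text))
    lexer.whitespace = whitespace
    return list(lexer)


print(tokens_with_pushback("a # comment \n b"))  # ['a', '\n', 'b']
print(tokens_with_pushback("a # comment"))       # ['a']
```

One caveat of this workaround: the lexer increments its line counter both when it skips the comment and again when it reads the pushed-back newline, so `lineno` may be off. Only '\n' is handled; '\r' line endings are not.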
msg141486	Author: Ann Elliott (elliotta)	Date: 2011-08-01 00:58
This error still occurs in version 3.3.0a0.
msg395847	Author: Irit Katriel (iritkatriel) *	Date: 2021-06-14 20:18
I've reproduced on 3.11.

History
Date	User	Action	Args
2022-04-11 14:56:53	admin	set	github: 51338
2021-06-14 20:18:08	iritkatriel	set	nosy: + iritkatriel; messages: +
2021-06-14 20:17:01	iritkatriel	set	versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3
2011-08-01 00:58:26	elliotta	set	nosy: + elliotta; messages: +; versions: + Python 3.3
2009-12-31 08:37:00	jjdmol2	set	files: + lexer-newline-tokens-patch-2.0.patch; messages: +
2009-12-31 08:00:04	ezio.melotti	set	priority: normal; nosy: + ferringb; versions: - Python 2.5, Python 2.4, Python 3.0; stage: test needed
2009-12-31 03:17:37	kanru	set	nosy: + kanru
2009-10-10 03:15:30	ggenellina	set	nosy: + ggenellina; messages: +
2009-10-09 09:41:02	jjdmol2	set	components: + Library (Lib)
2009-10-09 08:25:20	jjdmol2	set	files: + lexer-newline-tokens.patch; keywords: + patch; messages: +
2009-10-09 08:11:52	jjdmol2	create