Issue 7089: shlex behaves unexpected if newlines are not whitespace

Created on 2009-10-09 08:11 by jjdmol2, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description
lexertest.py jjdmol2,2009-10-09 08:11
lexer-newline-tokens.patch jjdmol2,2009-10-09 08:25
lexer-newline-tokens-patch-2.0.patch jjdmol2,2009-12-31 08:36 improved patch, includes test cases
Messages (6)
msg93776 - Author: Jan David Mol (jjdmol2) Date: 2009-10-09 08:11
The shlex module does not function as expected in the presence of comments when newlines are not whitespace. An example (attached):

>>> from shlex import shlex
>>>
>>> lexer = shlex("a \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,
,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,b

Now where did my newline go? The comment ate it! Even though the docs seem to indicate the newline is not part of the comment itself:

shlex.commenters:
    The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.
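For readers on Python 3, the session above can be approximated with the sketch below. The helper name `tokens` is hypothetical and exists only for this illustration; everything else is the standard `shlex` API. It returns token lists rather than joined strings, so the newline token is easy to see.

```python
from shlex import shlex


def tokens(text, whitespace=None):
    """Tokenize text with shlex, optionally overriding the whitespace set."""
    lexer = shlex(text)
    if whitespace is not None:
        # With whitespace=" ", a newline is no longer skipped as whitespace
        # and comes back as a token of its own.
        lexer.whitespace = whitespace
    return list(lexer)


for text in ("a \n b", "a # comment \n b"):
    print(repr(text), "default whitespace:", tokens(text))
    print(repr(text), "space-only whitespace:", tokens(text, whitespace=" "))
```

On the versions reported as affected in this issue, the last call is the surprising one: the newline token that appears for "a \n b" is missing for "a # comment \n b".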
msg93778 - Author: Jan David Mol (jjdmol2) Date: 2009-10-09 08:25
Attached is a patch which fixes this for me. It basically does a fall-through using '\n' when encountering a comment. So that may be a bit of a hack (who says '\n' is the only newline char in there, and not '\r'?) but I'll leave the more intricate stuff to you experts.
msg93820 - Author: Gabriel Genellina (ggenellina) Date: 2009-10-10 03:15
If you could add some tests to Lib/test/test_shlex.py, the patch would have a better chance of being accepted. Also, consider the case where the comment is on the last line of input and there is no trailing \n character.
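As a rough idea of what such tests might look like, here is a minimal sketch. It is not taken from the attached patches; the class, method, and helper names are made up for illustration. Both cases are chosen so that they should pass whether or not the comment handling is changed.

```python
import unittest
from shlex import shlex


def split_with_newline_tokens(text):
    """Hypothetical helper: lex text with only the space as whitespace."""
    lexer = shlex(text)
    lexer.whitespace = " "
    return list(lexer)


class NewlineTokenTests(unittest.TestCase):
    def test_newline_is_token_without_comment(self):
        # No comment involved: the newline comes back as its own token.
        self.assertEqual(split_with_newline_tokens("a \n b"), ["a", "\n", "b"])

    def test_comment_on_last_line_without_newline(self):
        # The comment runs to end of input; no newline token should appear.
        self.assertEqual(split_with_newline_tokens("a # comment"), ["a"])
```

A regression test for the report itself would additionally assert that `split_with_newline_tokens("a # comment \n b")` yields `["a", "\n", "b"]`; that assertion is expected to fail on the versions affected by this issue.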
msg97080 - Author: Jan David Mol (jjdmol2) Date: 2009-12-31 08:36
As there seems to be some interest, I've continued working on patching this issue. Attached is an improved version of the patch, including additions to test_shlex.py. Improved in the sense that newlines after a comment are not considered to be actually part of the comment (according to POSIX), which makes a difference when newlines are tokens. To accomplish this, I had to add an ungetc buffer to shlex, in order to push back any newlines read by the readline() routine used when a comment is encountered.

@Gabriel: the test case of no newline at the end of the file after a comment is addressed.

Relevant POSIX sections are
  Shell & Utilities 2.3(10)
  Rationale C.2.3
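The push-back idea can be illustrated without modifying shlex at all. The sketch below is not the attached patch. It assumes that shlex only calls `read(1)` and `readline()` on its input stream, and wraps the input in a hypothetical `NewlinePushbackStream` whose `readline()` hands the trailing newline back to the next `read()`.

```python
import io
from shlex import shlex


class NewlinePushbackStream:
    """Hypothetical text stream whose readline() leaves the newline unread."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self._pending = ""  # holds at most one pushed-back "\n"

    def read(self, size=-1):
        if size == 0 or not self._pending:
            return self._stream.read(size)
        head, self._pending = self._pending, ""
        if size < 0:
            return head + self._stream.read()
        return head + self._stream.read(size - 1)

    def readline(self):
        if self._pending:
            # The stream is positioned just before a pushed-back newline.
            line, self._pending = self._pending, ""
            return line
        line = self._stream.readline()
        if line.endswith("\n"):
            # Hand the newline back so the next read(1) sees it.
            self._pending = "\n"
            line = line[:-1]
        return line

    def close(self):
        self._stream.close()


def tokens_keeping_comment_newline(text, whitespace=" "):
    """Lex text so that a newline ending a comment is still tokenized."""
    lexer = shlex(NewlinePushbackStream(text))
    lexer.whitespace = whitespace
    return list(lexer)


print(tokens_keeping_comment_newline("a # comment \n b"))
```

Two caveats with this sketch: shlex counts the line once when it skips the comment and again when it reads the pushed-back newline, so `lineno` runs ahead; and only '\n' is pushed back, so a '\r' before it is still swallowed with the comment.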
msg141486 - Author: Ann Elliott (elliotta) Date: 2011-08-01 00:58
This error still occurs in version 3.3.0a0.
msg395847 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-06-14 20:18
I've reproduced on 3.11.
History
Date User Action Args
2022-04-11 14:56:53 admin set github: 51338
2021-06-14 20:18:08 iritkatriel set nosy: + iritkatriel; messages: +
2021-06-14 20:17:01 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3
2011-08-01 00:58:26 elliotta set nosy: + elliotta; messages: + versions: + Python 3.3
2009-12-31 08:37:00 jjdmol2 set files: + lexer-newline-tokens-patch-2.0.patch; messages: +
2009-12-31 08:00:04 ezio.melotti set priority: normal; nosy: + ferringb; versions: - Python 2.5, Python 2.4, Python 3.0; stage: test needed
2009-12-31 03:17:37 kanru set nosy: + kanru
2009-10-10 03:15:30 ggenellina set nosy: + ggenellina; messages: +
2009-10-09 09:41:02 jjdmol2 set components: + Library (Lib)
2009-10-09 08:25:20 jjdmol2 set files: + lexer-newline-tokens.patch; keywords: + patch; messages: +
2009-10-09 08:11:52 jjdmol2 create