The shlex module does not function as expected in the presence of comments when newlines are not whitespace. An example (attached):

>>> from shlex import shlex
>>>
>>> lexer = shlex("a \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> print ",".join(lexer)
a,b
>>>
>>> lexer = shlex("a \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,
,b
>>>
>>> lexer = shlex("a # comment \n b")
>>> lexer.whitespace=" "
>>> print ",".join(lexer)
a,b

Now where did my newline go? The comment ate it! Even though the docs seem to indicate the newline is not part of the comment itself:

shlex.commenters: The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.
Attached is a patch which fixes this for me. It basically does a fall-through using '\n' when encountering a comment. So that may be a bit of a hack (who says '\n' is the only newline char in there, and not '\r'?) but I'll leave the more intricate stuff to you experts.
If you could add some tests to Lib/test/test_shlex.py, the patch would have a better chance of being accepted. Also, consider the case where the comment is on the last line of input and there is no trailing \n.
As there seems to be some interest, I've continued working on a patch for this issue. Attached is an improved version of the patch, including additions to test_shlex.py. It is improved in the sense that a newline after a comment is no longer considered part of the comment (in accordance with POSIX), which makes a difference when newlines are tokens. To accomplish this, I had to add an ungetc buffer to shlex, in order to push back any newline read by the readline() call that is used when a comment is encountered. @Gabriel: the test case of a comment with no newline at the end of the file is addressed. Relevant POSIX sections are Shell & Utilities 2.3(10) and Rationale C.2.3.
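For anyone who wants to try the pushback idea without patching shlex itself, here is a minimal Python 3 sketch. It is not the attached patch. NewlinePreservingStream and tokens are hypothetical names made up for illustration, and the wrapper assumes that shlex consumes its input stream only through read(1) and readline().

```python
import io
from shlex import shlex


class NewlinePreservingStream:
    """Hypothetical input wrapper, not part of shlex or of the patch.

    readline() hands back the rest of the line but leaves the trailing
    newline unread, so the next read(1) still sees it.
    """

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self._pushback = []  # holds at most one pending "\n"

    def read(self, size=1):
        # Assumption: shlex only ever asks for one character at a time.
        if self._pushback:
            return self._pushback.pop()
        return self._stream.read(size)

    def readline(self):
        if self._pushback:
            # A newline is already pending, so the rest of the line is empty.
            return ""
        line = self._stream.readline()
        if line.endswith("\n"):
            # Push the newline back instead of letting the comment eat it.
            self._pushback.append("\n")
            line = line[:-1]
        return line


def tokens(text, whitespace=" "):
    """Lex text with the given whitespace set, reading through the wrapper."""
    lexer = shlex(NewlinePreservingStream(text))
    lexer.whitespace = whitespace
    return list(lexer)


if __name__ == "__main__":
    print(tokens("a # comment \n b"))
    print(tokens("a # comment"))
```

With whitespace set to a single space, the newline after the comment should now come through as its own token, and a comment on the last line without a trailing newline simply ends the input. One known rough edge of this sketch: shlex increments lineno both when it skips a comment and when it later reads the preserved newline, so line numbers will be off by one per comment line.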