Issue 5610: email feedparser.py CRLFLF bug: $ vs \Z (original) (raw)

Issue5610

Created on 2009-03-30 17:59 by tony_nelson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
feedparser_crlflf.patch tony_nelson,2009-03-30 17:59 patch to fix feedparser CRLFLF problem, no tests
feedparser_crlflf_test.patch tony_nelson,2009-03-31 04:21 Test for feedparser CRLFLF problem.
Messages (5)
msg84595 - (view) Author: Tony Nelson (tony_nelson) Date: 2009-03-30 17:59
feedparser.py does not pares mixed newlines properly. NLCRE_eol, which is used to search for the various newlines at End Of Line, uses $ to match the end of string, but $ also matches \n$, due to a wise long-ago patch by the Effbot. This causes feedparser to match '\r\n\n' at '\r\n', and then to remove the last two characters, leaving '\r', thus eating up a line. Such mixed line endings can occur if a message with CRLF line endings is parsed, written out, and then parsed again. When explicitly searching for various newlines, the \Z end-of-string marker should be used instead. There are two improper uses of $ in feedparser.py. I don't see any others in the email package. NLCRE_eol = re.compile('(\r\n|\r \n)$') should be: NLCRE_eol = re.compile('(\r\n \r
msg84746 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-03-31 03:34
I think this is a good idea. Does the existing test suite still pass with this change? For a long time, email's philosophy was to use native line endings and never expected mixed eol, and it definitely never expected mixed line endings, so we'll need at least a few tests for this. Add them to Lib/email/test_email.py (for now, I want to split this file up soon).
msg84751 - (view) Author: Tony Nelson (tony_nelson) Date: 2009-03-31 04:21
make test still passes all tests except test_httpservers on my Python 2.6.1 build. The network resource was not enabled and tk is not available. The new test for CRLFLF at the end of a message body is added to Lib/email/test_email at the end of the TestParsers class. It passes with the fix patch and fails without it. What other tests do you want?
msg106960 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-03 15:46
I've applied the NLCRE_eol part of the patch, and the test, in r81675. Tony, can you think of a test case that would demonstrate the problem with the boundaryre?
msg107939 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-16 17:29
NLCR_eol patch merged to 2.6 in r82010, py3k in r82011, and 3.1 in r82012. Since there's been no further feedback I'll close this. If someone comes up with a test case for the boundary_re, they can open a new issue.
History
Date User Action Args
2022-04-11 14:56:47 admin set github: 49860
2011-02-02 21:41:36 r.david.murray link issue6465 superseder
2010-06-16 17:29:41 r.david.murray set status: open -> closedversions: + Python 3.1, Python 2.7, Python 3.2type: behaviormessages: + resolution: fixedstage: test needed -> resolved
2010-06-03 15:46:44 r.david.murray set messages: + stage: test needed
2010-05-05 13:46:47 barry set assignee: barry -> r.david.murraynosy: + r.david.murray
2010-01-09 17:50:05 r.david.murray link issue6681 superseder
2009-03-31 04:21:18 tony_nelson set files: + feedparser_crlflf_test.patchmessages: + versions: - Python 3.1
2009-03-31 03:34:43 barry set messages: +
2009-03-31 03:07:17 barry set assignee: barryversions: + Python 3.1
2009-03-30 17:59:36 tony_nelson create