msg84595 - (view) |
Author: Tony Nelson (tony_nelson) |
Date: 2009-03-30 17:59 |
feedparser.py does not pares mixed newlines properly. NLCRE_eol, which is used to search for the various newlines at End Of Line, uses $ to match the end of string, but $ also matches \n$, due to a wise long-ago patch by the Effbot. This causes feedparser to match '\r\n\n' at '\r\n', and then to remove the last two characters, leaving '\r', thus eating up a line. Such mixed line endings can occur if a message with CRLF line endings is parsed, written out, and then parsed again. When explicitly searching for various newlines, the \Z end-of-string marker should be used instead. There are two improper uses of $ in feedparser.py. I don't see any others in the email package. NLCRE_eol = re.compile('(\r\n|\r |
\n)$') should be: NLCRE_eol = re.compile('(\r\n |
\r |
msg84746 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2009-03-31 03:34 |
I think this is a good idea. Does the existing test suite still pass with this change? For a long time, email's philosophy was to use native line endings and never expected mixed eol, and it definitely never expected mixed line endings, so we'll need at least a few tests for this. Add them to Lib/email/test_email.py (for now, I want to split this file up soon). |
|
|
msg84751 - (view) |
Author: Tony Nelson (tony_nelson) |
Date: 2009-03-31 04:21 |
make test still passes all tests except test_httpservers on my Python 2.6.1 build. The network resource was not enabled and tk is not available. The new test for CRLFLF at the end of a message body is added to Lib/email/test_email at the end of the TestParsers class. It passes with the fix patch and fails without it. What other tests do you want? |
|
|
msg106960 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-06-03 15:46 |
I've applied the NLCRE_eol part of the patch, and the test, in r81675. Tony, can you think of a test case that would demonstrate the problem with the boundaryre? |
|
|
msg107939 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-06-16 17:29 |
NLCR_eol patch merged to 2.6 in r82010, py3k in r82011, and 3.1 in r82012. Since there's been no further feedback I'll close this. If someone comes up with a test case for the boundary_re, they can open a new issue. |
|
|