The function 'email.message_from_file' modifies the message structure when the parsed email is invalid (for example, when a closing boundary is missing). The defects attribute is also empty.

In the attachment (sample.tgz) you will find:
- orig.eml: an email with an invalid structure. The boundary "000101020201080900040301" isn't closed.
- after_parsing.eml: the same email after calling email.message_from_file(). The boundary is now closed, and the defects attribute is empty.
- test.py: a Python script to reproduce the problem.
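For readers without the attachment, the round trip described above can be sketched roughly as follows. This is not the attached test.py: the message text and the boundary name "BOUNDARY" are made up for illustration, and the script is written for Python 3.

```python
import email
import io

# Hypothetical stand-in for orig.eml (not the attached file): the multipart
# body never ends with the closing delimiter "--BOUNDARY--".
RAW = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Alternative 1\n"
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "Alternative 2\n"
)

# Parse from a file-like object, as the report does with orig.eml.
msg = email.message_from_file(io.StringIO(RAW))

# Re-serialize the parsed message (the equivalent of after_parsing.eml).
serialized = msg.as_string()

closing = "--BOUNDARY--"
print("closing delimiter in input: ", closing in RAW)
print("closing delimiter in output:", closing in serialized)
print("defects:", msg.defects)
```

The closing delimiter is absent from the input but present in the re-serialized text, which is the structure change reported here. Whether the defects list is empty depends on the Python version the script runs on.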
This patch does the following:
- When a close boundary isn't found, the error 'email.errors.CloseBoundaryNotFoundDefect' is added to the defects list.
- It doesn't modify the current behaviour of the feedparser (e.g. the function email.message_from_file still modifies the message structure).
With the patch applied:
{{{
$ ./test.py
PARSER INVALID EMAIL
defects found !
[<email.errors.CloseBoundaryNotFoundDefect instance at 0x7f41421c0488>]
}}}
I also noticed that 'email' modifies the message structure when the header/body separator is missing, and nothing is added to the defects list.

In the attachment, you'll find:
- email.patch: this patch adds the following errors to the defects list:
  - the error 'email.errors.CloseBoundaryNotFoundDefect' when a boundary isn't closed.
  - the error 'email.errors.MissingHeaderBodySeparator' when the header/body separator isn't found (patch for Python 2.7.2)
- orig.email: an email without a header/body separator
Thanks for the patch. I haven't forgotten about it, but it will probably still be a while yet before I get to it. Hopefully before 3.3 is released, though.
I didn't wind up using your patch (for one thing I forgot that there were two separate issues in this patch and independently rediscovered and fixed the MissingHeaderBodySeparatorDefect one). However, this is now fixed in 3.3. Unfortunately, since it introduces a new defect, it is an enhancement and by our rules can't be backported.
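As a rough check of the fixed behaviour, something like the sketch below can be run on Python 3.3 or later. Both sample messages are invented for illustration (they are not the attached files), and defect_names is a hypothetical helper, not part of the email package.

```python
import email

# Illustrative multipart message with no closing "--BOUNDARY--" delimiter.
NO_CLOSE_BOUNDARY = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Alternative 1\n"
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "Alternative 2\n"
)

# Illustrative message where a non-header line follows the headers
# without the blank separator line in between.
NO_SEPARATOR = "Subject: test\nnot a header\nTo: abc\n\nb\n"


def defect_names(raw):
    """Hypothetical helper: parse raw text, return the defect class names."""
    msg = email.message_from_string(raw)
    return [type(defect).__name__ for defect in msg.defects]


print(defect_names(NO_CLOSE_BOUNDARY))
print(defect_names(NO_SEPARATOR))
```

On a version that includes the fix, the first list should contain CloseBoundaryNotFoundDefect and the second MissingHeaderBodySeparatorDefect. The parser still does not raise by default, so callers have to inspect msg.defects themselves.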
title: email modifies the message structure when the parsed email is invalid -> email modifies the message structure when the parsed email is invalid without registering defects
nosy: + barry
messages: +
assignee: r.david.murray ->
components: + email, - Library (Lib)