Issue 27321: Email parser creates a message object that can't be flattened

Created on 2016-06-14 18:52 by msapiro, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
bad_email msapiro,2016-06-14 18:52 The problem message
generator.patch msapiro,2016-06-14 22:54 Suggested fix
Pull Requests
URL Status Linked
PR 1977 closed Johannes Löthberg,2017-06-06 22:48
PR 18074 merged msapiro,2020-01-20 03:02
PR 22796 merged miss-islington,2020-10-19 22:49
PR 22797 merged miss-islington,2020-10-19 22:49
Messages (22)
msg268580 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2016-06-14 18:52
The attached file, bad_email, can be parsed via
    msg = email.message_from_binary_file(open('bad_email', 'rb'))
but then msg.as_string() produces the following:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 189, in _write
    msg.replace_header('content-transfer-encoding', munge_cte[0])
  File "/usr/lib/python3.5/email/message.py", line 559, in replace_header
    raise KeyError(_name)
KeyError: 'content-transfer-encoding'
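For readers without the attachment, the failure can probably be approximated with a synthetic message. The sketch below is not the original bad_email; it assumes that a text/plain part declaring charset koi8-r, carrying raw 8-bit bytes, and lacking a Content-Transfer-Encoding header is enough to reach the same code path. The names RAW and try_flatten are made up for illustration.

```python
import email

# Hypothetical stand-in for bad_email: raw koi8-r bytes in the body, a declared
# charset, and no Content-Transfer-Encoding header.
RAW = (
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="koi8-r"\n'
    b"\n"
    + "Привет\n".encode("koi8-r")
)


def try_flatten(raw):
    """Return ('ok', text) if as_string() succeeds, else ('KeyError', key)."""
    msg = email.message_from_bytes(raw)
    try:
        return "ok", msg.as_string()
    except KeyError as exc:
        return "KeyError", exc.args[0]


outcome, detail = try_flatten(RAW)
print(outcome)
```

On an interpreter that still has the bug the outcome should be the KeyError from the traceback above; on one that includes the fix, as_string() should return text with a generated Content-Transfer-Encoding header.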
msg268589 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2016-06-14 22:54
One additional observation. The original message contained no Content-Transfer-Encoding header even though the message body was raw koi8-r characters. Adding Content-Transfer-Encoding: 8bit to the message headers avoids the issue, but that is not a practical solution as the message was Russian spam received by a Mailman list and the resultant KeyError caused problems in Mailman. We can work on defending against this in Mailman, but I suggest that the munge_cte code in generator._write() avoid the documented possible KeyError raised by replace_header() by using __delitem__() and __setitem__() instead as in the attached generator.patch.
msg279172 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-10-22 00:21
While I was reviewing https://gitlab.com/mailman/mailman/merge_requests/197/diffs I noticed the KeyError and it made me think "hmm, I wonder if this should be turned into one of the email package errors"?
msg295080 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 15:24
Any updates on this? I'm having the same problem with some non-spam emails while trying to use some mail-handling tools written in Python.
msg295093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-03 18:10
replace_header has different semantics than del-and-set (replace_header leaves the header in the same location in the list, rather than appending it to the end...that's its purpose). If replace_header is throwing a KeyError, then I guess we need a look-before-you-leap if statement. And a test :) Note that a correct fix would preserve the broken input email, but since we're obviously already doing munging I'm fine with just making this work for now.
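The difference between the two approaches can be seen on a plain email.message.Message. This is only an illustrative sketch: the helper name set_header_in_place is hypothetical, and the look-before-you-leap check shown here is one way to write it, not necessarily the code that was eventually committed.

```python
from email.message import Message


def set_header_in_place(msg, name, value):
    """Replace `name` where it sits, or append it if it is absent."""
    if msg.get(name) is None:
        msg[name] = value                 # no such header: append, no KeyError
    else:
        msg.replace_header(name, value)   # header exists: position is kept


# replace_header keeps the header where it was.
kept = Message()
kept["Content-Transfer-Encoding"] = "8bit"
kept["Subject"] = "test"
kept.replace_header("content-transfer-encoding", "base64")

# del-and-set moves the header to the end of the list.
moved = Message()
moved["Content-Transfer-Encoding"] = "8bit"
moved["Subject"] = "test"
del moved["content-transfer-encoding"]
moved["Content-Transfer-Encoding"] = "base64"

# With no such header, replace_header raises KeyError; the helper appends.
missing = Message()
missing["Subject"] = "test"
set_header_in_place(missing, "Content-Transfer-Encoding", "base64")

print(kept.keys(), moved.keys(), missing.keys())
```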
msg295101 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2017-06-03 19:58
I considered look before you leap, but I decided since we're munging the headers anyway, preserving their order is not that critical, but the patch is easy enough. I'll work on that and a test.
msg295105 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 20:59
Fix: https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb Testcase: https://github.com/kyrias/cpython/commit/9a510426522e1d714cd0ea238b14de0fc76862b2 Can start a PR once my CLA signature goes through I guess.
msg295107 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2017-06-03 21:25
It looks like Johannes beat me to it. Thanks for that, but see my comments in the diff at https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb
msg295108 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-03 22:05
Ah, didn't even see your comment before I did it! Fixes for the comments are on the same branch and will be rebased before the PR is up.
msg295430 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-06-08 12:35
Maybe we could merge the PR, and I could propose a backport for 3.5 and 3.6. Is 2.7 affected?
msg295442 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-08 14:20
I'm going to try to review this this weekend.
msg295443 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-06-08 14:48
OK, I have tested on 3.6 and 3.5 with just the test, and in that case we get the errors on both. If we apply Johannes's patch, the test passes and there are no issues. +1. The backports for 3.5 and 3.6 are just a git cherry-pick.
msg296293 - (view) Author: Johannes Löthberg (Johannes Löthberg) * Date: 2017-06-18 20:02
Ping?
msg310513 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-23 17:33
Note: I reviewed this a while ago but the review comments haven't been addressed.
msg311272 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-30 13:52
Requested a small additional change to the new tests, and then this will be ready to go in.
msg327136 - (view) Author: David Cannings (edeca) Date: 2018-10-05 14:00
Ping on an ETA for this fix?
msg336930 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-01 16:43
@r.david.murray, it appears that all your requested changes have been addressed on the PR. Please re-review this when you get a chance. Thanks!
msg378083 - (view) Author: Mark Diekhans (diekhans) Date: 2020-10-06 01:32
Any chance of getting this merged? A workaround is not obvious.
msg378088 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2020-10-06 02:54
I work around it with
```
import email.message
import email.utils

class Message(email.message.Message):
    def as_string(self):
        # Workaround for https://bugs.python.org/issue27321 and
        # https://bugs.python.org/issue32330.
        try:
            value = email.message.Message.as_string(self)
        except (KeyError, LookupError, UnicodeEncodeError):
            value = email.message.Message.as_bytes(self).decode(
                'ascii', 'replace')
        # Also ensure no unicode surrogates in the returned string.
        return email.utils._sanitize(value)
```
This is easy for me because it's Mailman, which already subclasses email.message.Message for other reasons. It is perhaps more difficult if you aren't already subclassing email.message.Message for other purposes.
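For code that does not subclass email.message.Message, the same idea could be wrapped in a plain function. This is a sketch under assumptions, not something proposed in this issue: the name flatten_to_str is hypothetical, and instead of calling the private email.utils._sanitize it inlines a surrogateescape round trip that is assumed to have the same effect.

```python
import email


def flatten_to_str(msg):
    """Best-effort str rendering of a message that as_string() may reject."""
    try:
        value = msg.as_string()
    except (LookupError, UnicodeEncodeError):
        # KeyError is a LookupError, so the failure from this issue lands here.
        value = msg.as_bytes().decode("ascii", "replace")
    # Turn any surrogate-escaped bytes into ordinary (possibly replaced) text.
    return value.encode("utf-8", "surrogateescape").decode("utf-8", "replace")


# Usage with a made-up message: 8-bit body, declared charset, no CTE header.
sample = email.message_from_bytes(
    b'Content-Type: text/plain; charset="koi8-r"\n\n'
    + "Привет\n".encode("koi8-r")
)
text = flatten_to_str(sample)
print(type(text).__name__)
```

The fallback path loses the non-ASCII body characters (they become replacement characters), so it is only suitable where a lossy rendering is acceptable.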
msg379052 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 22:49
New changeset bf838227c35212709dc43b3c3c57f8e1655c1d24 by Mark Sapiro in branch 'master': bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074) https://github.com/python/cpython/commit/bf838227c35212709dc43b3c3c57f8e1655c1d24
msg379056 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 23:07
New changeset 371146a3f8a989964e2a9c0efc7d776815410fac by Miss Skeleton (bot) in branch '3.8': bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074) https://github.com/python/cpython/commit/371146a3f8a989964e2a9c0efc7d776815410fac
msg379057 - (view) Author: miss-islington (miss-islington) Date: 2020-10-19 23:11
New changeset 72ce82abcf9051b18a05350936de7ecab7306662 by Miss Skeleton (bot) in branch '3.9': bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074) https://github.com/python/cpython/commit/72ce82abcf9051b18a05350936de7ecab7306662
History
Date User Action Args
2022-04-11 14:58:32 admin set github: 71508
2020-10-19 23:11:42 miss-islington set messages: +
2020-10-19 23:07:26 miss-islington set messages: +
2020-10-19 23:02:01 barry set status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.9, Python 3.10, - Python 3.7
2020-10-19 22:49:44 miss-islington set pull_requests: + pull_request21753
2020-10-19 22:49:35 miss-islington set pull_requests: + pull_request21752
2020-10-19 22:49:22 miss-islington set nosy: + miss-islington; messages: +
2020-10-06 02:54:38 msapiro set messages: +
2020-10-06 01:32:25 diekhans set nosy: + diekhans; messages: +
2020-01-20 03:02:16 msapiro set pull_requests: + pull_request17467
2019-03-01 16:43:46 cheryl.sabella set nosy: + cheryl.sabella; messages: +; versions: - Python 3.6
2018-11-03 12:57:37 matrixise set versions: + Python 3.8, - Python 3.5
2018-10-05 14:00:41 edeca set nosy: + edeca; messages: +
2018-01-30 13:52:58 r.david.murray set messages: +
2018-01-23 17:33:32 r.david.murray set messages: +
2018-01-23 10:28:23 martin.panter link issue32634 superseder
2017-06-18 20:02:09 Johannes Löthberg set messages: +
2017-06-08 14:48:36 matrixise set messages: +
2017-06-08 14:20:53 r.david.murray set messages: +
2017-06-08 12:35:18 matrixise set stage: needs patch -> patch review
2017-06-08 12:35:05 matrixise set nosy: + matrixise; messages: +
2017-06-06 22:48:03 Johannes Löthberg set pull_requests: + pull_request2043
2017-06-03 22:05:29 Johannes Löthberg set messages: +
2017-06-03 21:25:35 msapiro set messages: +
2017-06-03 20:59:48 Johannes Löthberg set messages: +
2017-06-03 19:58:24 msapiro set messages: +
2017-06-03 18:11:07 r.david.murray set stage: needs patch; type: behavior; versions: + Python 3.6, Python 3.7, - Python 3.4
2017-06-03 18:10:12 r.david.murray set messages: +
2017-06-03 15:24:32 Johannes Löthberg set nosy: + Johannes Löthberg; messages: +
2016-10-22 00:21:27 barry set messages: +
2016-06-14 22:54:21 msapiro set files: + generator.patch; keywords: + patch; messages: +
2016-06-14 21:22:59 maciej.szulik set nosy: + maciej.szulik
2016-06-14 18:52:52 msapiro create