Issue 1409455: email.Message.set_payload followed by bad result get_payload

Created on 2006-01-18 22:09 by msapiro, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name              Uploaded                   Description
example.py             msapiro, 2006-01-18 22:09  example script which illustrates the problem
Message.py.patch.txt   msapiro, 2006-01-20 23:19  Hint at possible fix
1409455.txt            barry, 2006-02-06 03:42
Messages (6)
msg27308 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2006-01-18 22:09
Under certain circumstances, in particular when charset is 'iso-8859-1' and msg is an email.Message() instance, msg.set_payload(text, charset) 'apparently' encodes the text as quoted-printable and adds a Content-Transfer-Encoding: quoted-printable header to msg.

I say 'apparently' because if one prints msg, or creates a Generator instance and writes msg to a file, the message is printed/written as a correct, quoted-printable encoded message. However, text = msg._payload or text = msg.get_payload() gives the original text, not quoted-printable encoded, and text = msg.get_payload(decode=True) gives a quoted-printable decoding of the original text, which is munged if the original text contained '=' in certain positions (for example, followed by two hex digits).

This is causing problems in Mailman, which are currently worked around by flagging whether the payload was set by set_payload() and not subsequently 'decoding' in that case, but it would be better if set_payload()/get_payload() worked properly. A script is attached which illustrates the problem.
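A minimal sketch of the munging described above. It uses the standard quopri module under current Python 3 rather than the email package of the time, and the payload text is a made-up example:

```python
import quopri

# Hypothetical payload: "=41" happens to look like a QP escape for "A".
original = b"price=41 euros"

# Decoding text that was never QP-encoded rewrites "=41" to "A".
munged = quopri.decodestring(original)

# Encoding first and then decoding round-trips the text intact.
encoded = quopri.encodestring(original)
restored = quopri.decodestring(encoded)

print(munged)
print(encoded)
print(restored == original)
```

The point is only that a quoted-printable decode is safe when the stored payload really is quoted-printable; applied to raw text it can silently change it.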
msg27309 - (view) Author: Mark Sapiro (msapiro) * (Python triager) Date: 2006-01-20 23:19
I've looked at the email library and I see the problem. msg.set_payload() never QP encodes msg._payload. When the message is stringified or flattened by a generator, the generator's _handle_text() method does the encoding, and it is msg._charset that signals the need to do this. Thus, when the message object is ultimately converted to a suitable external form, the body is QP encoded, but internally it never is. As a result, subsequent msg.get_payload() calls return unexpected results.

It appears (from minimal testing) that when a text message is parsed into an email.Message.Message instance, _charset is None even if there is a character set specification in a Content-Type: header.

I have attached a patch (Message.py.patch.txt) which may fix the problem. It has only been tested against the already attached example.py, so it is really untested. Also, it only addresses the quoted-printable case. I haven't even thought about whether there might be a similar problem involving base64.
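The observation about parsed messages can be checked with a short sketch. This assumes current Python 3 and its public accessors get_charset() and get_content_charset(); the message text is a made-up example, and the 2006 email package may have differed in detail:

```python
import email

# Hypothetical message with a charset declared in Content-Type.
raw = (
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "\n"
    "body text\n"
)
parsed = email.message_from_string(raw)

# Charset object attached via set_charset()/set_payload(); parsing does not set it.
charset_object = parsed.get_charset()

# Charset parameter read from the Content-Type header.
declared = parsed.get_content_charset()

print(charset_object, declared)
```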
msg27310 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-06 03:42
See the attached patch for what I think is ultimately the right fix. The idea is that when set_payload() is called, the payload is immediately encoded so that get_payload() will do the right thing. Also, Generator.py has to be fixed to not doubly encode the payload.

Run against your example, it seems to DTRT. It also passes all but one of the email pkg unit tests. The one failure is, I believe, due to an incorrect test. The patch includes a fix for that, as well as adding a test for get_payload(decode=True).

I'd like to get some feedback from the email-sig before applying this, but it seems right to me.
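A rough sketch of the intended behaviour, run against the email package in current Python 3 rather than the patch itself (which is not reproduced here). The payload text is a made-up example:

```python
from email.message import Message

text = "price=41 euros"  # hypothetical payload containing a QP look-alike

msg = Message()
msg.set_payload(text, "iso-8859-1")

# The transfer encoding chosen for this charset.
cte = msg["Content-Transfer-Encoding"]

# The stored payload is already encoded, so "=" appears as "=3D".
stored = msg.get_payload()

# Decoding the stored payload gives back the original text as bytes.
decoded = msg.get_payload(decode=True)

print(cte)
print(stored)
print(decoded)
```

With the payload encoded at set_payload() time, get_payload(decode=True) reverses an encoding that was actually applied, instead of decoding raw text.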
msg27311 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-08 03:07
See the latest patch in issue 1409458: https://sourceforge.net/support/tracker.php?aid=1409538
msg27312 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-02-08 13:34
r42270 for Python 2.3/email 2.5. I will forward port these to Python 2.4 and 2.5 (email 3.0).
msg27313 - (view) Author: Chris Withers (fresh) Date: 2006-09-10 12:26
This fix seems to have caused issues for code that does the following:

    from email.Charset import Charset, QP
    from email.MIMEText import MIMEText

    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText(
        u'Some text with chars that need encoding: \xa3',
        'plain',
        )
    # set the charset
    msg.set_charset(charset)
    print msg.as_string()

Before this fix, the above would result in:

    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="utf-8"

    Some text with chars that need encoding: =A3

Now I get:

    Traceback (most recent call last):
      File "test_encoding.py", line 14, in ?
        msg.as_string()
      File "c:\python24\lib\email\Message.py", line 129, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "c:\python24\lib\email\Generator.py", line 82, in flatten
        self._write(msg)
      File "c:\python24\lib\email\Generator.py", line 113, in _write
        self._dispatch(msg)
      File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
        meth(msg)
      File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

Am I doing something wrong here or is this patch in error?
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42807
2006-01-18 22:09:59 msapiro create