Issue 1525919: email package content-transfer-encoding behaviour changed

Created on 2006-07-20 14:22 by ThomasAH, last changed 2022-04-11 14:56 by admin.

Messages (11)
msg29229 - Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 14:22
from email.Message import Message
from email.Charset import Charset, QP

text = "="
msg = Message()
charset = Charset("utf-8")
charset.header_encoding = QP
charset.body_encoding = QP
msg.set_charset(charset)
msg.set_payload(text)
print msg.as_string()

Gives

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

=3D

With the email package from python2.4.3 and 2.4.4c0 the last '=3D' becomes just '=', so an extra charset.body_encode(text) is needed.
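For readers on Python 3, the extra encoding step mentioned above might look roughly like this. It is a sketch under assumptions, not code from the report: it uses the Python 3 module names (email.message, email.charset), and build_qp_message is a hypothetical helper name.

```python
from email.charset import QP, Charset
from email.message import Message


def build_qp_message(text):
    """Build a text/plain message whose body is explicitly QP-encoded.

    Hypothetical helper, not part of the email package.
    """
    charset = Charset("utf-8")
    charset.header_encoding = QP
    charset.body_encoding = QP

    msg = Message()
    # With no payload yet, this only adds the MIME headers, including
    # Content-Transfer-Encoding: quoted-printable.
    msg.set_charset(charset)
    # set_payload() without a charset argument stores the string as given,
    # so the text is encoded explicitly first.
    msg.set_payload(charset.body_encode(text))
    return msg


msg = build_qp_message("=")
print(msg.as_string())
```

The payload is encoded exactly once here, so the body and the Content-Transfer-Encoding header should agree.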
msg29230 - Author: Thomas Arendsen Hein (ThomasAH) Date: 2006-07-20 16:01
One program which got hit by this is MoinMoin, see http://moinmoin.wikiwikiweb.de/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded
msg58248 - Author: Roger Demetrescu (rdemetrescu) Date: 2007-12-06 16:53
I am not sure if it is related, but anyway...

MIMEText behaviour has changed from python 2.4 to 2.5.

# Python 2.4
>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec 6 12:52:40 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20

# Python 2.5
>>> from email.MIMEText import MIMEText
>>> m = MIMEText(None, 'html', 'iso-8859-1')
>>> m.set_payload('abc ' * 50)
>>> print m
From nobody Thu Dec 6 14:46:07 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc

However, if we initialize MIMEText with the text, we get the correct output:

# python 2.5
>>> from email.MIMEText import MIMEText
>>> m = MIMEText('abc ' * 50, 'html', 'iso-8859-1')
>>> print m
From nobody Thu Dec 6 13:01:17 2007
Content-Type: text/html; charset="iso-8859-1"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc=
 abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc ab=
c abc abc abc abc abc abc abc abc abc abc abc abc=20

If I want to set the payload after MIMEText is already created, I need to use this workaround:

# python 2.5
from email.MIMEText import MIMEText
m = MIMEText(None, 'html', 'iso-8859-1')
m.set_payload(m._charset.body_encode('abc' * 50))

PS: The issue's versions field is filled with "Python 2.4". Shouldn't it be "Python 2.5"?
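The same contrast can be sketched on Python 3. This is an illustration, not part of the original report: it assumes the Python 3 module path email.mime.text and the public get_charset() accessor in place of the private _charset attribute, and the variable names are made up for the example.

```python
from email.mime.text import MIMEText

text = "abc " * 50

# Text passed to the constructor is encoded according to the charset.
encoded_part = MIMEText(text, "html", "iso-8859-1")

# Text set afterwards without a charset argument is stored as-is.
raw_part = MIMEText("", "html", "iso-8859-1")
raw_part.set_payload(text)

# Workaround from the message above: encode before calling set_payload().
fixed_part = MIMEText("", "html", "iso-8859-1")
fixed_part.set_payload(fixed_part.get_charset().body_encode(text))

parts = [
    ("constructor", encoded_part),
    ("set_payload", raw_part),
    ("workaround", fixed_part),
]
for name, part in parts:
    longest = max(len(line) for line in part.get_payload().splitlines())
    print(name, "longest payload line:", longest)
```

Comparing the longest payload line is a quick way to see which parts were wrapped by the quoted-printable encoder and which one was left untouched.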
msg73949 - Author: Asheesh Laroia (paulproteus) * Date: 2008-09-27 23:59
Another way to see this issue is that the email module double-encodes when one attempts to use quoted-printable encoding. This has to be worked around by e.g. MoinMoin.

It's easy to get proper base64-encoded output of email.mime.text:

>>> mt = email.mime.text.MIMEText('Ta mère', 'plain', 'utf-8')
>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
True
>>> mt.as_string().split('\n')[-2]
'VGEgbcOocmU='

There we go, all nice and base64'd.

I can *not* figure out how to get quoted-printable-encoding. I found http://docs.python.org/lib/module-email.encoders.html , so I thought great - I'll just encode my MIMEText object:

>>> email.encoders.encode_quopri(mt)
>>> 'Content-Transfer-Encoding: quoted-printable' in mt.as_string()
True

Great! Except it's actually double-encoded, and the headers admit to as much. You see here that, in addition to the quoted-printable header just discovered, there is also a base64-related header, and the result is not strictly QP encoding but QP(base64(payload)).

>>> 'Content-Transfer-Encoding: base64' in mt.as_string()
True
>>> mt.as_string().split('\n')[-2]
'VGEgbcOocmU=3D'

It should look like:

>>> quopri.encodestring('Ta mère')
'Ta m=C3=A8re'

I raised this issue on the Baypiggies list <http://mail.python.org/pipermail/baypiggies/2008-September/003983.html>, but luckily I found this here bug.

This is with Python 2.5.2-0ubuntu1 from Ubuntu 8.04.

paulproteus@alchemy:~ $ python --version
Python 2.5.2

If we can come to a decision as to how this *should* work, I could contribute a patch and/or tests to fix it. I could even perhaps write a new section of the Python documentation of the email module explaining this.
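One way to get quoted-printable output without going through email.encoders is to choose the body encoding on a Charset object before the part is built. The sketch below is for Python 3 and rests on an assumption worth checking against your version: that MIMEText accepts a Charset instance as its third argument (documented for Python 3.5 and later, as far as I know). The name qp_utf8 is only illustrative.

```python
from email.charset import QP, Charset
from email.mime.text import MIMEText

# utf-8 defaults to base64 for bodies; ask for quoted-printable instead.
qp_utf8 = Charset("utf-8")
qp_utf8.body_encoding = QP

# Assumption: MIMEText accepts a Charset instance here (Python 3.5+).
part = MIMEText("Ta mère", "plain", qp_utf8)
print(part.as_string())
```

Because the text is encoded only once, when the part is constructed, there should be a single Content-Transfer-Encoding header and no base64 layer underneath.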
msg105045 - Author: Thomas Arendsen Hein (ThomasAH) Date: 2010-05-05 14:59
Roger Demetrescu, I filed the issue with "Python 2.4", because the behavior changed somewhere between 2.4.2 and 2.4.3.

The updated link to the MoinMoin bug entry is: http://moinmo.in/MoinMoinBugs/ResetPasswordEmailImproperlyEncoded

The workaround I use to be compatible with <= 2.4.2 and >= 2.4.3 is:

msg.set_payload('=')
if msg.as_string().endswith('='):
    text = charset.body_encode(text)
msg.set_payload(text)
msg184663 - Author: Colin Su (littleq0903) * Date: 2013-03-19 19:04
Confirmed with David; we worked on this together at the sprints. This is not a bug: if you call set_payload() directly, you need to encode the payload yourself, because set_payload() doesn't encode the payload if a 'Content-Transfer-Encoding' header already exists.
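A small Python 3 sketch of that rule, with made-up variable names (fresh, preset); it assumes the two-argument form set_payload(payload, charset) and is meant as an illustration rather than a statement of the exact implementation:

```python
from email.message import Message

text = "Ta mère"

# No Content-Transfer-Encoding header yet: passing a charset lets the
# library encode the text (base64 is the default body encoding for utf-8).
fresh = Message()
fresh.set_payload(text, "utf-8")

# Header already present: the payload is stored without being encoded, so
# the caller is responsible for making it match the header.
preset = Message()
preset["Content-Transfer-Encoding"] = "quoted-printable"
preset.set_payload("=", "utf-8")

print(fresh["Content-Transfer-Encoding"], repr(fresh.get_payload()))
print(preset["Content-Transfer-Encoding"], repr(preset.get_payload()))
```

In the second case the body is a bare "=", which does not match what the quoted-printable header promises; that mismatch is the situation the original report ran into.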
msg184685 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-19 21:43
Reviewing this again, it seems to me that there are two separate issues reported here:

(1) set_payload on an existing MIMEText object no longer encodes (but it has now been a long time since it changed).

(2) the functions in the encoders module, given an already encoded message, double encode.

(1) is now set in stone. That is, it is documented as working this way implicitly if you read the set_payload and set_charset docs, and it has been working that way for a while now. An explicit note should be added to the MIMEText docs, with a workaround.

(2) could be fixed, I think, since it is unlikely that anyone would be depending on such behavior.
msg184692 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-03-19 22:22
New changeset ba500b179c3a by R David Murray in branch '3.2':
#1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/ba500b179c3a

New changeset fcbc28ef96a3 by R David Murray in branch '3.3':
Merge: #1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/fcbc28ef96a3

New changeset b9e07f20832e by R David Murray in branch 'default':
Merge: #1525919: Document MIMEText+set_payload encoding behavior.
http://hg.python.org/cpython/rev/b9e07f20832e
msg184698 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-19 22:47
I've committed the doc change. I'm going to be lazy and leave this issue open to deal with the encoders module fix.
msg408387 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-12 14:51
The encoding functions are now doing

orig = msg.get_payload(decode=True)

Does this fix the double-encoding issue?

This change was made in https://github.com/python/cpython/commit/00ae435deef434f471e39bea3f3ab3a3e3cd90fe
msg408433 - Author: Thomas Arendsen Hein (ThomasAH) Date: 2021-12-13 09:53
Default python3 on Debian buster:

$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.mime.text
>>> mt = email.mime.text.MIMEText('Ta mère', 'plain', 'utf-8')
>>> print(mt.as_string())
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

VGEgbcOocmU=

>>> email.encoders.encode_quopri(mt)
>>> print(mt.as_string())
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable

Ta=20m=C3=A8re

So the encoded text looks good now, but there are still duplicate headers.

Old output (python2.7) is identical to what Asheesh Laroia (paulproteus) reported for python2.5:

---
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable

VGEgbcOocmU=3D
---
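Until the duplicate header is addressed in the library, a caller can avoid it by undoing the old encoding and dropping the old header before re-encoding. A minimal Python 3 sketch, assuming a non-multipart part; reencode_as_qp is a hypothetical helper name, not an email package function:

```python
from email import encoders
from email.mime.text import MIMEText


def reencode_as_qp(part):
    """Re-encode a non-multipart part as quoted-printable, leaving one header.

    Hypothetical helper, not part of the email package.
    """
    raw = part.get_payload(decode=True)    # bytes, with the old encoding undone
    del part["Content-Transfer-Encoding"]  # removes every occurrence of the header
    part.set_payload(raw)                  # store the undecorated bytes
    encoders.encode_quopri(part)           # encodes and adds one QP header
    return part


mt = reencode_as_qp(MIMEText("Ta mère", "plain", "utf-8"))
print(mt.as_string())
```

Deleting the header first matters because encode_quopri() appends its own Content-Transfer-Encoding header rather than replacing an existing one, which is what produces the pair of headers shown above.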
History
Date User Action Args
2022-04-11 14:56:19 admin set github: 43702
2021-12-13 09:55:31 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2021-12-13 09:53:07 ThomasAH set status: pending -> open; messages: +
2021-12-12 14:51:26 iritkatriel set status: open -> pending; nosy: + iritkatriel; messages: +
2013-03-19 22:48:10 r.david.murray set title: email package quoted printable behaviour changed -> email package content-transfer-encoding behaviour changed
2013-03-19 22:47:33 r.david.murray set messages: +
2013-03-19 22:22:03 python-dev set nosy: + python-dev; messages: +
2013-03-19 21:43:00 r.david.murray set messages: +; components: + Library (Lib), email
2013-03-19 19:05:55 littleq0903 set assignee: docs@python; components: + Documentation, - Library (Lib), email; nosy: + docs@python
2013-03-19 19:04:52 littleq0903 set nosy: + littleq0903; messages: +
2013-03-19 18:06:11 littleq0903 set versions: + Python 2.7, Python 3.2, Python 3.3, Python 3.4, - Python 2.4
2012-05-16 01:22:03 r.david.murray set assignee: r.david.murray -> (no value); components: + email
2010-12-14 19:15:29 r.david.murray set type: behavior
2010-05-05 14:59:24 ThomasAH set messages: +
2010-05-05 13:51:12 barry set assignee: barry -> r.david.murray; nosy: + r.david.murray
2008-09-27 23:59:46 paulproteus set nosy: + paulproteus; messages: +
2007-12-06 16:53:42 rdemetrescu set nosy: + rdemetrescu; messages: +
2006-07-20 14:22:21 ThomasAH create