Issue 1555842: email package and Unicode strings handling

Created on 2006-09-10 16:04 by manlioperillo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg29793 - Author: Manlio Perillo (manlioperillo) Date: 2006-09-10 16:04
The support for Unicode strings in the email package (notably the MIMEText and Header classes) is not uniform. The behaviour with Unicode strings in Header is documented, but the interface is not good.

This code works, but it should not:

>>> h = Header.Header(u"àèìòù", charset="us-ascii")
>>> m = Message.Message()
>>> m["Subject"] = h
>>> print m.as_string()

Allowing this to work can cause confusion: I'm saying that the charset is us-ascii, not utf-8.

With MIMEText I obtain:

>>> m = MIMEText.MIMEText(u"àèìòù", _charset="us-ascii")
>>> print m.as_string()
[ exception ]

I think that the correct behaviour (for all functions accepting strings) is:

- Do not accept plain str strings (8-bit). Accept them only if they are plain ascii (7-bit).
- The charset specified should not be considered a hint, but the charset I want to be used.

Regards
Manlio Perillo
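The two rules proposed above can be sketched in a few lines. This is only an illustration written for Python 3 (where `str` is Unicode and `bytes` plays the role of the old 8-bit `str`); the helper name `encode_strict` is hypothetical and is not part of the email package.

```python
def encode_strict(text, charset):
    """Hypothetical helper: encode `text` with exactly `charset` or raise."""
    if isinstance(text, bytes):
        # Byte strings are accepted only when they are pure 7-bit ASCII;
        # decode() raises UnicodeDecodeError otherwise.
        text.decode("ascii")
        return text
    # Unicode text must be representable in the requested charset;
    # encode() raises UnicodeEncodeError otherwise (no silent fallback).
    return text.encode(charset)


print(encode_strict("abc", "us-ascii"))
print(encode_strict("àèìòù", "utf-8"))
try:
    encode_strict("àèìòù", "us-ascii")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

Under these rules the Header example from the report would fail loudly instead of quietly producing output in a charset other than the one requested.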
msg29794 - Author: Manlio Perillo (manlioperillo) Date: 2006-09-10 17:35
The last example is not right. Here is the correct one:

>>> m = MIMEText.MIMEText(u"àèìòù", _charset="utf-8")
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

So it seems that email.Message does not handle Unicode strings. The code works if I set the charset to latin-1.
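For comparison, here is a minimal sketch of the same call in Python 3, using the modern module path `email.mime.text`. It is not the Python 2.4 code from the traceback above; it only illustrates the expected round trip when the text is encoded with the requested charset before the transfer encoding is applied.

```python
from email.mime.text import MIMEText

text = "àèìòù"

# In Python 3 every str is Unicode; the requested charset is passed
# the same way as in the report.
msg = MIMEText(text, _charset="utf-8")

# get_payload(decode=True) undoes the Content-Transfer-Encoding and
# returns the raw bytes of the body.
raw = msg.get_payload(decode=True)

print(msg["Content-Transfer-Encoding"])
print(raw.decode("utf-8") == text)
```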
msg84471 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-30 03:08
Confirmed on trunk.
msg106873 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-02 02:18
It took me a while to figure out why latin-1 works. It turns out to be an accident: latin-1 uses quoted-printable encoding, and the email quoprimime module accidentally manages to quote unicode characters in the latin-1 range. The Header example, as noted by the OP, is working as documented. This confusing interface isn't going to get fixed in the current email package. The equivalent email6 API will be cleaner. The MIMEText portion is a duplicate of issue 1368247.
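One way to see why quoting by code point can happen to work for latin-1: in that charset each character's code point equals its byte value, so quoting `ord(ch)` as `=XX` gives the same result as encoding to latin-1 first. The sketch below is a simplified, hypothetical quoting function written for Python 3, not the actual quoprimime code.

```python
import quopri


def qp_quote_by_codepoint(text):
    """Hypothetical, simplified quoted-printable quoting driven by ord()."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            # Outside the latin-1 range a single =XX escape cannot
            # represent the character, so the shortcut breaks down.
            raise ValueError("code point %#x does not fit in one byte" % code)
        if 32 <= code <= 126 and ch != "=":
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            out.append("=%02X" % code)  # everything else becomes =XX
    return "".join(out)


quoted = qp_quote_by_codepoint("àèìòù")
print(quoted)

# The same text encoded to latin-1 first and then quoted by the
# standard library's quopri module.
reference = quopri.encodestring("àèìòù".encode("latin-1")).decode("ascii")
print(quoted == reference)
```

The same shortcut gives wrong results for any multi-byte charset such as utf-8, which fits the failure reported earlier in the thread.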
History
Date User Action Args
2022-04-11 14:56:20 admin set github: 43960
2010-12-27 17:04:58 r.david.murray unlink issue1685453 dependencies
2010-06-02 02:18:53 r.david.murray set status: open -> closed; resolution: duplicate; messages: +; stage: test needed -> resolved
2010-05-05 13:41:25 barry set assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-05-01 16:00:24 bgamari set nosy: + bgamari
2009-03-30 22:56:23 ajaksu2 link issue1685453 dependencies
2009-03-30 03:08:15 ajaksu2 set versions: + Python 2.6; nosy: + ajaksu2; messages: +; type: behavior; stage: test needed
2006-09-10 16:04:26 manlioperillo create