Issue 20089: email.message_from_string no longer working in Python 3.4

Created on 2013-12-28 14:09 by apollo13, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg207028 - Author: Florian Apolloner (apollo13) Date: 2013-12-28 14:09
Given this email:
---
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?utf-8?q?Ch=C3=A8re_maman?=
From: from@example.com
To: to@example.com
Date: Sat, 28 Dec 2013 13:08:07 -0000
Message-ID: <20131228130807.3669.79195@localhost>

Je t'aime très fort
---
I get this traceback:
---
  File "/home/florian/sources/cpython/Lib/email/__init__.py", line 40, in message_from_string
    return Parser(*args, **kws).parsestr(s)
  File "/home/florian/sources/cpython/Lib/email/parser.py", line 70, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/home/florian/sources/cpython/Lib/email/parser.py", line 60, in parse
    return feedparser.close()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 170, in close
    self._call_parse()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 163, in _call_parse
    self._parse()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 449, in _parsegen
    self._cur.set_payload(EMPTYSTRING.join(lines))
  File "/home/florian/sources/cpython/Lib/email/message.py", line 311, in set_payload
    " payload") from None
TypeError: charset argument must be specified when non-ASCII characters are used in the payload
---
This is new in 3.4 since that's the first version which requires set_payload to provide a charset argument. IMO message_from_string should figure that out from the message.
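The suggestion above (take the charset from the message itself) can be sketched roughly as follows. This is only an illustration, not the fix that was applied to the email package: it assumes a Python version where message_from_string accepts the non-ASCII body (3.3, or any release without the check), and the names RAW, parsed, rebuilt and roundtrip are made up for the example. The Date and Message-ID headers are left out for brevity.

```python
import email
from email.message import Message

# The message from the report, minus the Date and Message-ID headers.
RAW = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Subject: =?utf-8?q?Ch=C3=A8re_maman?=\n"
    "From: from@example.com\n"
    "To: to@example.com\n"
    "\n"
    "Je t'aime très fort"
)

parsed = email.message_from_string(RAW)

# The charset is already declared in the Content-Type header.
charset = parsed.get_content_charset()
body = parsed.get_payload()

# Passing that charset explicitly is what the 3.4 check asked callers to do.
rebuilt = Message()
rebuilt.set_payload(body, charset=charset)

# Decode the transfer encoding and then the charset to get the text back.
roundtrip = rebuilt.get_payload(decode=True).decode(charset)
```

With an explicit charset, set_payload also adds the Content-Type and Content-Transfer-Encoding headers itself, so the rebuilt message does not depend on the headers of the parsed one.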
msg207031 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-28 15:43
Hmm. There's definitely a backward compatibility issue here of some sort, but how are you parsing this email? And does it work or fail in some other way on 3.3 tip?
msg207032 - Author: Florian Apolloner (apollo13) Date: 2013-12-28 15:44
Yes, it works on python3.3 (from debian); I am parsing directly via email.message_from_string:

Python 3.3.3 (default, Dec 8 2013, 14:51:59)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> msg="""Content-Type: text/plain; charset="utf-8"
... MIME-Version: 1.0
... Content-Transfer-Encoding: 8bit
... Subject: =?utf-8?q?Ch=C3=A8re_maman?=
... From: from@example.com
... To: to@example.com
... Date: Sat, 28 Dec 2013 13:08:07 -0000
... Message-ID: <20131228130807.3669.79195@localhost>
...
... Je t'aime très fort"""
>>> email.message_from_string(msg)
<email.message.Message object at 0x7fcfbcbf9090>
msg207033 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-28 15:47
Never mind, I failed to notice the message_from_string part of the traceback. Different question: what are you doing with the message after you parse it? It is not an RFC-valid message if you parse it from a string, so the only way to make it produce RFC-valid output is to emit it as a string *and* encode the output to utf-8. I'll have to think about how this "should" work... a clearer error message may be the answer, but if so I suppose I'll need an actual deprecation period before shipping the charset fix for set_payload.
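The "emit it as a string *and* encode the output to utf-8" step can be sketched like this. It is a minimal illustration under the assumption that the parser accepts the message (3.3, or a release without the check); TEXT, wire, reparsed and decoded are illustrative names, and the Date and Message-ID headers are again omitted.

```python
import email

TEXT = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Subject: =?utf-8?q?Ch=C3=A8re_maman?=\n"
    "From: from@example.com\n"
    "To: to@example.com\n"
    "\n"
    "Je t'aime très fort"
)

msg = email.message_from_string(TEXT)

# Flatten back to text, then encode with the charset the headers declare,
# so the bytes on the wire agree with "charset=utf-8" and "8bit".
wire = msg.as_string().encode("utf-8")

# Parsing those bytes gives a message whose payload decodes cleanly.
reparsed = email.message_from_bytes(wire)
decoded = reparsed.get_payload(decode=True).decode(
    reparsed.get_content_charset()
)
```

Encoding with any other codec would leave the bytes inconsistent with the Content-Type header, which is why the two steps have to go together.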
msg210514 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-07 18:37
This check has been reverted in issue 20531.
History
Date User Action Args
2022-04-11 14:57:56 admin set github: 64288
2014-02-07 18:37:51 r.david.murray set status: open -> closed; superseder: TypeError in e-mail.parser when non-ASCII is present; messages: +; type: behavior; resolution: fixed; stage: resolved
2013-12-28 15:47:35 r.david.murray set messages: +
2013-12-28 15:44:40 apollo13 set messages: +
2013-12-28 15:43:15 r.david.murray set messages: +
2013-12-28 14:20:57 apollo13 set nosy: + barry, r.david.murray; components: + email; versions: + Python 3.4
2013-12-28 14:09:03 apollo13 create