Issue 1078919: email.Header (via add_header) encodes non-ASCII content incorrectly

Created on 2004-12-04 15:47 by tlau, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded
add_header.patch	r.david.murray, 2010-10-03 03:37

Messages (9)
msg23536	Author: Tessa Lau (tlau)	Date: 2004-12-04 15:47
I'm generating a MIME message with an attachment whose filename includes non-ASCII characters. I create the MIME header as follows:

    msg.add_header('Content-Disposition', 'attachment', filename=u'Fu\xdfballer_sind_klug.ppt')

The Python-generated header looks like this:

    Content-disposition: =?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?= =?utf-8?q?sind=5Fklug=2Eppt=22?=

I sent messages with this header to Gmail, Evolution, and Thunderbird, and none of them correctly decode that header to suggest the correct default filename. However, I've found that those three mailers do behave correctly when the header looks like this instead:

    Content-disposition: attachment; filename="=?iso-8859-1?q?Fu=DFballer=5Fsind=5Fklug=2Eppt?="

Is there a way to make Python's email module generate a Content-disposition header that works with common MUAs? I know I can manually encode the filename before passing it to add_header(), but it seems that Python should be doing this for me.
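The manual workaround mentioned in the report can be reproduced in current Python 3 with `email.header.Header`. This is a sketch added for illustration, not code from the report. Note that R. David Murray argues in this thread that the quoted encoded-word form it produces is not RFC compliant, even though the MUAs named above accept it.

```python
from email.header import Header
from email.message import Message

filename = 'Fußballer_sind_klug.ppt'

# Pre-encode the filename as a single RFC 2047 encoded word,
# using the ISO-8859-1 charset and quoted-printable ("q") encoding.
encoded = Header(filename, 'iso-8859-1').encode()

msg = Message()
# The encoded word is pure ASCII, so add_header() just wraps it in quotes.
msg.add_header('Content-Disposition', 'attachment', filename=encoded)

print(msg['Content-Disposition'])
# attachment; filename="=?iso-8859-1?q?Fu=DFballer=5Fsind=5Fklug=2Eppt?="
```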
msg23537	Author: Martin v. Löwis (loewis) *	Date: 2004-12-05 19:42
The fact that neither Gmail, Evolution, nor Thunderbird can decode this string properly does not mean that Python encodes it incorrectly. I cannot see an error in this header, although I can sympathize with the developers of the MUAs that this is a non-obvious usage of the standards. So I recommend you report this as a bug to the authors of the MUAs.
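As an illustration of this point (a Python 3 sketch, not part of the original discussion), the header from the report decodes cleanly as ordinary RFC 2047 encoded words with the standard library. What the encoded words carry, however, is the entire structured value, including the `attachment;` token and the `filename="..."` parameter syntax.

```python
from email.header import decode_header, make_header

# The Content-Disposition value Python generated, copied from the report:
# two adjacent encoded words (one base64, one quoted-printable).
raw = ('=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?= '
       '=?utf-8?q?sind=5Fklug=2Eppt=22?=')

# decode_header() returns (bytes, charset) chunks; make_header() rebuilds
# a Header object whose str() is the decoded Unicode text.
decoded = str(make_header(decode_header(raw)))
print(decoded)
# attachment; filename="Fußballer_sind_klug.ppt"
```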
msg82125	Author: Daniel Diniz (ajaksu2) *	Date: 2009-02-14 22:21
The proposed output has the virtue of being easier to read.
msg117903	Author: R. David Murray (r.david.murray) *	Date: 2010-10-03 02:16
I don't believe that either the example other mailers reject or the one they accept is in fact RFC compliant. Encoded words are not supposed to occur in (structured) MIME headers. The behavior observed is a consequence of all headers, whether structured or unstructured, being treated as if they were unstructured by Header. (There's a different problem in Python 3 with this example, but I'll deal with that in a separate issue.)

What we have here is primarily a documentation bug. The way to generate the correct (RFC compliant) header is as follows:

    >>> m.add_header('Content-Disposition', 'attachment',
    ...     filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))
    >>> str(m)
    'Content-Disposition: attachment; filename*="iso-8859-1\'\'Fu%DFballer_sind_klug.ppt"\n\n'

I will add the explanation and this example to the docs. In addition, in 3.2 I will disallow non-ASCII parameter values unless they are specified in a three element tuple as in the example above. That will still leave some other places where structured headers are inappropriately encoded by Header (e.g. addresses with non-ASCII names), but dealing with that is a somewhat deeper problem.
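A Python 3 sketch of the same tuple-form call, extended (as an illustration, not from the message) to show that the RFC 2231 parameter round-trips through the email package's own parameter parser. Whether the `filename*` value is wrapped in double quotes has varied between Python releases, so the printed header is left without an expected-output comment. For contrast, the second half feeds the encoded-word header from the original report to the same parser, which finds no `filename` parameter at all, because the whole structured value is hidden inside the encoded words.

```python
from email.message import Message

# RFC 2231 form: the parameter is a (charset, language, value) tuple.
m = Message()
m.add_header('Content-Disposition', 'attachment',
             filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))
header = m['Content-Disposition']
print(header)  # attachment; filename*=...iso-8859-1''Fu%DFballer_sind_klug.ppt...

# A MIME-aware parser decodes the RFC 2231 parameter back to text.
print(m.get_filename())  # Fußballer_sind_klug.ppt

# The encoded-word header from the report: parameter parsing sees one
# opaque token, not "attachment" plus a "filename" parameter.
legacy = Message()
legacy['Content-Disposition'] = (
    '=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?= '
    '=?utf-8?q?sind=5Fklug=2Eppt=22?=')
print(legacy.get_filename())  # None
```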
msg117905	Author: R. David Murray (r.david.murray) *	Date: 2010-10-03 03:37
Here is a patch.
msg117924	Author: Antoine Pitrou (pitrou) *	Date: 2010-10-03 19:25
> In addition, in 3.2 I will disallow non-ASCII parameter values unless
> they are specified in a three element tuple as in the example above.

Why would the caller be required to choose an encoding when you could simply default to utf-8? There doesn't seem to be much value in forcing the use of e.g. iso-8859-15.

Also, I'm not sure I understand what the goal of email6 is if you're breaking compatibility in email5 anyway :)
msg117935	Author: R. David Murray (r.david.murray) *	Date: 2010-10-04 00:07
The compatibility argument is a fair point, and yes we could default to utf8 and no language. So that is probably a better solution than raising the error.
msg117955	Author: Barry A. Warsaw (barry) *	Date: 2010-10-04 15:01
RDM, I wonder if it wouldn't be better (in email6) to use an instance to represent the 3-tuple instead? It might make for clearer client code, and would allow you to default things you might generally not care about. E.g.

    class NonASCIIParameter:  # XXX come up with better name
        def __init__(self, text, charset='utf-8', language=''):
            ...

It's unfortunate that you have to reorder the arguments from the 3-tuple form of (charset, language, text), but I think you could play games with keyword arguments to make them consistent.

In general the patch looks fine to me, though I suggest splitting test_add_header() into separate tests for each of the three conditions you're testing there.
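One way Barry's suggestion could look in practice is sketched below. This is a hypothetical illustration: `NonASCIIParameter` is the placeholder name from the message, and `as_rfc2231_tuple()` and `add_header_with_params()` are names invented here; none of them exist in the `email` package. The wrapper keeps `text` first (matching the proposed signature), and a helper converts it to the `(charset, language, text)` tuple that `add_header()` already accepts, so keyword arguments hide the reordering.

```python
from dataclasses import dataclass
from email.message import Message


@dataclass(frozen=True)
class NonASCIIParameter:
    """Hypothetical wrapper following the suggestion; not a real email API."""
    text: str
    charset: str = 'utf-8'
    language: str = ''

    def as_rfc2231_tuple(self):
        # add_header() expects RFC 2231 values in (charset, language, text) order.
        return (self.charset, self.language, self.text)


def add_header_with_params(msg, name, value, **params):
    """Hypothetical helper: unwrap NonASCIIParameter values, then call add_header()."""
    converted = {
        key: (val.as_rfc2231_tuple() if isinstance(val, NonASCIIParameter) else val)
        for key, val in params.items()
    }
    msg.add_header(name, value, **converted)


m = Message()
add_header_with_params(
    m, 'Content-Disposition', 'attachment',
    filename=NonASCIIParameter('Fußballer_sind_klug.ppt', charset='iso-8859-1'))
print(m.get_filename())  # Fußballer_sind_klug.ppt
```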
msg123912	Author: R. David Murray (r.david.murray) *	Date: 2010-12-14 00:30
Committed the default-to-utf8 fix in r87217, splitting up the tests as suggested by Barry. Backported to 3.1 in r87218. Updated the documentation for 2.7 in r87219.
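With the default-to-UTF-8 behavior committed here, a plain non-ASCII `str` no longer needs the tuple form. The following minimal Python 3 sketch (an illustration, not code from the commit) shows the fixed `add_header()` falling back to RFC 2231 with a `utf-8` charset and an empty language tag. The percent escapes are the UTF-8 bytes of `ß`. Whether the value is also wrapped in double quotes has varied between releases.

```python
from email.message import Message

m = Message()
# A plain str with a non-ASCII character; no (charset, language, value) tuple.
m.add_header('Content-Disposition', 'attachment',
             filename='Fußballer_sind_klug.ppt')

header = m['Content-Disposition']
print(header)            # ...filename*=...utf-8''Fu%C3%9Fballer_sind_klug.ppt...
print(m.get_filename())  # Fußballer_sind_klug.ppt
```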

History
Date	User	Action	Args
2022-04-11 14:56:08	admin	set	github: 41280
2010-12-27 17:04:58	r.david.murray	unlink	issue1685453 dependencies
2010-12-14 00:30:57	r.david.murray	set	status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2010-10-04 15:01:19	barry	set	messages: +
2010-10-04 00:07:45	r.david.murray	set	messages: +
2010-10-03 19:25:32	pitrou	set	nosy: + pitrou; messages: +
2010-10-03 03:37:48	r.david.murray	set	files: + add_header.patch; keywords: + patch; messages: +; stage: test needed -> patch review
2010-10-03 02:16:09	r.david.murray	set	type: enhancement -> behavior; messages: +; title: Email.Header encodes non-ASCII content incorrectly -> email.Header (via add_header) encodes non-ASCII content incorrectly
2010-08-26 15:24:21	BreamoreBoy	set	versions: + Python 3.2, - Python 2.7
2010-05-05 13:43:52	barry	set	assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-04-22 16:03:19	ajaksu2	set	keywords: + easy
2009-03-30 22:56:23	ajaksu2	link	issue1685453 dependencies
2009-02-14 22:21:53	ajaksu2	set	nosy: + ajaksu2; stage: test needed; type: enhancement; messages: +; versions: + Python 2.7, - Python 2.4
2004-12-04 15:47:34	tlau	create