Issue 16983: header parsing could apply postel's law to encoded words inside quotes
It has come to my attention that at least some mail agents apply Postel's law to addresses like the following:
From: "=?utf-8?Q?not_really_valid?=" <foo@example.com>
Since encountering something that looks like an encoded word but that is not is a very unlikely occurrence, we could go ahead and decode such strings, resulting in
"not really valid" <foo@example.com>
A defect would be registered, and some sort of 'strict' policy mode could refuse to do the decode (as it could for several other non-compliant patterns, such as encoded words not separated by whitespace). I think the decoding should be the default, though.
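A minimal sketch of the proposed behaviour, assuming a Python version whose new header parser includes this fix and using `policy.default` to select that parser. The variable names are illustrative only, and the exact defect message may differ between versions.

```python
from email import message_from_string, policy
from email.errors import InvalidHeaderDefect

# The non-compliant header from the report: an encoded word inside a quoted string.
raw = 'From: "=?utf-8?Q?not_really_valid?=" <foo@example.com>\n\n'

# policy.default selects the new header parsing code.
msg = message_from_string(raw, policy=policy.default)

from_header = msg['From']
address = from_header.addresses[0]

# The encoded word is decoded even though it sits inside double quotes.
print(address.display_name)
print(address.addr_spec)

# The non-compliance is recorded as a defect on the header rather than raised.
for defect in from_header.defects:
    print(type(defect).__name__, defect)
```

Checking `from_header.defects` is how a caller could implement its own strictness for now; no 'strict' policy mode is assumed to exist here.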
This applies also to other headers where words or phrases can be quoted, such as in filenames. I have also encountered the quoted-encoded-word-as-filename in the wild.
The old header parsing code already decodes these, although it gets the spacing wrong if you do the standard str(make_header(decode_header(x))) dance. The fix for the new header parsing code only handles the specific case of only encoded words surrounded by double quotes. That's the only variation I've seen in the wild so far, so I think that may be enough. To extend it to handle mixed regular text and encoded words would require rewriting the qcontent and ptext functions. Possible, but not worth it unless a real use case turns up. (Although, I think there might be a bug in quoted text parsing that may make that rewrite worthwhile later; but it is only a bug if you are actually walking the parse tree, it is not a functional bug.)
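For comparison, the "standard dance" with the old header parsing code can be sketched as follows. This assumes Python 3.3 or later, where `decode_header` recognizes encoded words that are not separated by whitespace; the exact spacing of the result is deliberately not shown, since that is the part the old code gets wrong.

```python
from email.header import decode_header, make_header

# Same header value as in the report, without the "From: " prefix.
raw_value = '"=?utf-8?Q?not_really_valid?=" <foo@example.com>'

# decode_header splits the value into (bytes_or_str, charset) chunks.
chunks = decode_header(raw_value)

# make_header reassembles the chunks; str() yields the decoded text.
decoded = str(make_header(chunks))

print(chunks)
print(repr(decoded))  # inspect the spacing around the double quotes
```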
Oh, and I decided to treat this as a bug fix, not an enhancement, because the old parser code already did this decoding.