Issue 1102973: Incorrect RFC 2231 decoding
The following message
---snip snip---
Content-Transfer-Encoding: base64
Content-Type: application/msword; x-mac-type=42494E41; x-unix-mode=0644; x-mac-creator=4D535744; name="miriam's file.doc"
Content-Disposition: attachment; filename*0="miriam's file"; filename*1=ths.doc
---snip snip---
is decoded incorrectly according to RFC 2231. The bug is in Utils.py, in the decode_params() and decode_rfc2231() functions. The charset/language prefix should only be present on the first segment, i.e. filename*0, and even then a single quote inside a quoted value should not trip up the scanner. The problem is twofold:
- First, the unquoting of filename*0 happens in decode_params(), too early for decode_rfc2231() to know that the value was quoted.
- Second, the logic in decode_rfc2231() is too naive; it should take quoting into account when deciding whether a single quote is part of the file name or part of the leading charset/language prefix.
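The two fixes described above can be sketched together in Python 3. This is a minimal illustration under stated assumptions, not the email package's code and not a patch for Utils.py; split_quoted() and decode_continuations() are hypothetical names. Each segment is unquoted, but a flag records that it was quoted, and a charset'language' prefix is split off only from segment 0, and only when that segment is marked as encoded with a trailing asterisk (e.g. filename*0*), which is where RFC 2231 allows the prefix. The sketch also assumes the strict RFC 2231 grammar, in which extended values are never quoted, so a quoted segment is always taken literally; real-world decoders are often more lenient. Only numbered continuations (name*N and name*N*) are handled.

```python
import codecs
import re
from urllib.parse import unquote_to_bytes

# Hypothetical pattern for numbered RFC 2231 continuations:
# "name*N" (plain segment) or "name*N*" (encoded segment).
_SEGMENT = re.compile(r"^(?P<name>\w+)\*(?P<num>[0-9]+)(?P<enc>\*)?$")


def split_quoted(raw):
    """Strip surrounding double quotes, but report whether they were present."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        # Undo quoted-pair escapes such as \" and \\ inside the string.
        return re.sub(r"\\(.)", r"\1", raw[1:-1]), True
    return raw, False


def decode_continuations(params, wanted):
    """Reassemble wanted*0, wanted*1, ... from raw (name, value) pairs.

    Returns (charset, language, text). charset and language are None unless
    segment 0 is an unquoted segment marked as encoded (wanted*0*=...).
    """
    segments = []
    for name, raw in params:
        m = _SEGMENT.match(name)
        if not m or m.group("name").lower() != wanted.lower():
            continue
        # Unquote here, but keep the flag instead of discarding it.
        value, quoted = split_quoted(raw)
        # Strict RFC 2231 grammar: extended values are never quoted, so a
        # quoted segment is taken literally even if its name ends in '*'.
        encoded = m.group("enc") is not None and not quoted
        segments.append((int(m.group("num")), encoded, value))
    segments.sort()

    charset = language = None
    data = bytearray()
    for num, encoded, value in segments:
        if encoded and num == 0:
            # Only the first segment may carry a charset'language' prefix.
            parts = value.split("'", 2)
            if len(parts) == 3:
                charset, language, value = parts
        if encoded:
            data += unquote_to_bytes(value)           # %XX escapes -> bytes
        else:
            data += value.encode("ascii", "replace")  # plain values are ASCII

    codec = charset or "us-ascii"
    try:
        codecs.lookup(codec)
    except LookupError:
        codec = "us-ascii"  # unknown charset name: fall back instead of failing
    return charset or None, language or None, data.decode(codec, "replace")


# Parameters from the report: neither segment is encoded.
print(decode_continuations(
    [("filename*0", '"miriam\'s file"'), ("filename*1", "ths.doc")],
    "filename",
))
# (None, None, "miriam's fileths.doc")

# Modeled on the RFC 2231 section 4.1 example: mixed encoded and quoted segments.
print(decode_continuations(
    [
        ("title*0*", "us-ascii'en'This%20is%20even%20more%20"),
        ("title*1*", "%2A%2A%2Afun%2A%2A%2A%20"),
        ("title*2", '"isn\'t it!"'),
    ],
    "title",
))
```

With the parameters from the report, the apostrophe in "miriam's file" stays part of the name and no charset or language is reported, while the apostrophe in the quoted title*2 segment likewise survives unchanged.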
I've filed this under Group: Python 2.4, but it also affects Python 2.3 and the current head.
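To check how a present-day interpreter treats the message from this report, the snippet below parses its headers with Python 3's email package (the default compat32 policy) and calls the documented Message.get_param() and Message.get_filename() methods. This shows Python 3 behavior only, not the Python 2.3/2.4 code discussed in this issue.

```python
import email

# The headers from the report, followed by an empty body.
RAW = (
    "Content-Transfer-Encoding: base64\n"
    "Content-Type: application/msword; x-mac-type=42494E41; x-unix-mode=0644;"
    " x-mac-creator=4D535744; name=\"miriam's file.doc\"\n"
    "Content-Disposition: attachment; filename*0=\"miriam's file\"; filename*1=ths.doc\n"
    "\n"
)

msg = email.message_from_string(RAW)  # default compat32 policy

# get_param() reassembles the filename*0 / filename*1 continuations.
print(msg.get_param("filename", header="content-disposition"))

# get_filename() prefers Content-Disposition's filename over Content-Type's name.
print(msg.get_filename())
```

On current Python 3 releases, both calls should return the joined name miriam's fileths.doc, with the apostrophe intact and no charset/language tuple.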