Issue 1414018: email.Utils.py: UnicodeError in RFC2322 header


Created on 2006-01-24 20:19 by qbin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
conversion_error.eml qbin,2006-01-24 20:19 sample email
Messages (2)
msg27352 - Author: A. Sagawa (qbin) Date: 2006-01-24 20:19
Description:
collapse_rfc2231_value does not handle the UnicodeError exception, so a header like the following can raise UnicodeError during the attempted unicode conversion.
---
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Disposition: attachment; filename*=iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt
---

Test script:
---
#! /usr/bin/env python
import sys
import email

msg = email.message_from_file(sys.stdin)
for part in msg.walk():
    print part.get_params()
    print part.get_filename()
---
run
% env LANG=ja_JP.eucJP ./test.py < attached_sample.eml

Background:
Character 0x2d21 is invalid in JIS X0208 but defined in CP932 (Microsoft's superset of Shift_JIS). Conversion between Shift_JIS and ISO-2022-JP is a simple computation because both are based on JIS X0208. As a result, CP932 characters sometimes appear in ISO-2022-JP encoded strings, typically produced by Windows MUAs. But Python's "ISO-2022-JP" means *pure* JIS X0208, so the conversion fails.

Workaround:
Convert to fallback_charset and/or skip the invalid character.
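The workaround described above might look roughly like the following sketch. It is written for Python 3 rather than the Python 2.4 of the original report, and it is not the code that went into the email package. The helper name decode_rfc2231_param and its fallback_charset default are hypothetical, and the input is assumed to be a complete charset'language'percent-encoded value.

```python
from urllib.parse import unquote_to_bytes


def decode_rfc2231_param(value, fallback_charset="us-ascii"):
    """Decode a charset'language'text RFC 2231 value without raising.

    Hypothetical helper for illustration; assumes `value` contains
    both apostrophe separators.
    """
    charset, _language, encoded = value.split("'", 2)
    charset = charset or fallback_charset
    raw = unquote_to_bytes(encoded)
    try:
        # Strict decoding first: succeeds when every byte sequence is valid.
        return raw.decode(charset)
    except UnicodeError:
        # Known charset, invalid character: substitute U+FFFD for it.
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: decode with the fallback charset instead.
        return raw.decode(fallback_charset, errors="replace")


# The filename parameter from the sample header in this report.
filename = decode_rfc2231_param(
    "iso-2022-jp''%1B%24BJs9p%3Dq%2D%21%1B%28B%2Etxt"
)
print(ascii(filename))
```

Replacing the offending character keeps the rest of the filename, including the extension, readable. Dropping the character with errors="ignore" would be the other reading of "skip invalid character".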
msg27353 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2006-07-28 03:19
Fixed in r50894 for Python 2.4/email 3.0. This is already fixed in Python 2.5/email 4.0.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42834
2006-01-24 20:19:55 qbin create