[Python-Dev] Patch making the current email package (mostly) support bytes

Stephen J. Turnbull stephen at xemacs.org
Wed Oct 6 05:22:18 CEST 2010


Nick Coghlan writes:

At what level, though?

To take an interesting example I used to see frequently:

From: taro at tokyo.jp (Taro Yamada in 8-bit Shift JIS)

So I guess you are suggesting that the email module can RFC 822 parse that, and

  1. Refuse to return the unwrapped (i.e., single-line) form of the whole field, except as bytes.
  2. Refuse to return the content of the From field, except as bytes.
  3. Return the email address parsed from the From field.
  4. Refuse to return the comment, except as bytes.
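A minimal sketch of why the four cases above come apart, using only the standard codec machinery rather than any email package API. The sample address and name are assumptions for illustration, and the split is deliberately naive (it is not an RFC 822 parser):

```python
# Assumed sample data: the address and the Shift JIS name are illustrative.
NAME = "山田太郎"
raw_value = b"taro@tokyo.jp (" + NAME.encode("shift_jis") + b")"

# Undecodable bytes (>= 0x80) become lone surrogates U+DC80..U+DCFF.
text = raw_value.decode("ascii", "surrogateescape")

# Naive split, good enough for this one sample; not a real RFC 822 parser.
addr, _, rest = text.partition(" (")
comment = rest.rstrip(")")


def is_clean(s: str) -> bool:
    """True if s is pure ASCII, i.e. safe to hand out as an ordinary str."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_clean(addr))     # the address part is plain ASCII
print(is_clean(comment))  # the comment carries escaped 8-bit bytes

# The original bytes are recoverable, so nothing is lost by using str.
original = comment.encode("ascii", "surrogateescape")
print(original.decode("shift_jis") == NAME)
```

The address survives as a clean str, while the comment (and therefore the whole field) can only be turned back into meaningful text by someone who knows the real charset.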

That's fine. But suppose I have a private or newly defined header that is structured? Now I have two choices:

  1. Write a version of my private parser for both str (the normal case) and bytes (for when accessing the value as str raises).

  2. Always get the bytes and convert them to str (probably using the same .decode('ascii', 'surrogateescape') call that email uses but won't let me have the value of!), then use a common str parser. Note that this is more problematic than it looks, since choosing the appropriate base codec may require information from higher-level structures (e.g., qp codec tags or a Content-Type header's charset parameter).
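Choice 2 might look roughly like the following sketch. The header name, the `key=value; key=value` syntax, and the `parse_private` helper are all hypothetical; the only real API used is the `surrogateescape` error handler (that is its registered name). Knowing that the payload is Shift JIS is itself an assumption here, which is exactly the higher-level information mentioned above:

```python
# Hypothetical private header carrying an 8-bit Shift JIS value.
raw_line = b"X-Private: name=" + "山田".encode("shift_jis") + b"; flag=1"


def parse_private(value: str) -> dict:
    """Hypothetical str-only parser for a 'k=v; k=v' header value."""
    fields = {}
    for part in value.split(";"):
        key, _, val = part.strip().partition("=")
        fields[key] = val
    return fields


# Redo by hand what the library already did internally: bytes -> str,
# with non-ASCII bytes smuggled through as lone surrogates.
_, _, raw_value = raw_line.partition(b":")
text = raw_value.strip().decode("ascii", "surrogateescape")

fields = parse_private(text)  # one parser, str only

# Only now, with the charset known from elsewhere, decode the payload.
name_bytes = fields["name"].encode("ascii", "surrogateescape")
print(name_bytes.decode("shift_jis"))
print(fields["flag"])
```

The parser itself never needs a bytes variant; the only extra work is the decode step, which duplicates what the email package has already done.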

Why should I reproduce email's logic here? I don't care if the default or concise API raises on surrogates in the str value. But I'm pretty sure that I will want to use str values containing surrogates in these contexts (for the same reasons that the email module does, for example), rather than work with bytes sometimes and strs at other times.

Please provide a way to return strs-with-surrogates if I ask for them.


