[Python-Dev] email package status in 3.X

Michael Urman murman at gmail.com
Tue Jun 22 15:24:28 CEST 2010

On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Michael Urman writes:
>
>  > It is somewhat troublesome that there doesn't appear to be an obvious
>  > built-in idempotent-when-possible function that gives back the
>  > provided bytes/str,
>
> If you want something idempotent, it's already the case that
> bytes(b'abc') => b'abc'.  What might be desirable is to make
> bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
> (or maybe ISO 8859/1).

By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, errors) that would pass an instance of bytes through, or encode an instance of str. And of course a to_str that performs similarly, passing str through and decoding bytes. While bytes(b'abc') will give me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me the b'abc' I want to see.
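A minimal sketch of the two helpers described above. The names to_bytes and to_str come from the paragraph; the default encoding and errors values are assumptions made for illustration, not anything the language provides.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Pass bytes through unchanged; encode str with the given encoding."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


def to_str(value, encoding='utf-8', errors='strict'):
    """Pass str through unchanged; decode bytes with the given encoding."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)
```

With these, to_bytes(b'abc', 'latin-1') and to_bytes('abc', 'latin-1') both give b'abc'. The encoding is only consulted when a conversion is actually needed, so an already-converted value is never touched.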

These are trivial functions; I just don't fully understand why the capability isn't built in. A one-argument call can be idempotent; a two-argument call can't, since it only converts.

It's not a completely made-up requirement either. A cross-platform piece of software may need to present to a user items that are sometimes str and sometimes bytes - particularly filenames.
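For the filename case specifically, here is a rough sketch of a display helper. The name display_name is hypothetical, and it assumes os.fsdecode is available (it was added in Python 3.2, so it does not exist in the 3.1 discussed here).

```python
import os


def display_name(filename):
    """Return a printable str for a filename given as either str or bytes."""
    # os.fsdecode passes str through and decodes bytes using the
    # filesystem encoding.
    text = os.fsdecode(filename)
    # Bytes that could not be decoded may come back as lone surrogates;
    # replace them with '?' so the result can always be printed.
    return text.encode('utf-8', 'replace').decode('utf-8')
```

This only covers filenames, where the filesystem encoding is a reasonable guess. For arbitrary bytes the caller still has to supply the encoding.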

> Unfortunately, str(b'abc') already does work, but
>
> steve at uwakimon ~ $ python3.1
> Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(b'abc')
> "b'abc'"
>
> Oops.  You can see why that probably "should" be the case

Sure, and I love having this there for debugging. But this is hardly good enough for presenting to a user once you leave ASCII.

>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\x93\xfa\x96{\x8c\xea'", "b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'")

When I happen to know the encoding, I can reverse it much more cleanly.

>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')

But I can't mix this approach with str instances without writing a different invocation.

>>> str(u, 'argh')
TypeError: decoding str is not supported

-- Michael Urman
