[Python-Dev] bytes / unicode

Nick Coghlan ncoghlan at gmail.com
Thu Jun 24 17:25:18 CEST 2010


On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum <guido at python.org> wrote:

> Also, IMO a polymorphic function should not accept mixed bytes/text input -- join('x', b'y') should be rejected. But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me.

A policy of allowing arguments to be either str or bytes, but not a mixture, actually avoids one of the more painful aspects of the 2.x "promote mixed operations to unicode" approach. Specifically, you either had to scan all the arguments up front to check for unicode, or else you had to stop what you were doing and start again with the unicode version if you encountered unicode partway through. Neither was particularly nice to implement.
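To illustrate the "either str or bytes, but not a mixture" policy, here is a minimal sketch. The name join_path and the '/' separator are hypothetical, chosen only to mirror the join('x', 'y') example quoted above; this is not an existing stdlib function. The first argument fixes the type, so there is no up-front scan and no restart: a mismatch simply raises when it is reached.

```python
def join_path(first, *rest):
    """Hypothetical polymorphic join: all-str or all-bytes, never a mixture."""
    # The first argument alone decides which type we are working in.
    if isinstance(first, str):
        kind, sep = str, "/"
    elif isinstance(first, bytes):
        kind, sep = bytes, b"/"
    else:
        raise TypeError("expected str or bytes, got %s" % type(first).__name__)

    result = first
    for part in rest:
        # Reject mixed input as soon as it is encountered, instead of promoting.
        if not isinstance(part, kind):
            raise TypeError("cannot mix str and bytes arguments")
        result = result + sep + part
    return result
```

With this sketch, join_path('x', 'y') and join_path(b'x', b'y') both succeed, while join_path('x', b'y') raises TypeError.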

As you noted elsewhere, literals and string methods are still likely to be a major sticking point with that approach - common operations like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions that use them won't be polymorphic either. (It's only the str->unicode promotion behaviour in 2.x that works around this problem there.)
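One possible way around the literal problem, sketched here purely as an assumption and not as anything proposed in this thread, is to derive the empty separator from the data itself. The helper name concat is hypothetical; it relies only on the fact that slicing item[:0] yields an empty object of the same type.

```python
def concat(seq):
    """Hypothetical helper: join items using an empty separator of their own type."""
    items = list(seq)
    if not items:
        # With no items there is nothing to infer the type from.
        raise ValueError("cannot infer str or bytes from an empty sequence")
    # An empty slice gives '' for str, b'' for bytes, bytearray(b'') for bytearray.
    empty = items[0][:0]
    return empty.join(items)
```

The obvious weakness is the empty-sequence case, where the caller would have to say which type they want.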

Would it be heretical to suggest that sum() be allowed to work on strings to at least eliminate ''.join() as something that breaks bytes processing? It already works for bytes, although it then fails with a confusing message for bytearray:

>>> sum(b"a b c".split(), b'')
b'abc'

>>> sum(bytearray(b"a b c").split(), bytearray(b''))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum bytes [use b''.join(seq) instead]

>>> sum("a b c".split(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
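For comparison, a type-agnostic concatenation can be sketched today with functools.reduce and operator.add, which have no special-casing for strings. The name concat_all is hypothetical. Note that repeated + copies the accumulated result on every step, so this is quadratic for long sequences, which as far as I know is the reason sum() points people at join() in the first place.

```python
import functools
import operator


def concat_all(seq, start):
    """Hypothetical sum()-like helper: fold the items together with '+'."""
    # 'start' plays the same role as sum()'s second argument and fixes the type.
    return functools.reduce(operator.add, seq, start)
```

Because + refuses to combine str with bytes in 3.x, mixed input still fails with TypeError rather than being silently promoted.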

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


