
On 2/15/06, Guido van Rossum <guido@python.org> wrote:


>  Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.
[...]
It would be better if the signature of text.encode() always returned a
bytes object. But why deny the bytes object a decode() method if text
objects have an encode() method?


I agree, text.encode() and bytes.decode() are both swell.  It's the
other two that bother me.


I'd say there are two "symmetric" API flavors possible (t and b are
text and bytes objects, respectively, where text is a string type,
either str or unicode; enc is an encoding name):

- b.decode(enc) -> t; t.encode(enc) -> b
- b = bytes(t, enc); t = text(b, enc)

I'm not sure why one flavor would be preferred over the other,
although having both would probably be a mistake.
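
For illustration, the two flavors can be sketched with the names Python 3
later settled on, assuming str plays the role of "text" (the sample string
and variable names below are made up for the example):

```python
# Sketch of the two symmetric flavors; str stands in for "text".
enc = "utf-8"
t = "caf\u00e9"  # hypothetical sample text

# Method flavor: t.encode(enc) -> b; b.decode(enc) -> t
b_method = t.encode(enc)
t_method = b_method.decode(enc)

# Constructor flavor: b = bytes(t, enc); t = text(b, enc)
b_ctor = bytes(t, enc)
t_ctor = str(b_ctor, enc)

# Both flavors produce the same bytes and round-trip to the same text.
same_bytes = b_method == b_ctor
round_trip = t_method == t == t_ctor
```

Either flavor alone covers both directions, which is why having both looks
redundant.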

I prefer the constructor flavor; the word "bytes" feels more concrete
than "encode".  But I worry about constructors being too overloaded.

>>> text(b, enc)  # decode
>>> text(mydict)  # repr
>>> text(b)       # uh... decode with default encoding?
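
As a data point on the overloading worry, here is how the same three calls
behave with Python 3's str constructor, assuming str stands in for "text"
(sample values are made up):

```python
b = b"abc"          # hypothetical sample bytes
mydict = {"k": 1}   # hypothetical sample dict

decoded = str(b, "ascii")  # two arguments: decodes the bytes
as_repr = str(mydict)      # one non-bytes argument: printable representation
no_enc = str(b)            # one bytes argument: NOT decoded, gives the repr
```

Here decoded is 'abc', while no_enc is the string "b'abc'", so the
one-argument case quietly means something different from the two-argument
one.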

-j