[Python-Dev] bytes.from_hex()

Michael Urman murman at gmail.com
Thu Mar 2 04:25:20 CET 2006


[My apologies Greg; I meant to send this to the whole list. I really need a list-reply button in GMail. ]

On 3/1/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> I don't like that, because it creates a dependency (conceptually, at least) between the bytes type and the unicode type.

I only find half of this bothersome. The unicode type has a pretty clear dependency on the bytestring type: all I/O needs to be done in bytes. Various APIs may mask this by accepting unicode values and transparently doing the right thing, but from the theoretical standpoint we pretend there is no simple serialization of unicode values. But the reverse is not true: the bytestring type has no dependency on unicode.

As a matter of practicality vs. purity, however, I think it's a good choice to let the bytestring type have a tie to unicode, much like the str type implicitly does now. But you're absolutely right that adding a .tounicode raises the question: why not a .tointeger?

To try to step back and summarize the viewpoints I've seen so far, there are three main requirements.

  1. We want things that are conceptually text to be stored in memory as unicode values.
  2. We want there to be some unambiguous conversion via codecs between bytestrings and unicode values. This should help teaching, learning, and remembering unicode.
  3. We want a way to apply and reverse compressions, encodings, encryptions, etc., which are not only between bytestrings and unicode values; they may be between any two arbitrary types. This allows writing practical programs.

There seems to be little disagreement over 1, provided there is a sufficiently efficient implementation, or sufficient string powers in the bytestring type. To satisfy both 2 and 3, there seem to be a couple of options. What other requirements do we have?

For (2):
  a) Restrict the existing helpers to be only bytestring.decode and unicode.encode, possibly enforcing output types of the opposite kind, and removing bytestring.encode.
  b) Add new methods with these semantics, e.g. bytestring.udecode and unicode.uencode.
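
To make option (a) concrete, here is a minimal sketch. It assumes a Python 3 interpreter, where str plays the role of the unicode type and bytes the role of the bytestring type; that mapping is an assumption of the example, not something settled in this thread. It only shows what "helpers that cross between the two types" looks like in practice.

```python
# Sketch of option (a), assuming Python 3 (str = unicode, bytes = bytestring).
text = "caf\u00e9"                  # conceptually text, held as a unicode value
raw = text.encode("utf-8")          # unicode -> bytestring
round_trip = raw.decode("utf-8")    # bytestring -> unicode

# The same-type helpers are absent: no bytes.encode and no str.decode.
has_bytes_encode = hasattr(bytes, "encode")
has_str_decode = hasattr(str, "decode")

# A codec that is not a text encoding is refused by the method form.
try:
    text.encode("rot13")
    rejected = False
except LookupError:
    rejected = True
```

The point of the restriction is that the result type is predictable from the method name alone, which is what requirement 2 asks for.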

For (3):
  c) Create new helpers codecs.encode(obj, encoding, errors) and codecs.decode(obj, encoding, errors).
  d) [Keep existing bytestring and unicode helper methods as is, and] require use of codecs.getencoder() and codecs.getdecoder() for arbitrary starting object types.
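
As a rough illustration of how (c) and (d) differ at the call site, the sketch below uses the standard codecs module as it exists in current Python 3, which does provide functions named codecs.encode, codecs.decode and codecs.getencoder. The choice of the "hex", "rot13" and "zlib" codecs is mine, purely for illustration.

```python
import codecs

# (c) Module-level helpers that accept any starting type.
hex_bytes = codecs.encode(b"hi", "hex")         # bytes -> bytes
plain = codecs.decode(hex_bytes, "hex")         # and back again
rot = codecs.encode("hello", "rot13")           # str -> str
packed = codecs.encode(b"aaaa" * 100, "zlib")   # compression as a codec
unpacked = codecs.decode(packed, "zlib")

# (d) Look up the codec function explicitly, then call it.
# The encoder returns a pair: (output, amount of input consumed).
encoder = codecs.getencoder("hex")
encoded, consumed = encoder(b"hi")
```

Option (d) exposes the (output, length) pair of the underlying codec interface, so it is more verbose for everyday use than (c).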

Obviously 2a and 3d do not work together, but 2b and 3c work with either complementary option. What other options do we have?

Michael

Michael Urman http://www.tortall.net/mu/blog


