[Python-Dev] bytes.from_hex()

Josiah Carlson jcarlson at uci.edu
Mon Feb 20 05:28:41 CET 2006

"Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> >>>>> "Josiah" == Josiah Carlson <jcarlson at uci.edu> writes:
>
>     Josiah> The question remains: is str.decode() returning a string or unicode depending on the argument passed, when the argument quite literally names the codec involved, difficult to understand? I don't believe so; am I the only one?
>
> Do you do any of the user education about codec use that you recommend? The people I try to teach about coding invariably find it difficult to understand. The problem is that the near-universal intuition is that for "human-usable text" pretty much anything *but Unicode* will do. This is a really hard block to get them past. There is very good reason why Unicode is plain text ("original" in MAL's terms) and everything else is encoded ("derived"), but students new to the concept often take a while to "get" it.

I've not been teaching Python; when I was still a TA, it was strictly algorithms and data structures. Of the people I have had the opportunity to entice into Python, I haven't followed up on their progress, so I don't know whether they had any issues.

I try to internalize it by not thinking of strings as encoded data, but as binary data, and unicode as text. I then remind myself that unicode isn't native on-disk or cross-network (which stores and transports bytes, not characters), so one needs to encode it as binary data. It's a subtle difference, but it has worked so far for me.
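
A minimal sketch of that mental model, assuming present-day Python 3 names (where `str` plays the role of "unicode is text" and `bytes` plays the role of "strings are data"); the sample text is only an illustration:

```python
# Assumption: Python 3, where `str` is text and `bytes` is binary data.
text = "na\u00efve caf\u00e9"  # 10 characters of text

# Text is not what disks and networks carry, so it is encoded to bytes first.
data = text.encode("utf-8")

# Reading it back is the reverse step and has to name the same codec.
round_trip = data.decode("utf-8")

print(len(text), len(data))  # character count vs. byte count
print(round_trip == text)
```

The two lengths differ because each of the two non-ASCII characters takes two bytes in UTF-8, which is one way to see that the text and its encoded form are different things.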

In my experience, at least for English-only users, most people don't even get to unicode. I didn't touch it myself until I was already well versed in encoding and decoding all kinds of binary data, when a half-dozen international users (China, Japan, Russia, ...) requested support for it in my source editor, so I added it. Supporting it properly hasn't been very difficult, and the only real nit I have run into is supporting the encoding line just after the #! line for arbitrary codecs (sometimes saving a file in a particular encoding dies).
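
As a rough illustration of that nit (not the editor's actual code), the sketch below reads the encoding line with the standard library's `tokenize.detect_encoding`, assuming Python 3, and shows how a save can fail when the text contains characters the declared codec cannot represent. The helper names `declared_encoding` and `try_save` are hypothetical.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named by the coding line in the first two lines."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def try_save(text, encoding):
    """Encode text for saving; return None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


source = b"#!/usr/bin/env python\n# -*- coding: ascii -*-\nprint('hi')\n"
print(declared_encoding(source))

# The declared codec cannot hold this character, so the save "dies".
print(try_save("caf\u00e9", "ascii"))
print(try_save("caf\u00e9", "utf-8"))
```

An editor would presumably want to report the failure and offer another encoding rather than return None silently; that choice is left open here.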

I notice that you seem to be in Japan, so teaching unicode is a must. If you are using the "unicode is text" and "strings are data" framing and they still aren't getting it, then I don't know what else to suggest.

> Maybe it's just me, but whether it's the teacher or the students, I am not excited about the education route. Martin's simple rule is simple, and the exceptions for using a "nonexistent" method mean I don't have to reinforce---the students will be able to teach each other. The exceptions also directly help reinforce the notion that text == Unicode.

Are you sure that they would help? If .encode() and .decode() are dropped from strings and unicode (respectively), users get an AttributeError. That's almost useless. Raising a better exception (with more information) would be an improvement in that case, but losing the functionality that either method offers seems unnecessary, which is why I had suggested some of the other method names. Perhaps the message could read: "This method was removed because it confused users. Use help(str.encode) (or unicode.decode) to find out how you can do the equivalent, or do what you really wanted to do."

> I grant the point that .decode('base64') is useful, but I also believe that "education" is a lot more easily said than done in this case.

What I meant by "education" is 'better documentation' and 'better exception messages'. I didn't learn Python by sitting in a class; I learned it by going through the tutorial over a weekend as a 2nd year undergrad and writing software which could do what I wanted/needed. Compared to the compiler messages I'd been seeing from Codewarrior and MSVC 6, Python exceptions were like an oracle. I can understand how first-time programmers can have issues with some Python exception messages, which is why I think that we could use better ones. There is also the other issue that sometimes people fail to actually read the messages.

Again, I don't believe that an AttributeError is any better than an "ordinal not in range(128)", but "You are trying to encode/decode to/from incompatible types. expected: a->b got: x->y" is better. Some of those messages could be added very soon, given the capabilities of the encodings module, and they could likely be migrated easily, regardless of what is decided about .encode()/.decode().
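
To make the suggested message concrete, here is a small sketch assuming Python 3 type names. `CodecTypeError`, `ENCODE_SIGNATURES`, and `checked_encode` are hypothetical names invented for illustration; only `codecs.encode` is a real standard-library call, and nothing here is taken from the encodings module itself.

```python
import codecs


class CodecTypeError(TypeError):
    """Hypothetical exception carrying the more descriptive message."""


# Hypothetical table: the (input type, output type) of each codec's encode step.
ENCODE_SIGNATURES = {
    "utf-8": (str, bytes),
    "base64": (bytes, bytes),
}


def checked_encode(value, encoding, want):
    """Encode value, first checking the types the caller supplies and expects."""
    expected = ENCODE_SIGNATURES[encoding]
    got = (type(value), want)
    if got != expected:
        raise CodecTypeError(
            "You are trying to encode to/from incompatible types. "
            "expected: %s->%s got: %s->%s"
            % (expected[0].__name__, expected[1].__name__,
               got[0].__name__, got[1].__name__)
        )
    return codecs.encode(value, encoding)


print(checked_encode("text", "utf-8", bytes))

try:
    checked_encode(b"data", "utf-8", bytes)
except CodecTypeError as exc:
    print(exc)
```

The point of the sketch is only that the error names both the expected and the actual conversion, so the reader of the traceback can see which side of the text/data divide they were on.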

Josiah
