[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

M.-A. Lemburg mal at egenix.com
Sat Feb 18 13:21:18 CET 2006


Thomas Wouters wrote:

On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:

I've already explained many times why we have .encode() and .decode() methods on strings and Unicode. I've also explained the misunderstanding that codecs can only do Unicode-string conversions. And I've explained that the .encode() and .decode() methods do check the return types of the codecs and only allow strings or Unicode on return (no lists, instances, tuples or anything else).

You seem to ignore this fact.

Actually, I think the problem is that while we all agree the bytestring/unicode methods are a useful way to convert from bytestring to unicode and back again, we disagree on their general usefulness. Sure, the codecs mechanism is powerful, and even more so because codecs can determine their own return type. But it still smells and feels like a Perl attitude, for the reasons already explained numerous times, as well:

It's by no means a Perl attitude.

The main reason is symmetry and the fact that strings and Unicode should be as similar as possible in order to simplify the task of moving from one to the other.

- The return value for the non-unicode encodings depends on the value of the encoding argument.

Not really: you'll always get a basestring instance.

- The general case, by and large, especially in non-powerusers, is to encode unicode to bytestrings and to decode bytestrings to unicode. And that is a hard enough task for many of the non-powerusers. Being able to use the encode/decode methods for other tasks isn't helping them.

Agreed.

Still, I believe that this is an educational problem. There are a couple of gotchas users will have to be aware of (and this is unrelated to the methods in question):

- "encoding" always refers to transforming original data into a derived form
- "decoding" always refers to transforming a derived form of data back into its original form
- for Unicode codecs the original form is Unicode, the derived form is, in most cases, a string

As a result, if you want to use a Unicode codec such as utf-8, you encode Unicode into a utf-8 string and decode a utf-8 string into Unicode.
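
A minimal sketch of that direction rule, written in present-day Python 3 spelling (where the "string" in the post corresponds to bytes). The sample text and the variable names are illustrative assumptions, not from the original post:

```python
# Unicode codec: text is the original form, UTF-8 bytes are the derived form.
original = "caf\u00e9"               # text (Unicode) data, here "café"
derived = original.encode("utf-8")   # encode: original -> derived
restored = derived.decode("utf-8")   # decode: derived -> original

print(repr(derived))
print(restored == original)
```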

Encoding a string is only possible if the string itself is original data, e.g. some data that is supposed to be transformed into a base64 encoded form.

Decoding Unicode is only possible if the Unicode string itself represents a derived form, e.g. a sequence of hex literals.
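
The two cases above can be sketched roughly as follows. This assumes Python 3, where the non-Unicode codecs are reached through codecs.encode()/codecs.decode() rather than the string methods discussed in this thread, and where bytes.fromhex() handles the hex case; the sample values are made up for illustration:

```python
import codecs

# A byte string that is itself original data, encoded into base64.
payload = b"hello"
as_base64 = codecs.encode(payload, "base64")
back = codecs.decode(as_base64, "base64")

# A text string that represents a derived form (hex digits),
# turned back into the bytes it stands for.
hex_text = "68656c6c6f"
from_hex = bytes.fromhex(hex_text)

print(as_base64, back, from_hex)
```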

That is why I disagree with the hypergeneralization of the encode/decode methods, regardless of the fact that it is a natural expansion of the implementation of codecs. Sure, it looks 'right' and 'natural' when you look at the implementation. It sure doesn't look natural, to me and to many others, when you look at the task of encoding and decoding bytestrings/unicode.

That's because you only look at one specific task.

Codecs also unify the various interfaces to common encodings such as base64, uu or zip which are not Unicode related.
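
As an illustration of that unified interface, here is a small sketch using the codec registry in Python 3. The helper name roundtrip and the sample data are assumptions made for this example; "base64", "hex" and "zlib" are standard-library codec names:

```python
import codecs

def roundtrip(data, codec_name):
    """Encode and then decode data with the named codec (hypothetical helper)."""
    encoded = codecs.encode(data, codec_name)
    decoded = codecs.decode(encoded, codec_name)
    return encoded, decoded

data = b"the same call shape works for every codec " * 4

# One interface, three unrelated non-Unicode encodings.
for name in ("base64", "hex", "zlib"):
    encoded, decoded = roundtrip(data, name)
    print(name, len(data), "->", len(encoded), decoded == data)
```

Only the codec name changes between iterations; the calling code does not need to know about the base64, binascii or zlib modules underneath.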

-- Marc-Andre Lemburg eGenix.com

