[Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman ethan at stoneleaf.us
Mon Jan 13 06:22:04 CET 2014


On 01/12/2014 08:27 PM, Stephen J. Turnbull wrote:

Ethan Furman writes:

On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote:

I didn't trim enough to make my point clear. My apologies.

But knowledge of ASCII isn't necessary to specify these methods; they can be defined in an encoding/decoding-free way.

Perhaps you meant "use the methods". I meant "write the methods".

You cannot write .upper for the bytes type without knowing what encoding has been used / is represented by those bytes. And quite frankly, if you use those methods on bytes without knowing (1) which encoding is represented by the bytes and (2) that the function you are calling is meant to work with that encoding... well, you deserve what you get.
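To make the "you deserve what you get" case concrete: `bytes.upper()` is documented to change only the lowercase ASCII letters, so a non-ASCII letter stored in some other encoding comes back unchanged. The sketch below just uses the built-in codecs and the word "café" as an arbitrary sample:

```python
# bytes.upper() only touches ASCII b"a".."z" (0x61-0x7A); every other byte,
# including the bytes that encode a non-ASCII letter, passes through as-is.
word = "café"

utf8 = word.encode("utf-8")       # b'caf\xc3\xa9'
latin1 = word.encode("latin-1")   # b'caf\xe9'

print(utf8.upper().decode("utf-8"))      # CAFé  (both bytes of é untouched)
print(latin1.upper().decode("latin-1"))  # CAFé
print(word.upper())                      # CAFÉ  (str knows it is text)
```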

How can you say that with a straight face? Because I showed you code that does it. Did you see an .encode or a .decode in there?

No, I didn't. I saw numbers representing bytes representing text that has been encoded in the ASCII codec. If you didn't know it was ASCII, you couldn't write that function. Even though you don't have to call encode or decode if working directly with encoded bytes, you still have to know what the encoding is to do it correctly.
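A minimal illustration of that point, using a hypothetical `ascii_upper` helper (not the code referred to earlier in the thread): it calls neither `.encode` nor `.decode`, yet the constants 97, 122 and 32 are ASCII facts, so the function is only correct for ASCII-compatible data.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase a bytes object without any .encode()/.decode() call.

    97 and 122 are the ASCII code points of 'a' and 'z', and 32 is the
    distance from 'a' to 'A' in ASCII, so the encoding is baked into the
    numbers even though no codec is ever named.
    """
    return bytes(b - 32 if 97 <= b <= 122 else b for b in data)


sample = b"hello, world"
print(ascii_upper(sample))                    # b'HELLO, WORLD'
print(ascii_upper(sample) == sample.upper())  # True
```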

Do you really think that .title, .isalnum, and .center (to name only a few) would work the same if the assumed encoding was EBCDIC?

I phrased that poorly. If the byte stream was EBCDIC-encoded, and we called the current .method_which_assumes_ASCII on it, would we get the proper results?

The numbers involved would change, and the test for finding letters would be different (and more complicated IIRC).

And you have actually just made my point. If the bytes in question were EBCDIC-encoded, we could write a function for it because we know what it looks like as encoded bytes. Then we could be debating the merits of working directly with EBCDIC-encoded text instead of ASCII-encoded text. ;)
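A sketch of both halves of this exchange, assuming the EBCDIC variant is code page 037 (Python's built-in "cp037" codec); `ebcdic_upper` is a hypothetical helper written only for illustration. The ASCII-assuming `bytes.upper()` leaves the EBCDIC letters alone and even turns the comma into a period, while a function written with EBCDIC's own numbers gets it right:

```python
# cp037 is EBCDIC US/Canada, one of the codecs shipped with Python.
text = "hello, world"
ebcdic = text.encode("cp037")

# bytes.upper() assumes ASCII and only rewrites 0x61-0x7A. In cp037 the
# lowercase letters live at 0x81-0x89, 0x91-0x99 and 0xA2-0xA9, so they are
# skipped, but the comma (0x6B) becomes 0x4B, which cp037 decodes as '.'.
print(ebcdic.upper().decode("cp037"))  # hello. world


def ebcdic_upper(data: bytes) -> bytes:
    """Uppercase cp037-encoded bytes (hypothetical helper).

    In cp037 each capital letter sits exactly 0x40 above its lowercase form,
    but the letters are split into three non-contiguous runs.
    """
    def is_lower(b: int) -> bool:
        return 0x81 <= b <= 0x89 or 0x91 <= b <= 0x99 or 0xA2 <= b <= 0xA9

    return bytes(b + 0x40 if is_lower(b) else b for b in data)


print(ebcdic_upper(ebcdic).decode("cp037"))  # HELLO, WORLD
```

The three-range test is the "more complicated" letter check mentioned above: ASCII needs one contiguous range, EBCDIC needs three.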

"There should be one- and preferably only one -way to do it." The one way uses text, so preferably bytes shouldn't.

You forgot the word "obvious".

-- ~~Ethan~~


More information about the Python-Dev mailing list