[Python-Dev] bytes.from_hex()
Stephen J. Turnbull stephen at xemacs.org
Sat Feb 25 19:05:38 CET 2006
"Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:
Greg> Stephen J. Turnbull wrote:
>> the kind of "text" for which Unicode was designed is normally
>> produced and consumed by people, who wll pt up w/ ll knds f
>> nnsns. Base64 decoders will not put up with the same kinds of
>> nonsense that people will.
Greg> The Python compiler won't put up with that sort of nonsense
Greg> either. Would you consider that makes Python source code
Greg> binary data rather than text, and that it's inappropriate to
Greg> represent it using a unicode string?
The reason that Python source code is text is that the primary producers/consumers of Python source code are human beings, not compilers.
There are no such human producers/consumers of base64. Unless you would prefer that I had expressed that last sentence as

"VGhlIHJlYXNvbiB0aG
F0IFB5dGhvbiBzb3VyY2UgY29kZSBpcyB0ZXh0IGlzIGJlY2F1c2UgdGhlIHByaW1
hcnkKcHJvZHVjZXJzL2NvbnN1bWVycyBvZiBQeXRob24gc291cmNlIGNvZGUgYXJl
IGh1bWFuIGJlaW5ncywgbm90CmNvbXBpbGVycy4="?
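As an illustration of the "mechanical assistance" needed to read the quoted string, here is a minimal Python sketch. It assumes the whitespace inside the quotes is line-wrapping residue rather than data; the names `quoted`, `compact`, and `decoded` are illustrative only.

```python
import base64

# The quoted string from the message. The embedded whitespace is assumed to be
# line-wrap residue, so it is dropped before decoding.
quoted = (
    "VGhlIHJlYXNvbiB0aG F0IFB5dGhvbiBzb3VyY2UgY29kZSBpcyB0ZXh0IGlzIGJlY2F1c2UgdGhlIHByaW1 "
    "hcnkKcHJvZHVjZXJzL2NvbnN1bWVycyBvZiBQeXRob24gc291cmNlIGNvZGUgYXJl "
    "IGh1bWFuIGJlaW5ncywgbm90CmNvbXBpbGVycy4="
)

# Remove every whitespace character, then decode the base64 payload to bytes
# and interpret those bytes as ASCII text.
compact = "".join(quoted.split())
decoded = base64.b64decode(compact).decode("ascii")

print(decoded)
```

The output should be a restatement of the sentence about who produces and consumes Python source code, which a person can read directly only after the decoding step.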
>> You're basically assuming that the person who implements the
>> code that processes a Unicode string is the same person who
>> implemented the code that converts a binary object into base64
>> and inserts it into a string.
Greg> No, I'm assuming the user of base64 knows the
Greg> characteristics of the channel he's using.
Yes, which implies that you assume he has control of the data all the way to the channel that actually requires base64.
Use case: the Gnus MUA supports the RFC that allows non-ASCII names in MIME headers that take file names. The interface was written for message-at-a-time use, which makes sense for composition. Somebody else added "save and strip part" editing capability, but this only works one MIME part at a time. So if you have a message with four MIME parts and you save and strip all of them, the first one gets encoded four times.
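The failure mode can be sketched in a few lines of Python. This is an illustration of repeated encoding in general, not a reconstruction of the Gnus code: `encode_filename` is a hypothetical helper, and percent-encoding via `urllib.parse.quote` is only a stand-in for whatever encoding the RFC in question prescribes.

```python
from urllib.parse import quote, unquote


def encode_filename(name: str) -> str:
    """Hypothetical stand-in for a wire encoding of a non-ASCII file name."""
    return quote(name, safe="")


original = "na\u00efve.txt"  # a file name containing one non-ASCII character

# Encoding once is what the wire format wants.
once = encode_filename(original)

# Encoding on every editing pass (four passes here, one per MIME part)
# re-encodes the escape characters produced by the previous pass.
header = original
for _ in range(4):
    header = encode_filename(header)

print(once)
print(header)
print(unquote(header) == original)
```

Because the encoded form is still an ordinary string, nothing stops a later pass from encoding it again, and a single decode on the receiving side no longer recovers the original name.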
The reason for this bug, and scores like it over the years, is that somebody made it convenient to put wire protocols into a text document. Shouldn't Python do better than that? Shouldn't Python text be for humans, rather than whatever happened to have the tag "character" attached to it for convenience in defining a protocol that carries data humans can't process without mechanical assistance?
>> I don't think it's a good idea to gratuitously introduce wire
>> protocols as unicode codecs,
Greg> I am *not* saying that base64 is a unicode codec! If that's
Greg> what you thought I was saying, it's no wonder we're
Greg> confusing each other.
I know you don't think that it's a duck, but it waddles and quacks. I.e., the question is not what I think you're saying. It's "what is the Python compiler/interpreter going to think?" AFAICS, it's going to think that base64 is a unicode codec.
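For readers following along with a recent Python 3 interpreter (an observation from outside this message, not something it could have known), the distinction can be checked directly. The sketch below assumes the standard library's `base64` codec as exposed through the `codecs` module; the variable names are illustrative.

```python
import codecs

raw = b"\x00\xffbinary"

# As a bytes-to-bytes transform, base64 round-trips without touching str.
wire = codecs.encode(raw, "base64")
round_trip = codecs.decode(wire, "base64")

# Asking a str to "encode" itself with base64 is refused, because the
# interpreter does not treat base64 as a text encoding.
try:
    "text".encode("base64")
    rejected = False
except (LookupError, TypeError):
    rejected = True

print(type(wire).__name__, round_trip == raw, rejected)
```

In that setting the interpreter keeps base64 on the bytes side of the fence, so the result of encoding is never a character string.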
Greg> The only time I need to use something like base64 is when I
Greg> have something that will only accept text. In Py3k, "accepts
Greg> text" is going to mean "takes a character string as input",
Characters are inherently abstract: as a class they can't be instantiated as input or output; only derived (i.e., encoded) characters can. I don't believe that "takes a character string as input" has any intrinsic meaning.
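A small Python sketch of that point, with an arbitrarily chosen character and set of encodings: the same abstract character only becomes input or output once an encoding is picked, and each choice yields different bytes.

```python
text = "\u00e9"  # one abstract character: LATIN SMALL LETTER E WITH ACUTE

# The character has no single concrete form; each encoding derives its own.
encodings = ("utf-8", "latin-1", "utf-16-le")
on_the_wire = {name: text.encode(name) for name in encodings}

for name, data in on_the_wire.items():
    print(name, data, data.decode(name) == text)
```

All three byte strings decode back to the same character, yet none of them is "the" character string that a channel could be said to accept.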
Greg> Does that make it clearer what I'm getting at?
No. I already understood what you're getting at. As I said, I'm sympathetic in principle. In practice, I think it's a loaded gun aimed at my foot. And yours.
--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.