[Python-Dev] bytes.from_hex()

Stephen J. Turnbull stephen at xemacs.org
Mon Feb 27 06:59:44 CET 2006

"Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

Greg> Stephen J. Turnbull wrote:

>> I gave you one, MIME processing in email

Greg> If implementing a mime packer is really the only use case
Greg> for base64, then it might as well be removed from the
Greg> standard library, since 99.99999% of all programmers will
Greg> never touch it.  I don't have any real-life use cases for
Greg> base64 that a non-mime-implementer might come across, so all
Greg> I can do is imagine what shape such a use case might have.

I guess we don't have much to talk about, then.

>> Give me a use case where it matters practically that the output
>> of the base64 codec be Python unicode characters rather than
>> 8-bit ASCII characters.

Greg> I'd be perfectly happy with ascii characters, but in Py3k,
Greg> the most natural place to keep ascii characters will be in
Greg> character strings, not byte arrays.

Natural != practical.

Anyway, I disagree, and I've lived with the problems that come with an environment that mixes objects with various underlying semantics into a single "text stream" for a decade and a half.

That doesn't make me authoritative, but as we agree to disagree, I hope you'll keep in mind that someone with real-world experience that is somewhat relevant[1] to the issue doesn't find that natural at all.

Greg> Since the Unicode character set is a superset of the ASCII
Greg> character set, it doesn't seem unreasonable that they could
Greg> also be thought of as Unicode characters.

I agree. However, as soon as I go past that intuition to thinking about what that implies for operations on the base64 string, it begins to seem unreasonable, unnatural, and downright dangerous. The base64 string is a representation of an object that doesn't have text semantics. Nor do base64 strings have text semantics: they can't even be concatenated as text (the pad character '=' is typically a syntax error in a profile of base64, except as terminal padding). So if you wish to concatenate the underlying objects, the base64 strings must be decoded, concatenated, and re-encoded in the general case. IMO it's not worth preserving the very superficial coincidence of "character representation" in the face of such semantics.
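To make the concatenation point concrete, here is a minimal sketch. It assumes a current Python 3 with the standard `base64` module (which postdates this thread); the payloads and the helper `strict_decode_fails` are made up for illustration. Joining two base64 strings as text leaves a pad character in the middle, while the decode, concatenate, re-encode route gives a different and valid result.

```python
import base64
import binascii

# Two arbitrary 5-byte payloads; each encoding ends in one '=' pad character.
first = b"hello"
second = b"world"

enc_first = base64.b64encode(first)    # b'aGVsbG8='
enc_second = base64.b64encode(second)  # b'd29ybGQ='

# Naive "text" concatenation leaves a pad character in the middle.
naive = enc_first + enc_second

# The general-case route: decode, concatenate the objects, re-encode.
correct = base64.b64encode(
    base64.b64decode(enc_first) + base64.b64decode(enc_second)
)


def strict_decode_fails(data: bytes) -> bool:
    """Return True if strict base64 validation rejects `data` (hypothetical helper)."""
    try:
        base64.b64decode(data, validate=True)
    except binascii.Error:
        return True
    return False


print(naive)    # b'aGVsbG8=d29ybGQ='
print(correct)  # b'aGVsbG93b3JsZA=='
print(strict_decode_fails(naive), strict_decode_fails(correct))
```

What a lenient (non-validating) decoder does with the naive string depends on the implementation, which is part of the danger; the sketch therefore only checks the strict case.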

I think the fact that favoring the coincidence of representation leads you to also deprecate the very natural use of the codec API to implement and understand base64 is indicative of a deep problem with the idea of implementing base64 as bytes->unicode.
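As an illustration of what using the codec API for base64 looks like, here is a small sketch. It assumes a current Python 3, where the standard `codecs` module exposes `"base64"` as a bytes-to-bytes codec; that is not the API as it stood when this message was written, and the payload is arbitrary.

```python
import codecs

# Arbitrary binary payload with no text semantics of its own.
payload = b"\x00\xffbinary data"

# Encoding through the codec API yields bytes, not a character string.
encoded = codecs.encode(payload, "base64")

# Decoding through the same API recovers the original object.
decoded = codecs.decode(encoded, "base64")

print(type(encoded).__name__)
print(decoded == payload)
```

Both directions stay in the byte domain here, so nothing invites treating the encoded form as ordinary text.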

Footnotes:
[1] That "somewhat" is intended literally; my specialty is working with codecs for humans in Emacs, but I've also worked with more abstract codecs such as base64 in contexts like email, in both LISP and Python.

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
