[Python-Dev] accept string in a2b and base64?

Stephen J. Turnbull stephen at xemacs.org
Wed Feb 22 08:37:55 CET 2012

R. David Murray writes:

> If most people agree with Antoine I won't fight it, but it seems to me that accepting unicode in the binascii and base64 APIs is a bad idea.

First, I agree with David that this change should have been brought up on python-dev before committing it. The distinctions Python 3 has made between APIs for bytes and those for str are both obviously controversial and genuinely delicate.

Second, if Unicode is to be accepted in these APIs, there is a doc issue (which I haven't checked). It must be made clear that the "printable ASCII" in question is the set represented by the integers 33 to 126, not the ASCII characters ! to ~ as they appear on the page. Characters that look like them are present in the Unicode repertoire in many other places (specifically the "full-width ASCII" compatibility character set around U+FF20, but also several Greek and Cyrillic characters, and possibly others).
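
To make the distinction concrete, here is a minimal sketch of a code-point check. The helper name `is_printable_ascii` is made up for illustration and is not part of any library; it only tests whether each character falls in the integer range 33 to 126, which is what separates the real ASCII "A" from its look-alikes.

```python
import unicodedata

def is_printable_ascii(text):
    """Return True if every character's code point lies in 33..126."""
    return all(33 <= ord(ch) <= 126 for ch in text)

latin_a = "A"           # U+0041
fullwidth_a = "\uff21"  # U+FF21, "full-width ASCII" compatibility form
greek_alpha = "\u0391"  # U+0391, looks like A in most fonts
cyrillic_a = "\u0410"   # U+0410, looks like A in most fonts

# Print code point, Unicode name, and whether it passes the range check.
for ch in (latin_a, fullwidth_a, greek_alpha, cyrillic_a):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_printable_ascii(ch))
```

Only the first of the four passes the check, even though all four may render identically.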

I'm going to side with Antoine and Nick on these particular changes because in practice (except maybe in the email module :-( ) the BASE-encoded "text" to be decoded is going to be consistently defined by the client as either str or bytes, but not both. The fact that the repr of the encoded text is identical (except for the presence or absence of a leading "b") is very suggestive here. I do harbor a slight niggle: I think there is more room for confusion here than in Nick's urllib work.

However, once we clarify that confusion in our minds, I don't think there's much potential for dangerous confusion for API clients. (I agree with Antoine on that point.) The BASE## decoding APIs in the abstract are "text" to bytes. Pedantically, in Python that suggests a str -> bytes signature, but RFC 4648 doesn't anywhere require a 1-byte representation of ASCII, only that the representation be interpreted as integers in the ASCII coding. Moreover, an RFC-4648-conforming implementation MUST reject any string containing characters not allowed in the representation, so it's actually stricter than requiring ASCII. I see no problem with allowing str-or-bytes -> bytes polymorphism here.
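
As a rough illustration of str-or-bytes -> bytes decoding, the sketch below uses `base64.b64decode` as it behaves in current CPython 3 releases, which is an assumption relative to the code under discussion at the time of this message. The wrapper `decode_strict` is a hypothetical name; it passes `validate=True` so that characters outside the Base64 alphabet are rejected rather than silently discarded.

```python
import base64
import binascii

def decode_strict(data):
    """Decode Base64 from str or bytes, rejecting non-alphabet characters."""
    return base64.b64decode(data, validate=True)

# The same encoded text decodes identically whether given as str or bytes.
print(decode_strict("QUJD"))
print(decode_strict(b"QUJD"))

# ASCII but outside the alphabet, then full-width look-alikes of "QUJD".
for bad in ("QUJD!", "\uff31\uff35\uff2a\uff24"):
    try:
        decode_strict(bad)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        print(type(exc).__name__, exc)
```

Both bad inputs are refused: the first because "!" is not in the Base64 alphabet, the second because a str argument must contain only ASCII characters.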

The remaining issue to my mind is that we'd also like bytes -> str-or-bytes polymorphism for symmetry, but this is not Haskell, so we can't have it.
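
A small sketch of what the asymmetry looks like from the client side, assuming the behavior of `base64.b64encode` in current CPython 3: the encoder always returns bytes, so a client that wants str has to decode explicitly. It also shows the repr observation made earlier.

```python
import base64

encoded_bytes = base64.b64encode(b"ABC")      # the encoder always returns bytes
encoded_text = encoded_bytes.decode("ascii")  # explicit step to obtain str

# The two reprs differ only by the leading "b".
print(repr(encoded_bytes))
print(repr(encoded_text))
```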

The same is true for binascii, I suppose -- assuming that the module is specified (as the name suggests) to produce and consume only ASCII text as a representation of bytes.
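
For binascii, a comparable sketch, again assuming the behavior of current CPython 3 rather than the code as it stood when this was written: the `a2b_*` functions take ASCII-only str or bytes and return bytes, while the `b2a_*` functions return bytes.

```python
import binascii

# a2b_* accepts the ASCII representation as either str or bytes.
print(binascii.a2b_hex("414243"))
print(binascii.a2b_hex(b"414243"))
print(binascii.a2b_base64("QUJD"))

# b2a_* goes the other way and returns bytes, not str.
print(binascii.b2a_hex(b"ABC"))

# Full-width digits look like "41" but are not ASCII, so they are rejected.
try:
    binascii.a2b_hex("\uff14\uff11")
except ValueError as exc:
    print(exc)
```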
