
[Python-Dev] bytes / unicode

Stephen J. Turnbull stephen at xemacs.org
Sun Jun 27 16:03:06 CEST 2010


P.J. Eby writes:

> At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:
>
> > What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data.
>
> I still don't follow,

OK, I give up, since it was your use case that concerned me. I obviously misunderstood. Sorry for the confusion.

Sign me,
+1 on polymorphic functions in Tsukuba Japan

> > In general this is a hard problem, though. Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data.
>
> But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s).

This simply isn't true for encoded bytes as proposed. For encoded text, the current encoding has no deterministic relationship to the desired encoding (at the level of generality of the stdlib; of course in specific applications it may be mandated by a standard or private convention).
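To make the encoded-bytes point concrete, here is a minimal Python 3 sketch. The helper name `combine` and its parameters are hypothetical, chosen only for illustration; nothing like it is proposed in the thread. It shows that concatenating two byte strings is type-correct even when their encodings differ, and that the right encoding for the result has to come from the caller rather than from either operand.

```python
def combine(a, a_encoding, b, b_encoding, target_encoding):
    """Hypothetical helper: join two encoded byte strings.

    The operands cannot tell us which encoding the result should use,
    so the caller has to supply target_encoding explicitly.
    """
    # Decode each piece with its own encoding, then encode once for the caller.
    text = a.decode(a_encoding) + b.decode(b_encoding)
    return text.encode(target_encoding)


latin1 = "\u00e9".encode("latin-1")  # e-acute as one byte
utf8 = "\u00e9".encode("utf-8")      # e-acute as two bytes

# bytes + bytes succeeds, but the result mixes two encodings.
naive = latin1 + utf8

try:
    naive.decode("utf-8")
    naive_is_valid_utf8 = True
except UnicodeDecodeError:
    naive_is_valid_utf8 = False

# Same inputs, different results, depending only on what the caller wants.
as_utf8 = combine(latin1, "latin-1", utf8, "utf-8", "utf-8")
as_latin1 = combine(latin1, "latin-1", utf8, "utf-8", "latin-1")
```

Both `as_utf8` and `as_latin1` are legitimate answers for the same pair of inputs, which is the sense in which the result depends on the caller's expectations and not on the types of the data.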

I will have to pass on your other user-defined string types. I've never tried to implement one. I only wanted to point out that a user-controllable tainted string type would be preferable to confounding "unicode" with "tainted".
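As a rough illustration of what a user-controllable tainted string type could look like, here is a minimal Python 3 sketch. The class `Tainted` and the function `validate` are hypothetical names invented for this example, and only concatenation is covered; a real type would need to propagate the taint through the rest of the string protocol (slicing, `join`, formatting, and so on).

```python
class Tainted(str):
    """Hypothetical str subclass marking text that has not been validated."""

    def __add__(self, other):
        # tainted + anything stays tainted
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # anything + tainted also becomes tainted (one-way tainting)
        return Tainted(str.__add__(other, self))


def validate(text, check=str.isalnum):
    """Return a plain str if the check passes, otherwise raise ValueError.

    The default check is an arbitrary placeholder for this sketch.
    """
    if not check(text):
        raise ValueError("text failed validation")
    # str() on a str subclass instance returns an exact str, dropping the taint.
    return str(text)


user_input = Tainted("abc123")
combined = "prefix" + user_input   # taint spreads to the result
clean = validate(combined)         # explicit validation removes it
```

Here the taint is carried by a dedicated type that the application chooses to use, so plain `str` keeps no implied meaning about whether its contents were validated.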


