[Python-Dev] bytes / unicode
P.J. Eby pje at telecommunity.com
Sat Jun 26 20:17:44 CEST 2010
At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:
> What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data.
I still don't follow, since passing in bytes should return bytes. Returning unicode would be an error, in the case of a "polymorphic" function (per Guido).
> But you agree that there are better mechanisms for validation (although not available in Python yet), so I don't see this as a potential obstacle to polymorphism now.
Nope. I'm just saying that, given two bytestrings to url-join or path join or whatever, a polymorph should hand back a bytestring. This seems pretty uncontroversial.
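A minimal sketch of what "bytes in, bytes out" could look like, for illustration only. The helper name `join_path` and its "/" separator handling are assumptions made up for this example, not a stdlib function or anything proposed in the thread:

```python
def join_path(base, name):
    """Hypothetical polymorphic join: the result type follows the input type."""
    # Refuse to guess an encoding when the two arguments disagree.
    if isinstance(base, bytes) != isinstance(name, bytes):
        raise TypeError("cannot mix bytes and str arguments")
    # Pick the separator literal that matches the argument type.
    sep = b"/" if isinstance(base, bytes) else "/"
    return base.rstrip(sep) + sep + name.lstrip(sep)


print(type(join_path(b"usr/", b"lib")).__name__)
print(type(join_path("usr/", "lib")).__name__)
```

Two bytestrings yield a bytestring and two text strings yield text; a mixed call raises `TypeError` rather than silently returning unicode.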
> > What I want is for the stdlib to create stringlike objects of a type determined by the types of the inputs --
> In general this is a hard problem, though. Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data.
But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s).
The idea that I'm proposing is that the basic string and byte types should defer to "user-defined" string types for mixed type operations, so that polymorphism of string-manipulation functions is the default case, rather than a special case. This makes tainting easier to implement, as well as optimizing and other special cases (like my "source string w/file and line info", or a string with font/formatting attributes).
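A rough sketch of the deferral idea, written against current Python 3 behaviour rather than the proposal itself. The class `Tainted` and the function `add_prefix` are hypothetical names invented for this example. Operator-based concatenation already defers to a `str` subclass that defines the reflected method, while `str.join` does not, which is the kind of case the proposal would change:

```python
class Tainted(str):
    """Hypothetical user-defined string type: concatenation stays tainted."""

    def __add__(self, other):
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Reached for plain_str + Tainted: Python gives the reflected
        # method of a right-hand subclass priority in mixed operations.
        return Tainted(str.__add__(other, self))


def add_prefix(value):
    # The literal is a plain str, yet the result type follows the argument.
    return "prefix/" + value


print(type(add_prefix(Tainted("data"))).__name__)
print(type(add_prefix("data")).__name__)
# str.join ignores the subclass, so the taint is lost here.
print(type("/".join([Tainted("a"), Tainted("b")])).__name__)
```

Here the caller controls the result type of `add_prefix` purely through the argument it passes in, whereas the `join` call shows a method on the literal overriding the user-defined type.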