[Python-Dev] PEP 461 updates
Stephen J. Turnbull stephen at xemacs.org
Fri Jan 17 03:19:44 CET 2014
Meta enough that I'll take Guido out of the CC.
Nick Coghlan writes:
> There are plenty of data formats (like SMTP and HTTP) that are constrained to be ASCII compatible,
"ASCII compatible" is a technical term in encodings, which means "bytes in the range 0-127 always have ASCII coded character semantics, do what you like with bytes in the range 128-255."[1]
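The footnoted definition can be probed mechanically. What follows is a rough Python 3 sketch, not a standard API: `is_ascii_compatible` and its default sample string are made up for illustration. It checks that every code point 0-127 encodes to the identical single byte, and that a few sample non-ASCII characters never encode to bytes in that range. Because a finite sample can only disprove the second half of the definition, a `True` result is evidence rather than proof.

```python
def is_ascii_compatible(encoding: str, sample: str = "éüßЖソ€") -> bool:
    """Heuristic check (hypothetical helper) of the 'ASCII compatible' property.

    1. Code points 0-127 must encode to exactly the same single byte.
    2. Sample non-ASCII characters must never produce any byte in 0-127.
    """
    for cp in range(128):
        try:
            if chr(cp).encode(encoding) != bytes([cp]):
                return False
        except UnicodeEncodeError:
            return False
    for ch in sample:
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this codec; nothing to check
        if any(b < 0x80 for b in encoded):
            return False
    return True


for enc in ("utf-8", "latin-1", "utf-16", "cp037", "shift_jis"):
    print(enc, is_ascii_compatible(enc))
```

UTF-8 and Latin-1 pass; UTF-16 (which emits a byte-order mark and NUL bytes) and the EBCDIC codec `cp037` fail; Shift_JIS fails as well, since its double-byte characters can use trail bytes in the ASCII range (ソ encodes as 0x83 0x5C).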
Worse, it's clearly confusing in this discussion. Let's stop using this term to mean
the data format has elements that are defined to contain only
bytes with ASCII coded character semantics
(which is the relevant restriction AFAICS -- I don't know of any ASCII-compatible formats where the bytes 128-255 are used for any purpose other than encoding non-ASCII characters). OTOH, if the data is in an ASCII-compatible text encoding, the semantics are dubious when the bytes versions of many of these methods/operations are applied to it.
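To make the "dubious semantics" point concrete, here is a small Python 3 illustration (the sample string is arbitrary). UTF-8 is ASCII compatible, yet `bytes.upper()`, `bytes.isalpha()` and byte-offset slicing treat encoded text as ASCII plus opaque high bytes, so their results diverge from the `str` versions as soon as non-ASCII characters appear.

```python
word = "straße café"
data = word.encode("utf-8")  # UTF-8 is ASCII compatible

# str.upper() applies Unicode case rules ('ß' -> 'SS', 'é' -> 'É');
# bytes.upper() only changes a-z and passes the non-ASCII bytes through.
upper_str = word.upper()
upper_bytes = data.upper().decode("utf-8")

# Character count and byte count diverge.
char_len, byte_len = len(word), len(data)

# bytes.isalpha() only recognizes ASCII letters, so UTF-8 'é' is rejected.
alpha_str = "café".isalpha()
alpha_bytes = "café".encode("utf-8").isalpha()

# Slicing at a byte offset can cut a multibyte character in half.
try:
    data[:5].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(upper_str, upper_bytes, char_len, byte_len, alpha_str, alpha_bytes, split_ok)
```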
A documentation suggestion: It's easy enough to rewrite
> constrained to be ASCII compatible, either globally, or locally in the parts being manipulated by an application (such as a file header). ASCII incompatible segments may be present, but in ways that allow the data processing to handle them correctly.
as
containing 'well-defined segments constrained to be (strictly)
ASCII-encoded' (aka ASCII segments).
And then you can say
<specified bytes methods> are designed for use *only* on bytes
that are ASCII segments; use on other data is likely to cause
hard-to-diagnose corruption.
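Under that restriction, an application keeps text-style bytes operations inside the ASCII segments and treats everything else as opaque. A minimal Python 3.9+ sketch, assuming an HTTP-like message with an ASCII header block and an arbitrary binary body; `require_ascii_segment` and `build_message` are hypothetical helpers, not part of any library. It uses the `%` formatting for bytes that PEP 461 proposes (it shipped in Python 3.5) and `bytes.isascii()` (Python 3.7+) to enforce the restriction.

```python
def require_ascii_segment(segment: bytes) -> bytes:
    """Hypothetical guard: accept only an ASCII segment (every byte in 0-127)."""
    if not segment.isascii():
        raise ValueError(f"not an ASCII segment: {segment!r}")
    return segment


def build_message(headers: dict[bytes, bytes], body: bytes) -> bytes:
    """Format the header block with bytes methods; leave the body untouched."""
    lines = []
    for name, value in headers.items():
        # Bytes %-formatting and .title() are applied only to ASCII segments.
        lines.append(b"%s: %s\r\n" % (require_ascii_segment(name).title(),
                                       require_ascii_segment(value)))
    lines.append(b"Content-Length: %d\r\n\r\n" % len(body))
    # The body is appended verbatim: no text-style bytes method touches it.
    return b"".join(lines) + body


body = "Grüße".encode("utf-8")  # opaque payload containing non-ASCII bytes
msg = build_message({b"content-type": b"text/plain; charset=utf-8"}, body)
print(msg)

try:
    build_message({b"x-name": "é".encode("latin-1")}, b"")
except ValueError as exc:
    print("rejected:", exc)
```

Applying `.title()` or `.upper()` to the assembled message instead would also rewrite the ASCII letters inside the body, which is exactly the kind of hard-to-diagnose corruption the proposed wording warns about.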
If there are other use cases for "ASCII-compatible data formats" as defined above (not worrying about codecs, because they are a very small minority of code-to-be-written at this point), I don't know about them. Does anyone? If there are any, I'll be happy to revise. If not, that seems to be a precise and intelligible statement of the restrictions that is useful for the practical use cases. And nothing stops users who think they know what they're doing from using them in other contexts (which can be documented if they turn out to be broadly useful).
Footnotes: [1] "ASCII coded character semantics" is of course mildly ambiguous due to considerations like EOL conventions. But "you know what I'm talking about".