[Python-3000] PEP 3131 accepted
Ka-Ping Yee python at zesty.ca
Sun May 27 03:19:50 CEST 2007
- Previous message: [Python-3000] PEP 3131 accepted
- Next message: [Python-3000] PEP 3131 accepted
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, 26 May 2007, Michael Urman wrote:
> On 5/26/07, Ka-Ping Yee <python at zesty.ca> wrote:
> > But the enabling of UTF-8 by a BOM at the
> > beginning of the file is an invisible override. This invisible
> > override is the source of the danger. If we want to be able to
> > read the coding declaration with any confidence, we should get rid
> > of the invisible override.
>
> Do we need to reconsider PEP 3120 "Using UTF-8 as the default source
> encoding"? I don't see much difference between not knowing on visual
> inspection whether:
>
>     allowed is allowed
> or
>     "allowed" == "allowed"
The concern is similar in nature, but there is a difference. It is more feasible to tell programmers not to trust the visual appearance of strings than to tell them not to trust the visual appearance of identifiers. Strings are data, which makes them separable from the structure and logic of a program, whereas identifiers are fundamental to all programs. Programmers are already trained to understand that string literals in source code are non-verbatim representations (e.g. "it's" == 'it\'s' == 'it' "'s" == "\x69t's"), whereas they have a well-established expectation that identifiers are written verbatim.
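The equivalences in the example above can be checked directly. The following is only a small sketch; the names `literals` and `all_equal` are made up for illustration.

```python
# Four spellings of the same four-character string: it's
literals = [
    "it's",       # double-quoted, apostrophe needs no escape
    'it\'s',      # single-quoted, apostrophe escaped with a backslash
    'it' "'s",    # adjacent literals are concatenated at compile time
    "\x69t's",    # \x69 is the hex escape for the letter i
]

# Every spelling compares equal to the first one.
all_equal = all(s == literals[0] for s in literals)
print(all_equal)
```

None of the four spellings is a verbatim copy of the resulting string, which is the sense in which string literals are already understood to be representations rather than the data itself.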
As long as you have a way of distinguishing strings reliably from the rest of the source code, you can know whether your confidence is well placed. Blake's example illustrates that ambiguity in strings is especially dangerous because it can obscure where strings begin and end.
PEP 3120 is problematic. At the very least, it is definitely missing a section addressing objections (the problem of not being able to understand an expression like "allowed" == "allowed") and a section on security considerations (like those raised by Blake's example).
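To make the "allowed" == "allowed" objection concrete, here is a rough sketch. It assumes a lookalike built by swapping the first letter for Cyrillic U+0430; that choice of character, and the helper name `describe_non_ascii`, are illustrative assumptions rather than anything taken from the thread.

```python
import unicodedata

latin = "allowed"
# Hypothetical lookalike: the first letter is Cyrillic U+0430, written as an
# escape here so that the difference is visible in this listing.
mixed = "\u0430llowed"

def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

print(latin == mixed)              # the two strings are different data
print(describe_non_ascii(mixed))   # lists the one character that differs
```

Rendered in most fonts, the two strings would look the same once the escape is replaced by the character itself, so the comparison cannot be evaluated by eye.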
Since the default encoding is currently ASCII, almost all Python programmers are unlikely to be prepared for ambiguity in strings; thus the best thing to do would be to keep the default as ASCII and require a visible declaration to activate such ambiguity (enable UTF-8). Failing that, the next best thing to do would be to forbid all confusable characters without an explicit declaration to permit them. And the next best thing after that would be to forbid just the characters that are confusable with the delimiters that fence off ambiguous text (' " #) without an explicit declaration to permit them.
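The last option could be prototyped as a source check along the following lines. This is a sketch under stated assumptions, not a proposal for the actual tokenizer: the table `CONFUSABLE_DELIMITERS` is a small, incomplete, hand-picked set of lookalikes, the function names are hypothetical, and the declaration pattern is only written in the spirit of the PEP 263 coding comment.

```python
import re

# Illustrative and incomplete: characters that can be mistaken for ' " or #.
CONFUSABLE_DELIMITERS = {
    "\u2018": "'", "\u2019": "'", "\u02bc": "'", "\uff07": "'",
    "\u201c": '"', "\u201d": '"', "\uff02": '"',
    "\uff03": "#", "\ufe5f": "#",
}

# Assumed form of a visible declaration, modelled loosely on PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")

def has_coding_declaration(source):
    """True if one of the first two lines carries a visible coding comment."""
    return any(CODING_RE.match(line) for line in source.split("\n")[:2])

def find_confusable_delimiters(source):
    """Return (line, column, char, lookalike); lines 1-based, columns 0-based."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            if ch in CONFUSABLE_DELIMITERS:
                hits.append((lineno, col, ch, CONFUSABLE_DELIMITERS[ch]))
    return hits

def check(source):
    """Report confusable delimiters unless the file visibly opts in."""
    if has_coding_declaration(source):
        return []
    return find_confusable_delimiters(source)

# Usage: curly quotes are reported unless a declaration is present.
sample = "x = \u201chello\u201d\n"
declared = "# -*- coding: utf-8 -*-\n" + sample
print(check(sample))
print(check(declared))
```

The check deliberately ignores a BOM: only a declaration that a reader can see in the text counts as permission.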
-- ?!ng