[Python-Dev] len(chr(i)) = 2?

Stephen J. Turnbull stephen at xemacs.org
Fri Nov 26 04:02:09 CET 2010


M.-A. Lemburg writes:

Please note that we can only provide one way of string indexing in Python using the standard s[1] notation, and since we do want that operation to be fast and no more than O(1), using the code units as items is the only reasonable way to implement it.

AFAICT, the "we" that wants "no more than O(1)" does not include Glyph Lefkowitz, James Knight, and Greg Ewing. Greg even said that in designing a UTF-8 string type he might not provide an indexing operation at all. (Caution: That may not be what he meant; I'm just reporting the way I interpreted it.) Of course none of them are proposing to change Python; that's all in the context of designing a new language. But it does suggest that a lot of people can't think of use cases where O(1) string indexing is more important than Unicode robustness.
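
To make the code-unit versus code-point distinction concrete, here is a minimal sketch. It assumes a current CPython, where str is indexed by code point, so the UTF-16 code units that a narrow build would expose as string items are recovered by encoding explicitly. The helper name utf16_code_units is hypothetical, not something from this thread.

```python
import struct

def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    # Each UTF-16 code unit is one little-endian unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

ch = chr(0x10000)              # first code point outside the BMP
units = utf16_code_units(ch)   # what code-unit indexing would see

print(len(ch))                 # number of code points
print([hex(u) for u in units]) # a high surrogate followed by a low surrogate
```

With code units as items, indexing stays O(1), but a single non-BMP character occupies two items, which is where len(chr(i)) == 2 comes from.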

It is by far more important to maintain round-trip safety for Unicode data than to get every bit of code to work correctly with surrogates (often, there won't be a single correct way).

But surely it's more important than that to ensure that surrogates can't crash a Python process with unexpected UnicodeErrors?
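
As an illustration of the kind of failure in question, the sketch below assumes a current CPython and writes a lone surrogate as a literal; on a narrow build, slicing between the two halves of a surrogate pair could produce the same value. Encoding it with the strict UTF-8 codec raises UnicodeEncodeError, while the surrogatepass error handler keeps the data round-trippable.

```python
lone = "\ud800"  # an unpaired high surrogate

# Strict encoding refuses lone surrogates.
try:
    lone.encode("utf-8")
    raised = False
except UnicodeEncodeError:
    raised = True

# The surrogatepass handler lets the value through, so it can round-trip.
raw = lone.encode("utf-8", "surrogatepass")
restored = raw.decode("utf-8", "surrogatepass")

print(raised, raw, restored == lone)
```

Whether code should be strict or permissive at such a boundary is exactly the trade-off between robustness and round-trip safety being argued here.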


