[Python-Dev] PEP 393 Summer of Code Project

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 26 11:29:55 CEST 2011


> IronPython and Jython can retain UTF-16 as their native form if that makes interop cleaner, but in doing so they need to ensure that basic operations like indexing and len work in terms of code points, not code units, if they are to conform.

That means that they won't conform, period. There is no efficient, maintainable implementation strategy to achieve that property, and it may well take years until somebody provides an efficient unmaintainable implementation.

> Does this make sense, or have I completely misunderstood things?

You seem to assume it is ok for Jython/IronPython to provide indexing in O(n). It is not.
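To illustrate why code point indexing over UTF-16 storage is linear, here is a minimal sketch in Python. It is not how Jython or IronPython are implemented; the helper names utf16_units and code_point_at are hypothetical, and the sketch assumes a CPython 3.3+ interpreter for building the test data. Without an auxiliary index, the only way to find the n-th code point is to walk the code units from the start, because each surrogate pair occupies two units.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_point_at(units, index):
    """Return the code point at code point position `index`.

    Scans from the beginning, so the cost grows with `index`: O(n).
    """
    i = 0    # position in code units
    pos = 0  # position in code points
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine high and low surrogate into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            width = 2
        else:
            cp = u
            width = 1
        if pos == index:
            return cp
        pos += 1
        i += width
    raise IndexError("code point index out of range")


units = utf16_units("a\U0001F600b")
print(len(units))                   # number of code units
print(hex(code_point_at(units, 1)))  # the non-BMP code point
```

Indexing the raw code units directly would be O(1), but position 1 would then yield a lone high surrogate rather than the full character.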

However, non-conformance may not be that much of an issue. They already fail to conform in so many other respects (such as not supporting Python 3, or not supporting the C API) that they may well choose to ignore such a minor requirement if there was one. For BMP strings, they conform fine, and it may well be that Jython users either don't have non-BMP strings, or don't care whether len() or indexing of their non-BMP strings is "correct".
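The BMP point can be checked with a short sketch, assuming CPython 3.3 or later, where len() counts code points. The function names utf16_length and is_bmp_only are made up for this illustration. For strings confined to the BMP, the code unit count and the code point count agree, so a UTF-16 implementation gives the same len() there; they diverge only once a non-BMP character appears.

```python
def utf16_length(s):
    """Number of UTF-16 code units needed to store s (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


def is_bmp_only(s):
    """True if every code point in s lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in s)


for text in ["h\u00e9llo", "\u4e2d\u6587", "clef: \U0001D11E"]:
    # On CPython 3.3+, len(text) is the code point count.
    print(is_bmp_only(text), len(text), utf16_length(text))
```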

Regards, Martin


