[Python-Dev] PEP 393 Summer of Code Project
fwierzbicki at gmail.com fwierzbicki at gmail.com
Fri Sep 9 18:12:38 CEST 2011
- Previous message: [Python-Dev] PEP 393 Summer of Code Project
- Next message: [Python-Dev] PEP 393 Summer of Code Project
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote:
>> Oops, forgot to add the link for the gory details for Java and > 2 byte
>> unicode:
>> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
>
> This is dated 2004. Basically, they considered several options, tried
> out 4, and ended up sticking with char[] (sequences) as UTF-16 with
> char = 16 bit code unit and added 32-bit Character(int) class for
> low-level manipulation of code points. I did not see the indexing
> problem mentioned. I get the impression that they encourage sequence
> forward-backward iteration (cursor-based access) rather than
> random-access indexing.

Hmmm, sorry for the irrelevant link - my lack of expertise here is showing.
What I do know is that we (meaning Jim Baker) are taking great pains to
always use codepoints even for random access in our unicode code. I can't
speak to the performance implications without some deeper study into what
Jim has done.
-Frank
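
For readers following the code unit vs. code point distinction above, here
is a minimal sketch in Python of why random access by code point over a
UTF-16 sequence is not free. It is not taken from Jython or from Jim's code;
the helper names utf16_units and code_point_at are made up for illustration,
and it assumes Python 3.3 or later, where str is indexed by code point.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of 16-bit ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_point_at(units, index):
    """Return the code point at code point position `index`.

    Hypothetical helper: it scans from the start, so the cost grows with
    `index`, unlike plain indexing into the code unit sequence.
    """
    i = 0     # position in code units
    seen = 0  # number of code points passed so far
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine a high and low surrogate into one supplementary code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if seen == index:
            return cp
        seen += 1
        i += step
    raise IndexError("code point index out of range")


# 'a', U+1D11E (a supplementary character), 'b'
text = "a\U0001D11Eb"
units = utf16_units(text)
```

Here text has three code points but units has four entries, so units[1] is
only the first half of a surrogate pair, while code_point_at(units, 1)
returns the whole character. A real implementation would presumably avoid
the linear scan with some kind of index or cache; that part is not sketched.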