[Python-3000] PEP 3131 - the details
"Martin v. Löwis" martin at v.loewis.de
Thu May 17 11:10:58 CEST 2007
One issue I see is that the PEP defines ID_Start and ID_Continue itself. It should not do that, but instead reference as authoritative the Unicode properties ID_Start and ID_Continue defined in the Unicode property database.
ID_Start and ID_Continue are derived non-mandatory properties, and I believe UAX#31 is the one defining these properties. So I thought I could just copy the definition.
Currently, the Python unicodedata module does not contain a definition for ID_Start and ID_Continue, so I could not use it in the PEP.
ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start, and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue.
I now see what the 'stability extensions' mentioned in the PEP (copied from UAX#31) are. Even though Python currently does not include Other_ID_Start and Other_ID_Continue, support for them could be added in the parser.
It would have been nice if UAX#31 had mentioned that the "stability extensions" are recorded in these properties.
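As a rough sketch of the official definitions quoted above (general categories plus the Other_* stability extensions), something like the following could be written on top of the existing unicodedata module. The function names and the two Other_* sets are assumptions for illustration only: the sets are small hand-picked subsets, not the full lists from PropList.txt, and this is not how the parser is or would be implemented.

```python
import unicodedata

# General categories from the quoted formula.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}

# ASSUMPTION: illustrative subsets only. The authoritative lists are the
# Other_ID_Start / Other_ID_Continue entries in PropList.txt.
OTHER_ID_START = {0x2118, 0x212E, 0x309B, 0x309C}
OTHER_ID_CONTINUE = {0x00B7, 0x0387}


def is_id_start(ch):
    """ID_Start = Lu+Ll+Lt+Lm+Lo+Nl + Other_ID_Start (hypothetical helper)."""
    return (unicodedata.category(ch) in ID_START_CATEGORIES
            or ord(ch) in OTHER_ID_START)


def is_id_continue(ch):
    """ID_Continue = ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue (hypothetical helper)."""
    return (is_id_start(ch)
            or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES
            or ord(ch) in OTHER_ID_CONTINUE)
```

Under this formula a digit such as "1" (Nd) and the underscore (Pc) are continue characters but not start characters, while U+2118 qualifies as a start character only through the Other_ID_Start set.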
The only difference between PEP 3131's definition and the official ones is the Other_* bits. Those are there to guarantee that anything now in ID_Start/ID_Continue will always remain in those categories in the future. That is an important feature, and should not be overlooked.
See the PEP: there was an XXX remark I still needed to resolve.
This list is available as part of the PropList.txt file in the Unicode data, which ought to be included automatically in Python's Unicode database so as to pick up future changes.
This I'm not so sure about. I changed the PEP to say that Other_ID_{Start|Continue} should be included. Whether the other properties should be added to the unidata module, I don't know - I would like to see use cases first before including them.
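To illustrate what picking up Other_ID_{Start|Continue} from PropList.txt might look like, here is a minimal sketch of a parser for that file's "code point(s) ; property # comment" layout. The function name parse_property and the SAMPLE text are assumptions: the sample is a handful of lines written in the PropList.txt layout, not the real file, and the actual database generation scripts may work quite differently.

```python
def parse_property(lines, prop_name):
    """Collect the code points assigned to prop_name (hypothetical helper)."""
    code_points = set()
    for line in lines:
        # Drop the trailing comment and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop_name:
            continue
        # The first field is either "XXXX" or a range "XXXX..YYYY".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


# ASSUMPTION: sample lines in the PropList.txt layout, not the full file.
SAMPLE = """\
# Sample data for illustration
2118          ; Other_ID_Start # Sm       SCRIPT CAPITAL P
212E          ; Other_ID_Start # So       ESTIMATED SYMBOL
309B..309C    ; Other_ID_Start # Sk   [2] KATAKANA-HIRAGANA VOICED SOUND MARK..
00B7          ; Other_ID_Continue # Po       MIDDLE DOT
"""

other_id_start = parse_property(SAMPLE.splitlines(), "Other_ID_Start")
other_id_continue = parse_property(SAMPLE.splitlines(), "Other_ID_Continue")
```

Reading the sets from the data file rather than hard-coding them means a newer PropList.txt would update them without code changes.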
I do not believe it is a good idea for Python to define its own identifier rules. The rules defined in UAX31 make sense and should be used directly, with only the minor amendment of _ as an allowable start character.
That was my plan indeed.
Regards, Martin