[Python-3000] PEP 3131 - the details

James Y Knight foom at fuhm.net
Thu May 17 07:50:17 CEST 2007


On May 16, 2007, at 10:30 PM, Talin wrote:

> While there has been a lot of discussion as to whether to accept PEP
> 3131 as a whole, there has been little discussion as to the specific
> details of the PEP. In particular, is it generally agreed that the
> Unicode character classes listed in the PEP are the ones we want to
> include in identifiers?

One issue I see is that the PEP defines ID_Start and ID_Continue
itself. It should not do that, but instead reference as authoritative
the Unicode properties ID_Start and ID_Continue defined in the
Unicode property database.

ID_Start is officially:
    Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start
and ID_Continue is officially:
    ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue

The only difference between PEP 3131's definition and the official
one is the Other_* bits. Those are there to ensure the requirement
that anything now in ID_Start/ID_Continue will always in the future
be in said categories. That is an important feature, and should not
be overlooked. Without the supplemental list, a future version of
Unicode which changes the general category of a character could make
a previously valid identifier become invalid. The list currently
includes the following entries:

2118       ; Other_ID_Start    # So     SCRIPT CAPITAL P
212E       ; Other_ID_Start    # So     ESTIMATED SYMBOL
309B..309C ; Other_ID_Start    # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
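
To illustrate why the supplemental lists matter, here is a minimal Python
sketch. It assumes the standard unicodedata module and hard-codes the
code points from the entries listed above; the helper name
outside_categories is made up for this example. It reports which of
those characters a purely category-based test would reject. The
categories printed depend on the Unicode version bundled with the
interpreter, so no particular output is claimed.

```python
import unicodedata

# General categories from the official definitions quoted above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

# Code points copied from the entries in this message (not a complete,
# current list; later Unicode versions may add more).
OTHER_ID_START = [0x2118, 0x212E, 0x309B, 0x309C]
OTHER_ID_CONTINUE = list(range(0x1369, 0x1371 + 1))


def outside_categories(code_points, categories):
    """Map each code point whose general category is NOT in `categories`
    to that category, i.e. the ones a category-only test would reject."""
    rejected = {}
    for cp in code_points:
        cat = unicodedata.category(chr(cp))
        if cat not in categories:
            rejected[cp] = cat
    return rejected


missed_start = outside_categories(OTHER_ID_START, ID_START_CATEGORIES)
missed_continue = outside_categories(OTHER_ID_CONTINUE, ID_CONTINUE_CATEGORIES)

for cp, cat in {**missed_start, **missed_continue}.items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: category {cat}")
```

Every character reported here stays in ID_Start or ID_Continue only
because of the Other_* lists.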

This list is available as part of the PropList.txt file in the
Unicode data, which ought to be included automatically in Python's
Unicode database so as to get future changes.
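
As a rough sketch of what picking the lists up from PropList.txt could
look like, the following parses lines in that file's format. The
function name parse_property and the inline SAMPLE text are assumptions
for illustration only; a real build step would read the full file
shipped with the Unicode data.

```python
# Sample lines in PropList.txt format, taken from the entries in this message.
SAMPLE = """\
2118       ; Other_ID_Start    # So     SCRIPT CAPITAL P
212E       ; Other_ID_Start    # So     ESTIMATED SYMBOL
309B..309C ; Other_ID_Start    # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
"""


def parse_property(lines, wanted):
    """Return the set of code points assigned to the property `wanted`."""
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        range_part, prop = (field.strip() for field in data.split(";", 1))
        if prop != wanted:
            continue
        # A field is either a single code point or a "first..last" range.
        first, _, last = range_part.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


other_id_start = parse_property(SAMPLE.splitlines(), "Other_ID_Start")
other_id_continue = parse_property(SAMPLE.splitlines(), "Other_ID_Continue")
```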

> My preference is to be conservative in terms of what's allowed.

I do not believe it is a good idea for Python to define its own
identifier rules. The rules defined in UAX #31 make sense and should be
used directly, with only the minor amendment of _ as an allowable
start character.
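
A minimal sketch of that rule, for illustration only: it assumes the
category-based definitions given earlier in this message, hard-codes the
Other_* code points from the list in this message, and adds _ as a start
character. The function names are hypothetical, and nothing here is
meant as the PEP's or the interpreter's actual implementation.

```python
import unicodedata

ID_START_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})
ID_CONTINUE_EXTRA = frozenset({"Mn", "Mc", "Nd", "Pc"})

# Hard-coded from the entries in this message; in practice these would
# come from PropList.txt.
OTHER_ID_START = frozenset({0x2118, 0x212E, 0x309B, 0x309C})
OTHER_ID_CONTINUE = frozenset(range(0x1369, 0x1371 + 1))


def is_id_start(ch):
    """ID_Start: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start."""
    return (unicodedata.category(ch) in ID_START_CATEGORIES
            or ord(ch) in OTHER_ID_START)


def is_id_continue(ch):
    """ID_Continue: ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue."""
    return (is_id_start(ch)
            or unicodedata.category(ch) in ID_CONTINUE_EXTRA
            or ord(ch) in OTHER_ID_CONTINUE)


def is_identifier(name):
    """Apply the rule above, with '_' additionally allowed as a start."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    return ((first == "_" or is_id_start(first))
            and all(is_id_continue(ch) for ch in rest))
```

For example, is_identifier("_private") and is_identifier("x\u1369") are
accepted, while is_identifier("2fast") is rejected because a digit (Nd)
is only allowed in continue position.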

James


