
Hebrew script in IDN (was Exemplar Characters)



Mark Davis wrote:
> It is not that clear-cut. Identifiers by their nature cannot include
> all words and phrases valid in all languages. For IDN, for example,
> one can't express the perfectly reasonable English word "can't", or a
> word like "I.B.M.".
>
> I did introduce a proposal in March for considering the status of some
> word characters, which turned into a discussion in the UTC of
> whether to add certain items to the identifier definition.
>
> http://www.unicode.org/L2/L2005/05083-wordprops.txt
>
> I'll copy that section here for those without access:
>
> 0027 ; # Po APOSTROPHE
> 002D ; # Pd HYPHEN-MINUS
> 002E ; # Po FULL STOP
> 003A ; # Po COLON
> 00B7 ; # Po MIDDLE DOT
> 058A ; # Pd ARMENIAN HYPHEN
> 05F3 ; # Po HEBREW PUNCTUATION GERESH
> 05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
> 200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
> 200D ; # Cf ZERO WIDTH JOINER // for Indic?
> 2010 ; # Pd HYPHEN
> 2019 ; # Pf RIGHT SINGLE QUOTATION MARK
> 2027 ; # Po HYPHENATION POINT
> 30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
>
>
> The UTC decided against adding them to the identifier definition.
> If we were to change that for the Hebrew punctuation, we would have to
> see a documented case for it.
>
> Mark
>

Mark,

I think you might meet some opposition to including the following in IDNs:

APOSTROPHE (possibly a protocol character)
FULL STOP (it's a label separator: so no chance for use in IDN labels)
COLON (a definite protocol character in URLs)
ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for these
being used only in contexts where they cause _visible_ and _unambiguous_
rendering changes)
RIGHT SINGLE QUOTATION MARK (spoof of APOSTROPHE)
HYPHENATION POINT (spoof of MIDDLE DOT)
KATAKANA-HIRAGANA DOUBLE HYPHEN (spoof of EQUALS SIGN, possibly a protocol character)

which leaves only

00B7 ; # Po MIDDLE DOT
058A ; # Pd ARMENIAN HYPHEN
05F3 ; # Po HEBREW PUNCTUATION GERESH
05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM

as characters which I would consider possibly uncontroversial candidates
for IDN.

-- Neil
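
For readers who want to check the two lists against each other, the filtering in the reply above can be sketched in Python. This is only an illustration: the names PROPOSED, OBJECTIONS, NEIL_REMAINING and describe are hypothetical and appear in neither message, and the reasons are paraphrased from the reply. Character names and general categories are looked up with the standard unicodedata module.

```python
import unicodedata

# Code points from the quoted wordprops list, in the order given.
PROPOSED = [0x0027, 0x002D, 0x002E, 0x003A, 0x00B7, 0x058A, 0x05F3,
            0x05F4, 0x200C, 0x200D, 0x2010, 0x2019, 0x2027, 0x30A0]

# Code points the reply objects to, with the reason paraphrased from it.
OBJECTIONS = {
    0x0027: "possibly a protocol character",
    0x002E: "label separator",
    0x003A: "protocol character in URLs",
    0x200C: "needs a case for visible, unambiguous rendering changes",
    0x200D: "needs a case for visible, unambiguous rendering changes",
    0x2019: "spoof of APOSTROPHE",
    0x2027: "spoof of MIDDLE DOT",
    0x30A0: "spoof of EQUALS SIGN",
}

# The four characters the reply explicitly lists as remaining candidates.
NEIL_REMAINING = {0x00B7, 0x058A, 0x05F3, 0x05F4}


def describe(cp):
    """Format a code point in the same style as the quoted list."""
    ch = chr(cp)
    return f"{cp:04X} ; # {unicodedata.category(ch)} {unicodedata.name(ch)}"


# Plain set difference: everything proposed that is not objected to.
remaining = [cp for cp in PROPOSED if cp not in OBJECTIONS]

# Characters that survive the difference but are not in the reply's list.
unaddressed = [cp for cp in remaining if cp not in NEIL_REMAINING]

for cp in remaining:
    note = "  (not mentioned in the reply)" if cp in unaddressed else ""
    print(describe(cp) + note)
```

The plain difference leaves six characters rather than four: HYPHEN-MINUS and HYPHEN are neither objected to nor listed as candidates. HYPHEN-MINUS is already permitted inside host name labels, so it is not a new candidate; the reply gives no reason for leaving out HYPHEN.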



This archive was generated by hypermail 2.1.5: Fri Nov 18 2005 - 08:19:58 CST