Unicode Mail List Archive: Re: Exemplar Characters


From: "Antoine Leca" <Antoine10646@leca-marti.org>
> We certainly should NOT mix the primary collating order (A to Z for both
> English, German, Italian, and French), which is what is used for
> directory,
> with those exemplar characters we are talking about here.

Certainly! But even for a single language with an appropriately tailored
collation, the primary order alone does not tell whether two letters are
the same or not. They are different as soon as they differ by more than
just the tailored case folding (generally this means equality through the
secondary collation level, but this depends on how a collation is tailored
for a language, which may have additional collation levels with higher
priority than the case difference, and some languages may consider a case
difference as meaning distinct letters, so that conversion to uppercase,
lowercase or titlecase would change the meaning and orthographic rules).

I know that collation can be tailored for a given language, but can a locale
specify which collation level is used for the case difference, or whether
the case difference is significant? Maybe I have read that in the past, but
I can't remember exactly. If a locale specifies this, then we have a good
way to determine which letters or polygraphs should be listed as distinct
and necessary (or recommended) in exemplar and auxiliary characters:
effectively we can then list only lowercase letters for most languages
(because uppercase letters are case folded to the same class in that
language), and we can eliminate from those lists all variants that collate
equally (except at the last, code point level).

So exemplar and auxiliary characters could be checked automatically and
simplified: the only other differences are either case differences or code
point differences.
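
The automatic check described above could be sketched roughly as follows.
This is only an illustration: `fold_key` is a hypothetical stand-in for a
real locale collation key, and it assumes that case is the only difference
the language ignores. A real implementation would take the key from the
locale's tailored collation instead.

```python
import unicodedata

def fold_key(s: str) -> str:
    """Toy stand-in (assumption) for a collation key that ignores only case."""
    return unicodedata.normalize("NFC", s).casefold()

def simplify_exemplars(candidates):
    """Keep one representative per key, preferring an all-lowercase form."""
    chosen = {}
    for item in candidates:
        key = fold_key(item)
        current = chosen.get(key)
        # Take the first form seen, but let a lowercase form replace
        # a previously chosen form that is not lowercase.
        if current is None or (item == item.lower() and current != current.lower()):
            chosen[key] = item
    return sorted(chosen.values())

if __name__ == "__main__":
    # "e" and "é" stay distinct: they differ by more than case.
    print(simplify_exemplars(["A", "a", "É", "é", "e", "ch", "Ch", "CH"]))
```

The final `sorted` call orders by code point only; it is there to make the
result deterministic, not to model any language's collation order.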

For Breton, it would be sufficient to tailor the collation table, for
example, so that the ASCII quote and the right curly quote collate together;
in that case we only need to list in the exemplar characters the recommended
form of the trigraph {c'h}, and not the variant written with the other quote
character, which collates equivalently, and not {C'h}, which only has a case
difference that is ignorable for exemplar characters. This resolves the
ambiguity and simplifies the problem.
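
As a rough sketch of the Breton case, assuming a hypothetical tailoring that
maps U+2019 (right curly quote) onto U+0027 (ASCII quote) and ignores case:
`breton_key` and `redundant_variants` are illustrative names, not part of
any real collation library, and the choice of which quote form is the
preferred one is left to the caller.

```python
# Hypothetical tailoring (assumption): U+2019 collates like U+0027.
QUOTE_TAILORING = str.maketrans({"\u2019": "'"})

def breton_key(s: str) -> str:
    """Toy key: unify the two quote characters, then ignore case."""
    return s.translate(QUOTE_TAILORING).casefold()

def redundant_variants(preferred: str, variants):
    """Return the variants that the toy key cannot tell apart from `preferred`."""
    target = breton_key(preferred)
    return [v for v in variants if v != preferred and breton_key(v) == target]

if __name__ == "__main__":
    variants = ["c'h", "c\u2019h", "C'h", "C\u2019H", "ch"]
    # Using the ASCII-quote form as the preferred one is arbitrary here.
    print(redundant_variants("c'h", variants))
```

Under these assumptions only the preferred trigraph needs to be listed; the
plain digraph "ch" is not reported as redundant because its key differs.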



This archive was generated by hypermail 2.1.5: Thu Nov 17 2005 - 06:19:23 CST