Unicode Mail List Archive: Re: ISO 15924: zh-Hani for general Chinese (was: Different Arabic scripts?)


Getting a little off-topic for Unicode here...

Philippe Verdy wrote:

> In a locale, what differences does it make between "zh" (any Chinese
> language) and "zh-Hani" (any Han script) ? Except if one expects a
> difference for "zh-Latn" (Pinyin) or "zh-Bopo" (Bopomofo), it is
> unlikely that a resource localized for "zh" would use something else
> than a Han orthography, the alternatives being encoded separately for
> special local use.

You've just discovered the premise behind Suppress-Script, an attribute
of language subtags developed for the forthcoming RFC 3066bis.

Certain languages are written much more commonly with one particular
script than with any other, to the extent that specifying that script in
a language tag would be pointless. Examples might include French in
Latin script, Arabic in Arabic script, or Chinese in "Han" (simplified
vs. traditional unspecified). For some of those languages, the RFC
3066bis registry will include a Suppress-Script entry, indicating that
the use of that script subtag with that language subtag is discouraged,
though not forbidden.

For example:

Type: language
Subtag: fr
Description: French
Added: 2005-10-16
Suppress-Script: Latn

This entry means that in most circumstances, the tag "fr-Latn" conveys
no additional information over simply "fr", and therefore the script
subtag "Latn" should be suppressed. Languages such as Serbian, for
which there is no overwhelming "majority" script, don't have a
Suppress-Script entry; a script subtag usually does add some information
in these cases.
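
As a rough illustration of how a tag processor might apply this, here is a minimal Python sketch. It is not taken from RFC 3066bis or from any real library: the names RECORD, parse_record and simplify_tag are made up for this example, the only registry data it knows is the "fr" record shown above, and it assumes the script subtag directly follows the primary language subtag (no extended language subtags). A real registry parser would also have to cope with repeated fields and folded lines, which this one ignores.

```python
# Hypothetical one-record excerpt, copied from the example above.
RECORD = """\
Type: language
Subtag: fr
Description: French
Added: 2005-10-16
Suppress-Script: Latn
"""


def parse_record(text):
    """Parse one 'Field: value' record into a dict (simplified format)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def build_suppress_map(records):
    """Map language subtag -> suppressed script, for records that have one."""
    suppress = {}
    for rec in records:
        if rec.get("Type") == "language" and "Suppress-Script" in rec:
            suppress[rec["Subtag"].lower()] = rec["Suppress-Script"]
    return suppress


def simplify_tag(tag, suppress):
    """Drop the script subtag if it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    language, candidate = subtags[0].lower(), subtags[1]
    # Script subtags are exactly four ASCII letters, e.g. "Latn" or "Cyrl".
    is_script = (
        len(candidate) == 4 and candidate.isascii() and candidate.isalpha()
    )
    if is_script and suppress.get(language, "").lower() == candidate.lower():
        return "-".join([subtags[0]] + subtags[2:])
    return tag


SUPPRESS = build_suppress_map([parse_record(RECORD)])
```

With only that one record loaded, simplify_tag("fr-Latn", SUPPRESS) reduces to "fr", while "sr-Latn" is left alone because no Suppress-Script is known for Serbian. Note that the sketch only removes a redundant script subtag; it never rejects the longer tag, which matches "discouraged, though not forbidden."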

Not all languages that are predominantly written in a particular script
have been assigned a Suppress-Script entry. There are provisions in RFC
3066bis to register Suppress-Script information for additional
languages. The review process must be undertaken carefully to ensure
adequate expertise and a lack of political motivation (e.g. someone
trying to define Latin as the "default" script for Serbian).

The tags "zh-Hans" and "zh-Hant" certainly can add information as
compared to "zh" alone. Even "zh-Hani" might be an improvement over
"zh" in contexts where non-Han transcriptions (such as, but not limited
to, Pinyin) might be expected.

Transcription systems in general have been suggested as a reasonable use
for variant subtags in RFC 3066bis. It's not a good idea to infer a
particular transcription given only the script. "Korean in Latin
script," for example, could be McCune-Reischauer, Yale, Revised
Romanization, or even something else.

Further discussion on this topic should be carried out on
ietf-languages@iana.org, not on the Unicode list.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/



This archive was generated by hypermail 2.1.5: Fri Nov 25 2005 - 19:36:04 CST