Unicode Utilities: Unicode Language Identifiers and BCP47

Input

Localization:

Type	Code	Name	Replacement
Language	fr	French
Region	CA	Canada

Type	Code	Name
Language	gsw	Swiss German
Script	Arab	Arabic
Region	AQ	Antarctica

Canonical Form: en-Latn-US

Minimal Form: en

Type	Code	Name	Replacement
Language	eng	invalid code	en
Script	Latn	Latin
Region	840	invalid code	US

Unicode language ids are based on BCP 47, but differ in a few ways.
The names are localized with Unicode CLDR data: names with '*' are fallbacks to English; names with '**' are fallbacks to the latest draft registry names.
Replacements are shown for invalid subtags (zho → zh, 248 → AX), preferred replacements (iw → he), or predominant languages (arb → ar).
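The replacement step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function name `replace_subtags` and both alias tables are hypothetical, and the tables contain only the examples listed on this page. The real alias data comes from Unicode CLDR. The sketch handles only language, script, and region subtags and ignores variants and extensions.

```python
import re

# Illustrative subset only, taken from the examples on this page.
# The full alias data is maintained in Unicode CLDR.
LANGUAGE_ALIASES = {"eng": "en", "zho": "zh", "iw": "he", "arb": "ar"}
REGION_ALIASES = {"840": "US", "248": "AX"}


def replace_subtags(tag: str) -> str:
    """Apply alias replacements and conventional casing to a simple tag.

    Assumes the tag has the shape language[-script][-region]; anything
    else is lowercased and passed through unchanged.
    """
    parts = re.split(r"[-_]", tag)
    out = []
    for i, part in enumerate(parts):
        if i == 0:
            # Language subtag: lowercase, then replace if an alias exists.
            lang = part.lower()
            out.append(LANGUAGE_ALIASES.get(lang, lang))
        elif len(part) == 4 and part.isalpha():
            # Script subtag: four letters, title case (e.g. Latn).
            out.append(part.title())
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            # Region subtag: two letters or three digits, uppercase.
            region = part.upper()
            out.append(REGION_ALIASES.get(region, region))
        else:
            out.append(part.lower())
    return "-".join(out)


print(replace_subtags("eng-Latn-840"))
```

With the tables above, `eng-Latn-840` becomes `en-Latn-US`, matching the canonical form shown earlier. Deriving the minimal form (`en`) additionally requires likely-subtag data, which this sketch does not cover.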

Fonts and Display. If you don't have a good set of Unicode fonts (and a modern browser), you may not be able to read some of the characters. Some suggested fonts that you can add for coverage are: Noto Fonts site; Unicode Fonts for Ancient Scripts; Large, multi-script Unicode fonts. See also: Unicode Display Problems.

Version 3.9; ICU version: 74.1; Unicode/Emoji version: 15.1.0;