Character usage
Find non-ASCII characters used by a given language, or languages that use a given non-ASCII character. While largely correct, the information may not all be 100% reliable. Please read the notes below. Click on green characters to learn what they are.
Usage notes
Updated Sun 21 Jul 2019 • tags counterstyles, scriptnotes, apps
This page allows you to track the correspondences between languages and non-ASCII Unicode characters. The list is not claimed to be exhaustive or fully correct, so you should treat it as an approximation. Much of the initial information was derived from CLDR and/or the Unicode site's UDHR transcripts. Languages that cite GitHub in their list of sources are generally more reliable, since their data is based on research.
To note:
- ASCII characters are ignored.
- Characters listed after a + sign are infrequently used.
- Where CLDR and UDHR are the primary sources, auxiliary characters from CLDR are shown as 'infrequent'. Every character that appears in a UDHR transcription is shown.
- Characters shown for a language include all characters produced by applying uppercase, lowercase, NFC, and NFD to the set of characters attributed to that language.
- As mentioned above, the data is expected to be largely correct, but not 100%. CLDR source data is often not completely correct, and data based on UDHR alone may be missing characters simply because they don't occur in that text (especially for scripts with a large syllabic repertoire). So the data should be treated with care. I intend to fix errors as they come to light.
- The Native speakers row or column indicates the estimated number of native speakers for all the languages listed, in order to give a rough idea of the prevalence of that character. It doesn't include people who speak a language as a second language, and that number is often a multiple of the native speaker total. On the other hand, the figure counts speakers rather than literate users, so it represents potential users of the character. Depending on the language, therefore, the figures may be low, or at least conservative, for many languages, and possibly high for some (typically small languages, or where an alternate orthography is used).
- Chinese languages, Japanese, and Korean are not listed.
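The expansion described in the notes above (uppercase, lowercase, NFC, and NFD applied to a language's characters, with ASCII ignored) can be sketched roughly as follows. This is only an illustration using Python's standard `unicodedata` module, not the page's actual code; the function name `expand_characters` is hypothetical, and the sketch assumes a single pass of case mapping followed by normalization.

```python
import unicodedata

def expand_characters(chars):
    """Return the non-ASCII characters reachable from `chars` by applying
    uppercase, lowercase, NFC, and NFD (a single pass; an assumption)."""
    result = set()
    for ch in chars:
        # Start with the character and its case mappings.
        variants = {ch, ch.upper(), ch.lower()}
        # Add the composed and decomposed forms of each variant.
        for v in list(variants):
            variants.add(unicodedata.normalize("NFC", v))
            variants.add(unicodedata.normalize("NFD", v))
        # A variant may be several code points long (for example a base
        # letter plus a combining mark), so collect code points one by one,
        # ignoring ASCII.
        for v in variants:
            result.update(c for c in v if ord(c) > 0x7F)
    return result
```

For example, starting from ỹ (U+1EF9) this yields ỹ, Ỹ (U+1EF8), and the combining tilde U+0303 that appears in the decomposed forms; the base letters y and Y are dropped because they are ASCII.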
Tips:
- Mouse over the characters displayed to see their Unicode code point value and name. Each row has two icons: one allows you to view the characters on that row in a variety of other apps, and the other copies the characters on that row to the clipboard.
- If you don't have fonts for all the characters displayed, click on Convert to images. You can also change the font to one you have on your system, using the Change font to: control.
- The line that starts with Non-ASCII character count allows you to copy or share a list of all characters other than those that are infrequent or deprecated. Click on the relevant icon.
- The control Find by typing allows you to type in a name, or part of a name, of a language in order to find an option. Select the language you want from the suggestions offered. To see all options, just empty the box. (In Firefox you'll need to hit return again after selecting an item from the list of alternatives.)
- When adding characters to the Look up characters field, you can add Unicode code point numbers with a space on either side, or escapes. For example, for આ any of the following escapes will work:
આ \u0A86 \u{A86} \0A86 U+0A86 0xA86
No extra space is needed between escapes, and supplementary characters work too.
- After you have generated a list of languages that use a given character, if you click on a language name then details for that language will be displayed above.
- To compare lists of characters, copy one set to the left box under Compare lists, and the other to the right box, then click on Compare. If both boxes are identical there will be no output, but if there are differences they will be displayed below the boxes.
- You can automatically display data via the URL. For example, try the following:
https://r12a.github.io/app-charuse/?language=vi
https://r12a.github.io/app-charuse/?charlist=đỹã
https://r12a.github.io/app-charuse/?script=devanagari
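The escape formats that the tips above list for the Look up characters field could be decoded along the following lines. This is a minimal sketch, not the app's own parser: the name `decode_escapes` and the regular expression are assumptions made for illustration, and bare code point numbers separated by spaces are deliberately not handled, since they are ambiguous with ordinary text.

```python
import re

# One alternative per escape style shown in the tips (hypothetical pattern).
ESCAPE = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"   # \u{A86}
    r"|\\u([0-9A-Fa-f]{4})"        # \u0A86
    r"|U\+([0-9A-Fa-f]{4,6})"      # U+0A86
    r"|0x([0-9A-Fa-f]{1,6})"       # 0xA86
    r"|\\([0-9A-Fa-f]{1,6})"       # \0A86
)

def decode_escapes(text):
    """Replace each recognised escape in `text` with the character it names."""
    def replace(match):
        # Exactly one group is non-empty: the hex digits of the code point.
        hex_digits = next(g for g in match.groups() if g is not None)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point; leave the text untouched.
            return match.group(0)
        return chr(code_point)
    return ESCAPE.sub(replace, text)
```

Literal characters pass through unchanged, and escapes written back to back (such as `\u0A86\u0A86`) are decoded without needing a space between them. Note that back-to-back `0x` escapes are ambiguous in this sketch, because a following `0` is itself a hex digit.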
Sources:
- CLDR: https://www.unicode.org/cldr/charts/latest/summary/root.html
- UDHR: http://unicode.org/udhr/translations.html
- GitHub: mainly the articles under the heading "Script Notes" at https://r12a.github.io/scripts/index.html#scriptnotes.
To do:
- Add a graphic to show the number of speakers using a rectangle that grows with population.
- Show character names in HTML rather than tooltips when doing mouseover?
- Add symbols in the relevant block that are not included in the list? (Useful for checking the data.)
- Allow multiple regions per language for things like English, Spanish, Portuguese, etc.?