VISCII
From Wikipedia, the free encyclopedia
Unofficial character encoding for the Vietnamese alphabet
Not to be confused with VSCII (Vietnamese Standard Code for Information Interchange), a family of official encodings for Vietnamese.
VISCII
MIME / IANA | VISCII |
---|---|
Language(s) | Vietnamese, English |
Created by | Viet-Std Group |
Definitions | RFC 1456 |
Classification | 8-bit SBCS |
Based on | ASCII |
VISCII is an unofficially defined, modified ASCII character encoding for using the Vietnamese language with computers. It should not be confused with the similarly named, officially registered VSCII encoding. VISCII keeps the 95 printable characters of ASCII unchanged, but it replaces 6 of the 33 control characters with printable characters. It also fills all 128 code points above ASCII (0x80 to 0xFF) with precomposed characters. Unicode and the Windows-1258 code page are now used for virtually all Vietnamese computer data,[citation needed] but legacy VSCII and VISCII files may need conversion.
VISCII was designed in 1992 by the Vietnamese Standardization Working Group (Viet-Std Group),[1] a Silicon Valley, California group led by Christopher Cuong T. Nguyen, Cuong M. Bui, and Hoc D. Ngo. At the time, its members were working with the Unicode Consortium to include precomposed Vietnamese characters in the Unicode standard. VISCII, along with VIQR, was first published in a bilingual report in September 1992, which dubbed it the "Vietnamese Standard Code for Information Interchange".[2] The report noted the spread of computer use in Vietnam and the growing volume of computer-based communication among Vietnamese abroad. It observed that existing applications used vendor-specific encodings that could not interoperate with one another, and concluded that standardisation across vendors was necessary. The successful inclusion of composed and precomposed Vietnamese in Unicode 1.0 drew on lessons learned from developing the 8-bit VISCII and the 7-bit VIQR.[2]
The next year, in 1993, Vietnam adopted TCVN 5712, its first national standard in the information technology domain.[3] It defined a character encoding named VSCII, developed by the TCVN Technical Committee on Information Technology (TCVN/TC1), whose name also stands for "Vietnamese Standard Code for Information Interchange".[3] VSCII is incompatible with, and otherwise unrelated to, the earlier-published VISCII.[4] Unlike VISCII, VSCII is a "Vietnamese Standard" in the sense of being a national standard.
VISCII and VIQR were published as the informational RFC 1456, attributed to the Viet-Std Group and dated May 1993. As is usual for an informational RFC, RFC 1456 describes them as "conventions" used by overseas Vietnamese speakers on Usenet and states that it "specifies no level of standard". Despite this, it still calls VISCII the "VIetnamese Standard Code for Information Interchange", the same name used by VSCII.[5] The labels "VISCII" and "csVISCII" are registered with the IANA for VISCII, with reference to RFC 1456.[6] (By contrast, there is no official IANA label for TCVN 5712 / VSCII, although "x-viet-tcvn5712" was previously supported by Mozilla Firefox.[7])
A traditional extended ASCII character set consists of the ASCII set plus up to 128 further characters. Vietnamese requires 134 additional letter-diacritic combinations, six more than fit. Short of dropping tone marks on capital letters (as VSCII-3 does), there are essentially four ways to handle this problem:
- Use variable-width encoding (as does UTF-8)
- Include combining diacritical marks for tone marks (as do VSCII-2 and Windows-1258) or for diacritics in general (as do ANSEL and VNI)
- Replace some ASCII punctuation, preferably punctuation which is not invariant in ISO 646 (as does VNI for DOS)
- Replace at least six of the basic ASCII control characters (as do VPS and VSCII-1)
VISCII takes the last option. It replaces six of the least problematic C0 control codes, that is, those least likely to be recognised by an application and acted on specially: STX, ENQ, ACK, DC4, EM, and RS. These are reassigned to six of the least-used uppercase letter-diacritic combinations.[2] This choice may cause programs that use those control codes to malfunction when handling VISCII text. However, it creates fewer complications than the other options, because the designers found that transmission that was not 8-bit clean caused more difficulty in practice than re-using control characters.[2] The designers also deliberately assigned uppercase letters to the positions of the C0 and C1 control characters, and to the codes used for the non-breaking space in ISO-8859-1, Mac OS Roman, and OEM-US. The intent was to provide a serviceable workaround if a system could not display graphical characters at those codes: the corresponding lowercase code points could be substituted and rendered in an all-capitals font.[2]
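The following minimal Python sketch illustrates this trade-off. The table lists the six reused C0 positions as transcribed from the code chart in this article. The function strip_controls is a hypothetical filter, an assumption not taken from any real program, of the kind a VISCII-unaware application might apply. It keeps only TAB, LF and CR among the C0 codes. Running VISCII text through it silently deletes the letters stored at the reused positions.

```python
# The six C0 control positions that VISCII reassigns to uppercase letters,
# transcribed from the VISCII code chart: byte -> (ASCII control name, letter).
VISCII_C0 = {
    0x02: ("STX", "\u1EB2"),  # Ẳ
    0x05: ("ENQ", "\u1EB4"),  # Ẵ
    0x06: ("ACK", "\u1EAA"),  # Ẫ
    0x14: ("DC4", "\u1EF6"),  # Ỷ
    0x19: ("EM", "\u1EF8"),   # Ỹ
    0x1E: ("RS", "\u1EF4"),   # Ỵ
}


def strip_controls(data: bytes) -> bytes:
    """Hypothetical VISCII-unaware filter: drop C0 controls except TAB, LF and CR."""
    return bytes(b for b in data if b >= 0x20 or b in (0x09, 0x0A, 0x0D))


def letters_lost(data: bytes) -> list[str]:
    """Return the VISCII letters that strip_controls() would silently delete."""
    return [VISCII_C0[b][1] for b in data if b in VISCII_C0]


sample = b"A\x02B\x1eC\n"  # VISCII bytes for "AẲBỴC" followed by a newline
print(strip_controls(sample))                                  # b'ABC\n'
print([f"U+{ord(ch):04X}" for ch in letters_lost(sample)])    # ['U+1EB2', 'U+1EF4']
```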
However, because VISCII uses all of the extended code points for accented letters, it has no room for the symbols, superscript digits, curly quotes, proper dashes, and similar characters that most other extended ASCII character sets include.
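The figure of 134 additional letter-diacritic combinations can be reproduced by enumeration. This Python sketch assumes the usual inventory of 12 Vietnamese vowel letters (a, ă, â, e, ê, i, o, ô, ơ, u, ư, y). Each vowel takes one of five tone marks or none, and the consonant đ is added to the set. The sketch composes each combination with Unicode NFC normalization, adds the uppercase forms, and discards plain ASCII letters. The arithmetic is (12 × 6 + 1) × 2 − 12 = 134, six more than the 128 code points above ASCII.

```python
import unicodedata

# Vietnamese vowel letters in lowercase; ă, â, ê, ô, ơ and ư already carry a
# vowel diacritic (breve, circumflex or horn).
BASE_VOWELS = ["a", "ă", "â", "e", "ê", "i", "o", "ô", "ơ", "u", "ư", "y"]

# The five written tone marks as Unicode combining characters; the sixth
# (level) tone is unmarked.
TONE_MARKS = [
    "\u0300",  # grave (huyền)
    "\u0301",  # acute (sắc)
    "\u0309",  # hook above (hỏi)
    "\u0303",  # tilde (ngã)
    "\u0323",  # dot below (nặng)
]


def letters_beyond_ascii() -> set[str]:
    """Return every Vietnamese letter-diacritic combination that is not ASCII."""
    lowercase = {"\u0111"}  # đ
    for vowel in BASE_VOWELS:
        for tone in [""] + TONE_MARKS:
            # NFC folds base letter + combining marks into one precomposed code point.
            lowercase.add(unicodedata.normalize("NFC", vowel + tone))
    both_cases = lowercase | {ch.upper() for ch in lowercase}
    # Drop a, e, i, o, u, y and their capitals, which ASCII already provides.
    return {ch for ch in both_cases if not ch.isascii()}


letters = letters_beyond_ascii()
print(len(letters))        # 134
print(len(letters) - 128)  # 6
```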
Where VISCII shares characters with ISO-8859-1, it deliberately places most of them at the same code points as ISO-8859-1. The designers chose this layout out of user-friendliness concerns, and they note the uppercase Õ as an exception.[2]
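The layout rules described in the two preceding paragraphs can be checked mechanically against the code chart. The following Python sketch transcribes the chart's upper half into a list; VISCII_HIGH and the helper names are illustrative and not from any library. The sketch compares the list with ISO-8859-1, where byte b always decodes to U+00b. It then checks that the following positions all hold uppercase letters:

- the six reused C0 positions
- the C1 range 0x80 to 0x9F
- the bytes that Python's latin-1, mac_roman and cp437 codecs decode to a no-break space

The sketch assumes that cp437 stands in for OEM-US.

```python
# VISCII byte -> Unicode code point for 0x80-0xFF, transcribed from the chart.
VISCII_HIGH = [
    0x1EA0, 0x1EAE, 0x1EB0, 0x1EB6, 0x1EA4, 0x1EA6, 0x1EA8, 0x1EAC,  # 0x80
    0x1EBC, 0x1EB8, 0x1EBE, 0x1EC0, 0x1EC2, 0x1EC4, 0x1EC6, 0x1ED0,
    0x1ED2, 0x1ED4, 0x1ED6, 0x1ED8, 0x1EE2, 0x1EDA, 0x1EDC, 0x1EDE,  # 0x90
    0x1ECA, 0x1ECE, 0x1ECC, 0x1EC8, 0x1EE6, 0x0168, 0x1EE4, 0x1EF2,
    0x00D5, 0x1EAF, 0x1EB1, 0x1EB7, 0x1EA5, 0x1EA7, 0x1EA9, 0x1EAD,  # 0xA0
    0x1EBD, 0x1EB9, 0x1EBF, 0x1EC1, 0x1EC3, 0x1EC5, 0x1EC7, 0x1ED1,
    0x1ED3, 0x1ED5, 0x1ED7, 0x1EE0, 0x01A0, 0x1ED9, 0x1EDD, 0x1EDF,  # 0xB0
    0x1ECB, 0x1EF0, 0x1EE8, 0x1EEA, 0x1EEC, 0x01A1, 0x1EDB, 0x01AF,
    0x00C0, 0x00C1, 0x00C2, 0x00C3, 0x1EA2, 0x0102, 0x1EB3, 0x1EB5,  # 0xC0
    0x00C8, 0x00C9, 0x00CA, 0x1EBA, 0x00CC, 0x00CD, 0x0128, 0x1EF3,
    0x0110, 0x1EE9, 0x00D2, 0x00D3, 0x00D4, 0x1EA1, 0x1EF7, 0x1EEB,  # 0xD0
    0x1EED, 0x00D9, 0x00DA, 0x1EF9, 0x1EF5, 0x00DD, 0x1EE1, 0x01B0,
    0x00E0, 0x00E1, 0x00E2, 0x00E3, 0x1EA3, 0x0103, 0x1EEF, 0x1EAB,  # 0xE0
    0x00E8, 0x00E9, 0x00EA, 0x1EBB, 0x00EC, 0x00ED, 0x0129, 0x1EC9,
    0x0111, 0x1EF1, 0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x1ECF, 0x1ECD,  # 0xF0
    0x1EE5, 0x00F9, 0x00FA, 0x0169, 0x1EE7, 0x00FD, 0x1EE3, 0x1EEE,
]

# C0 positions reassigned by VISCII (byte -> code point).
VISCII_C0 = {0x02: 0x1EB2, 0x05: 0x1EB4, 0x06: 0x1EAA,
             0x14: 0x1EF6, 0x19: 0x1EF8, 0x1E: 0x1EF4}


def viscii_char(byte: int) -> str:
    """Decode one VISCII byte; bytes below 0x80 are ASCII unless reassigned."""
    if byte >= 0x80:
        return chr(VISCII_HIGH[byte - 0x80])
    return chr(VISCII_C0.get(byte, byte))


def latin1_layout():
    """Split VISCII's ISO-8859-1 repertoire into same-position and moved characters."""
    same, moved = [], []
    for byte in range(0x80, 0x100):
        cp = ord(viscii_char(byte))
        if cp > 0xFF:
            continue  # not in ISO-8859-1 at all
        # ISO-8859-1 decodes byte b to U+00b, so equal values mean the same position.
        if cp == byte:
            same.append((byte, cp))
        else:
            moved.append((byte, cp))
    return same, moved


def nbsp_byte(codec: str) -> int:
    """Find the byte that a Python codec decodes to U+00A0 NO-BREAK SPACE."""
    return bytes(range(256)).decode(codec).index("\u00a0")


same, moved = latin1_layout()
print(len(same))                                          # 31
print([f"{b:#04x} -> U+{cp:04X}" for b, cp in moved])     # ['0xa0 -> U+00D5']

# Control-code and no-break-space positions that should hold uppercase letters.
nbsp = {nbsp_byte(c) for c in ("latin-1", "mac_roman", "cp437")}
special = sorted(set(VISCII_C0) | set(range(0x80, 0xA0)) | nbsp)
print(all(viscii_char(b).isupper() for b in special))     # True
```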
VISCII is partially supported by the TriChlor Software Group in California, which has released various VISCII-compliant software packages, libraries, and fonts for MS-DOS and Windows, Unix, and Macintosh. VISCII-compliant software is available at many FTP sites.
Mozilla Thunderbird historically offered VISCII as an encoding for outgoing email.[8] VISCII was also supported by WinVNKey, a Vietnamese keyboard program for Windows. Christopher Cuong T. Nguyen created WinVNKey, and Hoc D. Ngo and others later updated it for successive Windows versions.
VISCII was mostly used by overseas Vietnamese speakers, with VSCII (TCVN) being more popular in northern Vietnam and VNI being more popular in southern Vietnam.[9]
VISCII code chart. Where a character differs from the one at the same byte in ASCII or ISO-8859-1, its Unicode code point is shown in hexadecimal.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | NUL | SOH | Ẳ 1EB2 | ETX | EOT | Ẵ 1EB4 | Ẫ 1EAA | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1x | DLE | DC1 | DC2 | DC3 | Ỷ 1EF6 | NAK | SYN | ETB | CAN | Ỹ 1EF8 | SUB | ESC | FS | GS | Ỵ 1EF4 | US |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8x | Ạ 1EA0 | Ắ 1EAE | Ằ 1EB0 | Ặ 1EB6 | Ấ 1EA4 | Ầ 1EA6 | Ẩ 1EA8 | Ậ 1EAC | Ẽ 1EBC | Ẹ 1EB8 | Ế 1EBE | Ề 1EC0 | Ể 1EC2 | Ễ 1EC4 | Ệ 1EC6 | Ố 1ED0 |
| 9x | Ồ 1ED2 | Ổ 1ED4 | Ỗ 1ED6 | Ộ 1ED8 | Ợ 1EE2 | Ớ 1EDA | Ờ 1EDC | Ở 1EDE | Ị 1ECA | Ỏ 1ECE | Ọ 1ECC | Ỉ 1EC8 | Ủ 1EE6 | Ũ 0168 | Ụ 1EE4 | Ỳ 1EF2 |
| Ax | Õ 00D5 | ắ 1EAF | ằ 1EB1 | ặ 1EB7 | ấ 1EA5 | ầ 1EA7 | ẩ 1EA9 | ậ 1EAD | ẽ 1EBD | ẹ 1EB9 | ế 1EBF | ề 1EC1 | ể 1EC3 | ễ 1EC5 | ệ 1EC7 | ố 1ED1 |
| Bx | ồ 1ED3 | ổ 1ED5 | ỗ 1ED7 | Ỡ 1EE0 | Ơ 01A0 | ộ 1ED9 | ờ 1EDD | ở 1EDF | ị 1ECB | Ự 1EF0 | Ứ 1EE8 | Ừ 1EEA | Ử 1EEC | ơ 01A1 | ớ 1EDB | Ư 01AF |
| Cx | À | Á | Â | Ã | Ả 1EA2 | Ă 0102 | ẳ 1EB3 | ẵ 1EB5 | È | É | Ê | Ẻ 1EBA | Ì | Í | Ĩ 0128 | ỳ 1EF3 |
| Dx | Đ 0110 | ứ 1EE9 | Ò | Ó | Ô | ạ 1EA1 | ỷ 1EF7 | ừ 1EEB | ử 1EED | Ù | Ú | ỹ 1EF9 | ỵ 1EF5 | Ý | ỡ 1EE1 | ư 01B0 |
| Ex | à | á | â | ã | ả 1EA3 | ă 0103 | ữ 1EEF | ẫ 1EAB | è | é | ê | ẻ 1EBB | ì | í | ĩ 0129 | ỉ 1EC9 |
| Fx | đ 0111 | ự 1EF1 | ò | ó | ô | õ | ỏ 1ECF | ọ 1ECD | ụ 1EE5 | ù | ú | ũ 0169 | ủ 1EE7 | ý | ợ 1EE3 | Ữ 1EEE |
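Python's standard library does not appear to include a VISCII codec. The following sketch builds one from the chart above and registers it under the two IANA labels, VISCII and csVISCII, by passing a search function to codecs.register. The function names are hypothetical. The helpers codecs.charmap_decode and codecs.charmap_encode are the same low-level routines that the standard library's own single-byte codecs use, though they are only sparsely documented.

```python
import codecs

# VISCII byte -> Unicode code point for 0x80-0xFF, transcribed from the chart.
VISCII_HIGH = [
    0x1EA0, 0x1EAE, 0x1EB0, 0x1EB6, 0x1EA4, 0x1EA6, 0x1EA8, 0x1EAC,  # 0x80
    0x1EBC, 0x1EB8, 0x1EBE, 0x1EC0, 0x1EC2, 0x1EC4, 0x1EC6, 0x1ED0,
    0x1ED2, 0x1ED4, 0x1ED6, 0x1ED8, 0x1EE2, 0x1EDA, 0x1EDC, 0x1EDE,  # 0x90
    0x1ECA, 0x1ECE, 0x1ECC, 0x1EC8, 0x1EE6, 0x0168, 0x1EE4, 0x1EF2,
    0x00D5, 0x1EAF, 0x1EB1, 0x1EB7, 0x1EA5, 0x1EA7, 0x1EA9, 0x1EAD,  # 0xA0
    0x1EBD, 0x1EB9, 0x1EBF, 0x1EC1, 0x1EC3, 0x1EC5, 0x1EC7, 0x1ED1,
    0x1ED3, 0x1ED5, 0x1ED7, 0x1EE0, 0x01A0, 0x1ED9, 0x1EDD, 0x1EDF,  # 0xB0
    0x1ECB, 0x1EF0, 0x1EE8, 0x1EEA, 0x1EEC, 0x01A1, 0x1EDB, 0x01AF,
    0x00C0, 0x00C1, 0x00C2, 0x00C3, 0x1EA2, 0x0102, 0x1EB3, 0x1EB5,  # 0xC0
    0x00C8, 0x00C9, 0x00CA, 0x1EBA, 0x00CC, 0x00CD, 0x0128, 0x1EF3,
    0x0110, 0x1EE9, 0x00D2, 0x00D3, 0x00D4, 0x1EA1, 0x1EF7, 0x1EEB,  # 0xD0
    0x1EED, 0x00D9, 0x00DA, 0x1EF9, 0x1EF5, 0x00DD, 0x1EE1, 0x01B0,
    0x00E0, 0x00E1, 0x00E2, 0x00E3, 0x1EA3, 0x0103, 0x1EEF, 0x1EAB,  # 0xE0
    0x00E8, 0x00E9, 0x00EA, 0x1EBB, 0x00EC, 0x00ED, 0x0129, 0x1EC9,
    0x0111, 0x1EF1, 0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x1ECF, 0x1ECD,  # 0xF0
    0x1EE5, 0x00F9, 0x00FA, 0x0169, 0x1EE7, 0x00FD, 0x1EE3, 0x1EEE,
]

# Bytes 0x00-0x7F are ASCII except these six reassigned C0 positions.
VISCII_C0 = {0x02: 0x1EB2, 0x05: 0x1EB4, 0x06: 0x1EAA,
             0x14: 0x1EF6, 0x19: 0x1EF8, 0x1E: 0x1EF4}

# 256-character decoding table: index = byte value, item = decoded character.
DECODING_TABLE = "".join(chr(VISCII_C0.get(b, b)) for b in range(0x80)) + "".join(
    chr(cp) for cp in VISCII_HIGH
)
# Inverse map used for encoding: code point -> byte value.
ENCODING_MAP = {ord(ch): byte for byte, ch in enumerate(DECODING_TABLE)}


def viscii_encode(text: str, errors: str = "strict"):
    # Returns (bytes, number of characters consumed), as the codec API expects.
    return codecs.charmap_encode(text, errors, ENCODING_MAP)


def viscii_decode(data, errors: str = "strict"):
    # Returns (str, number of bytes consumed).
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def search_viscii(name: str):
    # Python lowercases the requested name before calling search functions,
    # so "VISCII" and "csVISCII" arrive here as "viscii" and "csviscii".
    if name in ("viscii", "csviscii"):
        return codecs.CodecInfo(viscii_encode, viscii_decode, name="viscii")
    return None


codecs.register(search_viscii)

text = "Ti\u1ebfng Vi\u1ec7t"  # "Tiếng Việt", written with escapes
data = text.encode("VISCII")
print(data)                             # b'Ti\xaang Vi\xaet'
print(data.decode("csVISCII") == text)  # True
```

Because VISCII maps each of its 256 bytes to a distinct character, a table-driven codec like this round-trips every byte value. Characters outside VISCII's repertoire, such as curly quotes, raise UnicodeEncodeError under the default strict error handler.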
- ASCII
- Vietnamese Quoted-Readable (VIQR)
- Vietnamese Standard Code for Information Interchange (VSCII)
- Windows-1258
- [1] Phung, Quang; Ngo, Hoc D.; Bui, Cuong. "Vietnamese-Standard Working Group Home Page". Viet-Std Group. Retrieved 2019-08-23.
- [2] Vietnamese Character Encoding Standardization Report - VISCII And VIQR 1.1 Character Encoding Specifications (Technical report). Viet-Std Group. 1992.
- [3] "[news] TCVN 5712:1993 (VSCII) -- Vietnamese national standard". 1993-06-02. Archived from the original on 2017-01-11.
- [4] Lunde, Ken (2009-01-13). "Chapter 1: CJKV Information Processing Overview (§ Are VISCII and VSCII identical? What about TCVN?)". CJKV Information Processing (2nd ed.). p. 17. ISBN 978-0-596-51447-1.
- [5] Vietnamese Standardization Working Group (May 1993). Conventions for Encoding the Vietnamese Language. IETF. doi:10.17487/RFC1456. RFC 1456.
- [6] "Character Sets". IANA.
- [7] Sivonen, Henri (2014-09-26). "Character encoding changes in m-c require c-c action". mozilla.dev.apps.thunderbird.
- [8] Sivonen, Henri (2014-09-26). "Character encoding changes in m-c require c-c action". mozilla.dev.apps.thunderbird. Quote: "VISCII and armscii-8 are special in the sense that, for long time, Thunderbird itself (misguidedly) provided these encodings in the user interface for the choice of outgoing character encoding when composing a message. Therefore, it is possible that there exists a Thunderbird-created legacy of VISCII and armscii-8 email and Usenet posts."
- [9] Ngo, Hoc Dinh; Tran, TuBinh. "5. Why Having Vietnamese Charset (Character Set – Encoding) Conversion?". Some special functions of WinVNKey.
- Flohr, Guido (2016) [2006]. "Locale::RecodeData::VISCII - Conversion routines for VISCII". CPAN libintl-perl. Archived from the original on 2017-01-14. Retrieved 2017-01-14.
- https://www.math.nmsu.edu/~mleisher/Software/csets/VISCII.TXT
- RFC 1456 - Conventions for Encoding the Vietnamese Language
- Vietnamese-Standardization Working Group based in California
- Viet-Std Report 1992
- AnGiang Software
- VISCII-compliant software and fonts for MS-DOS and Windows
- VISCII-compliant software, libraries, and fonts for Unix
- WinVNKey, Vietnamese keyboard driver for Windows supporting multinational character sets, including VISCII
- MacVNKey, VISCII-compliant keyboard driver for Macintosh classic