Supported Encodings

The classes java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of the Java platform. The class description for java.nio.charset.Charset lists the encodings that any implementation of the Java Platform, Standard Edition 6 is required to support.
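As a rough illustration of the conversions these classes perform, the following sketch encodes a string through an OutputStreamWriter and decodes it again through an InputStreamReader. The class name EncodingRoundTrip and its two helper methods are hypothetical, and the code assumes Java 8 or later syntax rather than Java SE 6.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class EncodingRoundTrip {

    // Encode text (Unicode) to bytes in the given charset via OutputStreamWriter.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return out.toByteArray();
    }

    // Decode bytes in the given charset back to text via InputStreamReader.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] bytes = encode("caf\u00e9", latin1);
        System.out.println(bytes.length); // 4: one byte per character in ISO-8859-1

        // java.lang.String offers the same conversion directly.
        String viaString = new String(bytes, latin1);
        System.out.println(viaString.equals(decode(bytes, latin1))); // true
    }
}
```

The stream classes are the natural choice when data arrives incrementally, while the String constructor and getBytes are simpler when the whole byte array is already in memory.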

Sun's Java SE Development Kit 6 for all platforms (Solaris™ operating environment, Linux, and Microsoft Windows) and the Java SE Runtime Environment 6 for Solaris and Linux support all encodings shown on this page. Sun's Java SE Runtime Environment 6 for Windows may be installed as a complete international version or as a European languages version. The JRE installer by default installs a European languages version if it recognizes that the host operating system only supports European languages. If the installer recognizes that any other language is needed, or if the user requests support for non-European languages in a customized installation, a complete international version is installed. The European languages version only supports the encodings shown in the first table. The international version (which includes the lib/charsets.jar file) supports all encodings shown on this page.
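Because the installed encoding set can differ between installations, an application may want to check at run time before relying on an encoding from the extended set. The following is a minimal sketch using Charset.isSupported. The class CharsetAvailability, its missing method, and the name x-no-such-charset are hypothetical and serve only as an illustration.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class CharsetAvailability {

    // Returns the requested names that this runtime cannot resolve.
    // Charset.isSupported accepts canonical names and aliases; it throws
    // IllegalCharsetNameException if a name is syntactically illegal.
    static List<String> missing(String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Which of these are reported depends on the installed runtime.
        System.out.println(missing("UTF-8", "windows-1252", "Big5", "Shift_JIS", "x-no-such-charset"));
    }
}
```

Charset.availableCharsets() returns the full map of installed charsets if a complete listing is needed instead of a yes/no check.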

The following tables show the encoding sets supported by Java SE 6. The canonical names used by the new java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.
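The difference between the two naming schemes can be observed directly. In the sketch below, Charset.forName(...).name() yields the java.nio canonical name, while InputStreamReader.getEncoding() reports the historical java.io name when one exists. The class CharsetNames and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class CharsetNames {

    // Canonical java.nio name for any supported name or alias.
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // Name reported by the java.io API for the same encoding.
    static String ioName(String anyName) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), anyName)) {
            return reader.getEncoding();
        } catch (IOException e) {
            // UnsupportedEncodingException is a subclass of IOException.
            throw new IllegalArgumentException("Unsupported encoding: " + anyName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nioName("UTF8"));       // UTF-8
        System.out.println(ioName("UTF-8"));       // UTF8
        System.out.println(nioName("ISO8859_1"));  // ISO-8859-1
        System.out.println(ioName("ISO-8859-1"));  // ISO8859_1
    }
}
```

Both APIs accept either form as input, since each name is registered as an alias of the other; only the name they report back differs.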

Basic Encoding Set (contained in lib/rt.jar)

| Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
| --- | --- | --- |
| IBM00858 | Cp858 | Variant of Cp850 with Euro character |
| IBM437 | Cp437 | MS-DOS United States, Australia, New Zealand, South Africa |
| IBM775 | Cp775 | PC Baltic |
| IBM850 | Cp850 | MS-DOS Latin-1 |
| IBM852 | Cp852 | MS-DOS Latin-2 |
| IBM855 | Cp855 | IBM Cyrillic |
| IBM857 | Cp857 | IBM Turkish |
| IBM862 | Cp862 | PC Hebrew |
| IBM866 | Cp866 | MS-DOS Russian |
| ISO-8859-1 | ISO8859_1 | ISO-8859-1, Latin Alphabet No. 1 |
| ISO-8859-2 | ISO8859_2 | Latin Alphabet No. 2 |
| ISO-8859-4 | ISO8859_4 | Latin Alphabet No. 4 |
| ISO-8859-5 | ISO8859_5 | Latin/Cyrillic Alphabet |
| ISO-8859-7 | ISO8859_7 | Latin/Greek Alphabet (ISO-8859-7:2003) |
| ISO-8859-9 | ISO8859_9 | Latin Alphabet No. 5 |
| ISO-8859-13 | ISO8859_13 | Latin Alphabet No. 7 |
| ISO-8859-15 | ISO8859_15 | Latin Alphabet No. 9 |
| KOI8-R | KOI8_R | KOI8-R, Russian |
| KOI8-U | KOI8_U | KOI8-U, Ukrainian |
| US-ASCII | ASCII | American Standard Code for Information Interchange |
| UTF-8 | UTF8 | Eight-bit Unicode (or UCS) Transformation Format |
| UTF-16 | UTF-16 | Sixteen-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
| UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order |
| UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order |
| UTF-32 | UTF_32 | 32-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
| UTF-32BE | UTF_32BE | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order |
| UTF-32LE | UTF_32LE | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order |
| x-UTF-32BE-BOM | UTF_32BE_BOM | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
| x-UTF-32LE-BOM | UTF_32LE_BOM | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
| windows-1250 | Cp1250 | Windows Eastern European |
| windows-1251 | Cp1251 | Windows Cyrillic |
| windows-1252 | Cp1252 | Windows Latin-1 |
| windows-1253 | Cp1253 | Windows Greek |
| windows-1254 | Cp1254 | Windows Turkish |
| windows-1257 | Cp1257 | Windows Baltic |
| Not available | UnicodeBig | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
| x-IBM737 | Cp737 | PC Greek |
| x-IBM874 | Cp874 | IBM Thai |
| x-UTF-16LE-BOM | UnicodeLittle | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
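The byte-order-mark variants listed in the table differ only in the bytes they emit when encoding. The following sketch prints the encoded form of a single character for several of the UTF-16 entries. The class ByteOrderMarks and its hex method are hypothetical names used only for this illustration, and the x-UTF-16LE-BOM entry is queried only if the runtime reports it as supported.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class ByteOrderMarks {

    // Encodes text in the named charset and renders the bytes as hex pairs.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A", "UTF-16"));   // FE FF 00 41 (big-endian mark written when encoding)
        System.out.println(hex("A", "UTF-16BE")); // 00 41 (no mark)
        System.out.println(hex("A", "UTF-16LE")); // 41 00 (no mark)
        if (Charset.isSupported("x-UTF-16LE-BOM")) {
            System.out.println(hex("A", "x-UTF-16LE-BOM"));
        }
    }
}
```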

Extended Encoding Set (contained in lib/charsets.jar)

| Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
| --- | --- | --- |
| Big5 | Big5 | Big5, Traditional Chinese |
| Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision) |
| EUC-JP | EUC_JP | JIS X 0201, 0208 and 0212, EUC encoding, Japanese |
| EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean |
| GB18030 | GB18030 | Simplified Chinese, PRC standard |
| GB2312 | EUC_CN | GB2312, EUC encoding, Simplified Chinese |
| GBK | GBK | GBK, Simplified Chinese |
| IBM-Thai | Cp838 | IBM Thailand extended SBCS |
| IBM01140 | Cp1140 | Variant of Cp037 with Euro character |
| IBM01141 | Cp1141 | Variant of Cp273 with Euro character |
| IBM01142 | Cp1142 | Variant of Cp277 with Euro character |
| IBM01143 | Cp1143 | Variant of Cp278 with Euro character |
| IBM01144 | Cp1144 | Variant of Cp280 with Euro character |
| IBM01145 | Cp1145 | Variant of Cp284 with Euro character |
| IBM01146 | Cp1146 | Variant of Cp285 with Euro character |
| IBM01147 | Cp1147 | Variant of Cp297 with Euro character |
| IBM01148 | Cp1148 | Variant of Cp500 with Euro character |
| IBM01149 | Cp1149 | Variant of Cp871 with Euro character |
| IBM037 | Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia |
| IBM1026 | Cp1026 | IBM Latin-5, Turkey |
| IBM1047 | Cp1047 | Latin-1 character set for EBCDIC hosts |
| IBM273 | Cp273 | IBM Austria, Germany |
| IBM277 | Cp277 | IBM Denmark, Norway |
| IBM278 | Cp278 | IBM Finland, Sweden |
| IBM280 | Cp280 | IBM Italy |
| IBM284 | Cp284 | IBM Catalan/Spain, Spanish Latin America |
| IBM285 | Cp285 | IBM United Kingdom, Ireland |
| IBM297 | Cp297 | IBM France |
| IBM420 | Cp420 | IBM Arabic |
| IBM424 | Cp424 | IBM Hebrew |
| IBM500 | Cp500 | EBCDIC 500V1 |
| IBM860 | Cp860 | MS-DOS Portuguese |
| IBM861 | Cp861 | MS-DOS Icelandic |
| IBM863 | Cp863 | MS-DOS Canadian French |
| IBM864 | Cp864 | PC Arabic |
| IBM865 | Cp865 | MS-DOS Nordic |
| IBM868 | Cp868 | MS-DOS Pakistan |
| IBM869 | Cp869 | IBM Modern Greek |
| IBM870 | Cp870 | IBM Multilingual Latin-2 |
| IBM871 | Cp871 | IBM Iceland |
| IBM918 | Cp918 | IBM Pakistan (Urdu) |
| ISO-2022-CN | ISO2022CN | GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only) |
| ISO-2022-JP | ISO2022JP | JIS X 0201, 0208, in ISO 2022 form, Japanese |
| ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean |
| ISO-8859-3 | ISO8859_3 | Latin Alphabet No. 3 |
| ISO-8859-6 | ISO8859_6 | Latin/Arabic Alphabet |
| ISO-8859-8 | ISO8859_8 | Latin/Hebrew Alphabet |
| JIS_X0201 | JIS_X0201 | JIS X 0201 |
| JIS_X0212-1990 | JIS_X0212-1990 | JIS X 0212 |
| Shift_JIS | SJIS | Shift-JIS, Japanese |
| TIS-620 | TIS620 | TIS620, Thai |
| windows-1255 | Cp1255 | Windows Hebrew |
| windows-1256 | Cp1256 | Windows Arabic |
| windows-1258 | Cp1258 | Windows Vietnamese |
| windows-31j | MS932 | Windows Japanese |
| x-Big5_Solaris | Big5_Solaris | Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale |
| x-euc-jp-linux | EUC_JP_LINUX | JIS X 0201, 0208, EUC encoding, Japanese |
| x-EUC-TW | EUC_TW | CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese |
| x-eucJP-Open | EUC_JP_Solaris | JIS X 0201, 0208, 0212, EUC encoding, Japanese |
| x-IBM1006 | Cp1006 | IBM AIX Pakistan (Urdu) |
| x-IBM1025 | Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR) |
| x-IBM1046 | Cp1046 | IBM Arabic - Windows |
| x-IBM1097 | Cp1097 | IBM Iran (Farsi)/Persian |
| x-IBM1098 | Cp1098 | IBM Iran (Farsi)/Persian (PC) |
| x-IBM1112 | Cp1112 | IBM Latvia, Lithuania |
| x-IBM1122 | Cp1122 | IBM Estonia |
| x-IBM1123 | Cp1123 | IBM Ukraine |
| x-IBM1124 | Cp1124 | IBM AIX Ukraine |
| x-IBM1381 | Cp1381 | IBM OS/2, DOS People's Republic of China (PRC) |
| x-IBM1383 | Cp1383 | IBM AIX People's Republic of China (PRC) |
| x-IBM33722 | Cp33722 | IBM-eucJP - Japanese (superset of 5050) |
| x-IBM834 | Cp834 | IBM EBCDIC DBCS-only Korean |
| x-IBM856 | Cp856 | IBM Hebrew |
| x-IBM875 | Cp875 | IBM Greek |
| x-IBM921 | Cp921 | IBM Latvia, Lithuania (AIX, DOS) |
| x-IBM922 | Cp922 | IBM Estonia (AIX, DOS) |
| x-IBM930 | Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026 |
| x-IBM933 | Cp933 | Korean Mixed with 1880 UDC, superset of 5029 |
| x-IBM935 | Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031 |
| x-IBM937 | Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033 |
| x-IBM939 | Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035 |
| x-IBM942 | Cp942 | IBM OS/2 Japanese, superset of Cp932 |
| x-IBM942C | Cp942C | Variant of Cp942 |
| x-IBM943 | Cp943 | IBM OS/2 Japanese, superset of Cp932 and Shift-JIS |
| x-IBM943C | Cp943C | Variant of Cp943 |
| x-IBM948 | Cp948 | OS/2 Chinese (Taiwan) superset of 938 |
| x-IBM949 | Cp949 | PC Korean |
| x-IBM949C | Cp949C | Variant of Cp949 |
| x-IBM950 | Cp950 | PC Chinese (Hong Kong, Taiwan) |
| x-IBM964 | Cp964 | AIX Chinese (Taiwan) |
| x-IBM970 | Cp970 | AIX Korean |
| x-ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts |
| x-ISO2022-CN-CNS | ISO2022_CN_CNS | CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only) |
| x-ISO2022-CN-GB | ISO2022_CN_GB | GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only) |
| x-iso-8859-11 | x-iso-8859-11 | Latin/Thai Alphabet |
| x-JIS0208 | x-JIS0208 | JIS X 0208 |
| x-JISAutoDetect | JISAutoDetect | Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only) |
| x-Johab | x-Johab | Korean, Johab character set |
| x-MacArabic | MacArabic | Macintosh Arabic |
| x-MacCentralEurope | MacCentralEurope | Macintosh Latin-2 |
| x-MacCroatian | MacCroatian | Macintosh Croatian |
| x-MacCyrillic | MacCyrillic | Macintosh Cyrillic |
| x-MacDingbat | MacDingbat | Macintosh Dingbat |
| x-MacGreek | MacGreek | Macintosh Greek |
| x-MacHebrew | MacHebrew | Macintosh Hebrew |
| x-MacIceland | MacIceland | Macintosh Iceland |
| x-MacRoman | MacRoman | Macintosh Roman |
| x-MacRomania | MacRomania | Macintosh Romania |
| x-MacSymbol | MacSymbol | Macintosh Symbol |
| x-MacThai | MacThai | Macintosh Thai |
| x-MacTurkish | MacTurkish | Macintosh Turkish |
| x-MacUkraine | MacUkraine | Macintosh Ukraine |
| x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions |
| x-mswin-936 | MS936 | Windows Simplified Chinese |
| x-PCK | PCK | Solaris version of Shift_JIS |
| x-windows-50220 | Cp50220 | Windows Codepage 50220 (7-bit implementation) |
| x-windows-50221 | Cp50221 | Windows Codepage 50221 (7-bit implementation) |
| x-windows-874 | MS874 | Windows Thai |
| x-windows-949 | MS949 | Windows Korean |
| x-windows-950 | MS950 | Windows Traditional Chinese |
| x-windows-iso2022jp | x-windows-iso2022jp | Variant ISO-2022-JP (MS932 based) |
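Some entries above are marked as converting in one direction only. For the "conversion to Unicode only" case, Charset.canEncode() offers a way to ask before attempting to encode. The sketch below is illustrative: the class ConversionDirection and its describe method are hypothetical, and whether the extended names are present depends on the installed runtime, so each name is checked with Charset.isSupported first.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class ConversionDirection {

    // Summarizes whether the named charset is installed and whether it can encode.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not supported by this runtime";
        }
        Charset cs = Charset.forName(name);
        return cs.name() + (cs.canEncode()
                ? ": can encode"
                : ": cannot encode (conversion to Unicode only)");
    }

    public static void main(String[] args) {
        // Results for the extended names depend on the installed charsets.
        System.out.println(describe("UTF-8"));
        System.out.println(describe("x-JISAutoDetect"));
        System.out.println(describe("ISO-2022-CN"));
    }
}
```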

Copyright © 1993, 2018, Oracle and/or its affiliates. All rights reserved.