Character Sets and Encodings - The Java EE 6 Tutorial (original) (raw)
2. Using the Tutorial Examples
3. Getting Started with Web Applications
4. JavaServer Faces Technology
7. Using JavaServer Faces Technology in Web Pages
8. Using Converters, Listeners, and Validators
9. Developing with JavaServer Faces Technology
10. JavaServer Faces Technology: Advanced Concepts
11. Using Ajax with JavaServer Faces Technology
12. Composite Components: Advanced Topics and Example
13. Creating Custom UI Components and Other Custom Objects
14. Configuring JavaServer Faces Applications
16. Uploading Files with Java Servlet Technology
17. Internationalizing and Localizing Web Applications
Java Platform Localization Classes
Providing Localized Messages and Labels
18. Introduction to Web Services
19. Building Web Services with JAX-WS
20. Building RESTful Web Services with JAX-RS
21. JAX-RS: Advanced Topics and Example
23. Getting Started with Enterprise Beans
24. Running the Enterprise Bean Examples
25. A Message-Driven Bean Example
26. Using the Embedded Enterprise Bean Container
27. Using Asynchronous Method Invocation in Session Beans
Part V Contexts and Dependency Injection for the Java EE Platform
28. Introduction to Contexts and Dependency Injection for the Java EE Platform
29. Running the Basic Contexts and Dependency Injection Examples
30. Contexts and Dependency Injection for the Java EE Platform: Advanced Topics
31. Running the Advanced Contexts and Dependency Injection Examples
32. Introduction to the Java Persistence API
33. Running the Persistence Examples
34. The Java Persistence Query Language
35. Using the Criteria API to Create Queries
36. Creating and Using String-Based Criteria Queries
37. Controlling Concurrent Access to Entity Data with Locking
38. Using a Second-Level Cache with Java Persistence API Applications
39. Introduction to Security in the Java EE Platform
40. Getting Started Securing Web Applications
41. Getting Started Securing Enterprise Applications
42. Java EE Security: Advanced Topics
Part VIII Java EE Supporting Technologies
43. Introduction to Java EE Supporting Technologies
45. Resources and Resource Adapters
46. The Resource Adapter Example
47. Java Message Service Concepts
48. Java Message Service Examples
49. Bean Validation: Advanced Topics
50. Using Java EE Interceptors
51. Duke's Bookstore Case Study Example
52. Duke's Tutoring Case Study Example
53. Duke's Forest Case Study Example
The following sections describe character sets and character encodings.
Character Sets
A character set is a set of textual and graphic symbols, each of which is mapped to a set of nonnegative integers.
The first character set used in computing was US-ASCII. It is limited in that it can represent only American English. US-ASCII contains uppercase and lowercase Latin letters, numerals, punctuation, control codes, and a few miscellaneous symbols.
Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the Java program source file encoding doesn’t support Unicode, you can represent Unicode characters as escape sequences by using the notation \u_XXXX_, where XXXX is the character’s 16-bit representation in hexadecimal. For example, the Spanish version of the Duke’s Tutoring message file uses Unicode for non-ASCII characters:
nav.main=P\u00e1gina Principal nav.status=Mirar el estado nav.current_session=Ver sesi\u00f3n actual del tutorial nav.park=Ver estudiantes en el Parque nav.admin=Administraci\u00f3n
admin.nav.main=P\u00e1gina principal de administraci\u00f3n admin.nav.create_student=Crear un nuevo estudiante admin.nav.edit_student=Editar informaci\u00f3n del estudiante admin.nav.create_guardian=Crear un nuevo guardia admin.nav.edit_guardian=Editar guardia admin.nav.create_address=Crear una nueva direcci\u00f3n admin.nav.edit_address=Editar direcci\u00f3n admin.nav.activate_student=Activar estudiante
Character Encoding
A character encoding maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JPor Shift-JIS encodings, among others. Each encoding has rules for representing and serializing a character set.
The ISO 8859 series defines 13 character encodings that can represent texts in dozens of languages. Each ISO 8859 character encoding can have up to 256 characters. ISO-8859-1 (Latin-1) comprises the ASCII character set, characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols.
UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes 16-bit Unicode characters as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the character comprises a variable number of bytes.
UTF-8 is compatible with the majority of existing web content and provides access to the Unicode character set. Current versions of browsers and email clients support UTF-8. In addition, many new web standards specify UTF-8 as their character encoding. For example, UTF-8 is one of the two required encodings for XML documents (the other is UTF-16).
Web components usually use PrintWriter to produce responses; PrintWriter automatically encodes using ISO-8859-1. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application that uses a character set that cannot use the default encoding must explicitly set a different encoding.
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices