Character entity references in HTML 4 (original) (raw)
Contents
- Introduction to character entity references
- Character entity references for ISO 8859-1 characters
- Character entity references for symbols, mathematical symbols, and Greek letters
- Character entity references for markup-significant and internationalization characters
24.1 Introduction to character entity references
A character entity reference is an SGML construct that references a character of the document character set.
This version of HTML supports several sets of character entity references:
- ISO 8859-1 (Latin-1) characters In accordance with section 14 of [RFC1866], the set of Latin-1 entities has been extended by this specification to cover the whole right part of ISO-8859-1 (all code positions with the high-order bit set), including the already commonly used , © and ®. The names of the entities are taken from the appendices of SGML (defined in [ISO8879]).
- symbols, mathematical symbols, and Greek letters. These characters may be represented by glyphs in the Adobe font "Symbol".
- markup-significant and internationalization characters(e.g., for bidirectional text).
The following sections present the complete lists of character entity references. Although, by convention, [ISO10646] the comments following each entry are usually written with uppercase letters, we have converted them to lowercase in this specification for reasons of readability.
24.2 Character entity references for ISO 8859-1 characters
The character entity references in this section produce characters whose numeric equivalents should already be supported by conforming HTML 2.0 user agents. Thus, the character entity reference ÷ is a more convenient form than ÷ for obtaining the division sign (÷).
To support these named entities, user agents need only recognize the entity names and convert them to characters that lie within the repertoire of [ISO88591].
Character 65533 (FFFD hexadecimal) is the last valid character in UCS-2. 65534 (FFFE hexadecimal) is unassigned and reserved as the byte-swapped version of ZERO WIDTH NON-BREAKING SPACE for byte-order detection purposes. 65535 (FFFF hexadecimal) is unassigned.
24.2.1 The list of characters
24.3 Character entity references for symbols, mathematical symbols, and Greek letters
The character entity references in this section produce characters that may be represented by glyphs in the widely available Adobe Symbol font, including Greek characters, various bracketing symbols, and a selection of mathematical operators such as gradient, product, and summation symbols.
To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.
When to use Greek entities. This entity set contains all the letters used in modern Greek. However, it does not include Greek punctuation, precomposed accented characters nor the non-spacing accents (tonos, dialytika) required to compose them. There are no archaic letters, Coptic-unique letters, or precomposed letters for Polytonic Greek. The entities defined here are not intended for the representation of modern Greek text and would not be an efficient representation; rather, they are intended for occasional Greek letters used in technical and mathematical works.
24.3.1 The list of characters
24.4 Character entity references for markup-significant and internationalization characters
The character entity references in this section are for escaping markup-significant characters (these are the same as those in HTML 2.0 and 3.2), for denoting spaces and dashes. Other characters in this section apply to internationalization issues such as the disambiguation of bidirectional text (see the section on bidirectional text for details).
Entities have also been added for the remaining characters occurring in CP-1252 which do not occur in the HTMLlat1 or HTMLsymbol entity sets. These all occur in the 128 to 159 range within the CP-1252 charset. These entities permit the characters to be denoted in a platform-independent manner.
To support these entities, user agents may support full [ISO10646] or use other means. Display of glyphs for these characters may be obtained by being able to display the relevant [ISO10646] characters or by other means, such as internally mapping the listed entities, numeric character references, and characters to the appropriate position in some font that contains the requisite glyphs.