Proposed checkpoint for character encoding support in APIs from Ian Jacobs on 2001-01-15 (w3c-wai-ua@w3.org from January to March 2001)

Hello,

Per my action item from the 16 November 2000 face-to-face meeting at AOL [1], please consider the following new Priority 1 checkpoint, which addresses issue 327 [2]:

For an API implemented to satisfy requirements of this document, support the character encodings required for that API.

Note: Support for character encodings is important so that text is not "broken" when it is communicated to assistive technologies. For example, the DOM Level 2 Core Specification [DOM2CORE], section 1.1.5, requires that the DOMString type be encoded using UTF-16.
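To illustrate what "broken" text means, here is a minimal Java sketch (the class and method names are hypothetical, chosen only for this example). It encodes "café" as UTF-8 and then decodes the bytes twice: once with the correct encoding and once with the wrong one (ISO-8859-1), which yields "cafÃ©". It also shows that a Java String, like a DOMString, is a sequence of UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two units. It uses java.nio.charset.StandardCharsets from later Java releases (Java 7 and up), not Java 1.3.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from any specification.
class BrokenTextDemo {

    // Decodes bytes with the encoding they were actually written in.
    static String decodeCorrectly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with the wrong encoding, which "breaks" the text.
    static String decodeWrongly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                  // "café"
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeCorrectly(wire));      // value: "café"
        System.out.println(decodeWrongly(wire));        // value: "cafÃ©"

        // Like a DOMString, a Java String holds UTF-16 code units, so
        // U+1D11E (MUSICAL SYMBOL G CLEF) is stored as a surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());                          // 2 code units
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 character
    }
}
```

An assistive technology that receives the second string has no way to recover the original word, which is why both sides of an API must agree on the encoding.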


For the Techniques document:

  1. For Java 1.3, any conforming implementation must support the following encodings [JAVA13]: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16.

  2. MSAA relies on COM, which relies on Unicode, so in practice MSAA supports UTF-16. From the COM documentation:

"Finally, and quite significantly, all strings passed through all COM interfaces (and, at least on Microsoft platforms, all COM APIs) are Unicode strings. There simply is no other reasonable way to get interoperable objects in the face of (i) location transparency, and (ii) a high-efficiency object architecture that doesn't in all cases intervene system-provided code between client and server. Further, this burden is in practice not large."

[From Chapter 3 on Interfaces, "Interface Binary Standard": http://msdn.microsoft.com/library/toc.asp?PaneName=Contents&tocPath=specs4-3-0&ShowPane=true#sel]
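The two techniques above could be checked with a short Java sketch (the class and helper names are hypothetical). It asks the runtime whether each of the six required encodings is supported, then shows that the UTF-16 variants differ only in byte order and byte-order mark: Java's "UTF-16" encoder writes big-endian data preceded by the mark FE FF. The sketch uses java.nio.charset.Charset (Java 1.4 and later) rather than Java 1.3 APIs. The UTF-16LE form is, as an assumption, what one would typically expect for COM strings in memory on little-endian Windows machines; the COM documentation quoted above says only that the strings are Unicode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: verify the required Java encodings and compare UTF-16 byte orders.
class RequiredCharsets {

    // The encodings every Java implementation must support, per [JAVA13].
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns the required encodings this runtime does NOT support
    // (empty on any conforming implementation).
    static List<String> missing() {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Formats bytes as space-separated uppercase hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + missing());                     // Missing: []
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```

All three UTF-16 forms carry the same code unit (U+0041); a receiver that assumes the wrong byte order, or ignores the mark, would misread every character.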


For the references:

[DOM2CORE] DOM Level 2 Core Specification, section 1.1.5: http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578
List of registered charset ids: ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
Character Model for the World Wide Web: http://www.w3.org/TR/charmod
Unicode glossary: http://www.unicode.org/glossary/
[JAVA13] Java 1.3 documentation: http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc


For the glossary:

A "character encoding" is a mapping from a character set definition to the actual code units used to represent the data. Please refer to the Unicode 3.0 standard [UNICODE] for more information.
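As a concrete case of this definition, the following Java sketch (the class and method names are hypothetical) maps one character, U+00E9 LATIN SMALL LETTER E WITH ACUTE, to its representation in three encodings and prints the resulting bytes in hex.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: one character mapped to code units by three different encodings.
class CodeUnits {

    // Returns the bytes that `charset` uses to represent `text`, as hex.
    static String encodeHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        System.out.println(encodeHex(eAcute, StandardCharsets.ISO_8859_1)); // E9
        System.out.println(encodeHex(eAcute, StandardCharsets.UTF_8));      // C3 A9
        System.out.println(encodeHex(eAcute, StandardCharsets.UTF_16BE));   // 00 E9
    }
}
```

Note that the code unit size differs: ISO-8859-1 and UTF-8 use 8-bit code units (one and two of them here), while UTF-16BE uses a single 16-bit code unit, shown as its two bytes.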

[1] http://www.w3.org/WAI/UA/2000/11/minutes-20001116
[2] http://server.rehab.uiuc.edu/ua-issues/issues-linear-lc2.html#327

Ian Jacobs (jacobs@w3.org)
http://www.w3.org/People/Jacobs
Tel: +1 831 457-2842
Cell: +1 917 450-8783

Received on Monday, 15 January 2001 12:48:06 UTC