Performance, Implementation, and Design Notes

Contents

Notes on invalid documents
Special characters in URI attribute values
1. Non-ASCII characters in URI attribute values
2. Ampersands in URI attribute values
SGML implementation notes
Notes on helping search engines index your Web site
1. Search robots
  - The robots.txt file
  - Robots and the META element
Notes on tables
1. Design rationale
2. Recommended Layout Algorithms
  - Fixed Layout Algorithm
  - Autolayout Algorithm
Notes on forms
1. Incremental display
2. Future projects
Notes on scripting
1. Reserved syntax for future script macros
  - Current Practice for Script Macros
Notes on frames
Notes on accessibility
Notes on security
Security issues for forms

The following notes are informative, not normative. Despite the appearance of words such as "must" and "should", all requirements in this section appear elsewhere in the specification.

B.1 Notes on invalid documents

This specification does not define how conforming user agents handle general error conditions, including how user agents behave when they encounter elements, attributes, attribute values, or entities not specified in this document.

However, to facilitate experimentation and interoperability between implementations of various versions of HTML, we recommend the following behavior:

If a user agent encounters an element it does not recognize, it should try to render the element's content.
If a user agent encounters an attribute it does not recognize, it should ignore the entire attribute specification (i.e., the attribute and its value).
If a user agent encounters an attribute value it doesn't recognize, it should use the default attribute value.
If it encounters an undeclared entity, the entity should be treated as character data.

We also recommend that user agents provide support for notifying the user of such errors.
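The attribute-related recommendations above can be sketched as follows. This is only an illustration under stated assumptions: the `KNOWN_ATTRIBUTES` table, its contents, and the function name `recover_attributes` are hypothetical and not part of this specification. A real user agent would derive the recognized attributes and defaults from the DTD.

```python
# Hypothetical attribute table: name -> (recognized values, default value).
# A real user agent would take this information from the DTD.
KNOWN_ATTRIBUTES = {
    "align": ({"left", "center", "right"}, "left"),
    "valign": ({"top", "middle", "bottom"}, "middle"),
}


def recover_attributes(attrs):
    """Apply the recommended recovery rules to one element's attributes.

    Returns the attributes to use plus a list of messages that could be
    shown to the user.
    """
    recovered = {}
    warnings = []
    for name, value in attrs.items():
        key = name.lower()
        if key not in KNOWN_ATTRIBUTES:
            # Unrecognized attribute: ignore the attribute and its value.
            warnings.append(f"ignored unrecognized attribute {name!r}")
            continue
        allowed, default = KNOWN_ATTRIBUTES[key]
        if value.lower() in allowed:
            recovered[key] = value.lower()
        else:
            # Unrecognized value: fall back to the default value.
            warnings.append(f"unrecognized value {value!r} for {name!r}")
            recovered[key] = default
    return recovered, warnings


attrs, notes = recover_attributes(
    {"align": "sideways", "valign": "TOP", "sparkle": "yes"}
)
print(attrs)
print(notes)
```

Collecting the messages separately from the recovered attributes keeps rendering and user notification independent, so a user agent could choose to display or suppress the notices.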

Since user agents may vary in how they handle error conditions, authors and users must not rely on specific error recovery behavior.

The HTML 2.0 specification ([RFC1866]) observes that many HTML 2.0 user agents assume that a document that does not begin with a document type declaration refers to the HTML 2.0 specification. As experience shows that this is a poor assumption, the current specification does not recommend this behavior.

For reasons of interoperability, authors must not "extend" HTML through the available SGML mechanisms (e.g., extending the DTD, adding a new set of entity definitions, etc.).

B.2 Special characters in URI attribute values

B.2.1 Non-ASCII characters in URI attribute values

Although URIs do not contain non-ASCII values (see [URI], section 2.1), authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:

...

We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).

This procedure results in a syntactically legal URI (as defined in [RFC1738], section 2.2 or [RFC2141], section 2) that is independent of the character encoding to which the HTML document carrying the URI may have been transcoded.
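As a minimal sketch of the two-step convention above, the following function leaves ASCII characters untouched and replaces every other character with the %HH escapes of its UTF-8 bytes. The function name `escape_non_ascii` and the sample URI are assumptions chosen for illustration.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape non-ASCII characters as %HH sequences of their UTF-8 bytes."""
    parts = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged.
            parts.append(ch)
        else:
            # Step 1: represent the character in UTF-8.
            # Step 2: escape each resulting byte as %HH.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(escape_non_ascii("http://foo.org/Håkon"))
```

The sketch deliberately does not touch ASCII characters that may themselves be illegal in a URI (such as spaces), since the convention above addresses only non-ASCII characters.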

Note. Some older user agents trivially process URIs in HTML using the bytes of the character encoding in which the document was received. Some older HTML documents rely on this practice and break when transcoded. User agents that want to handle these older documents should, on receiving a URI containing characters outside the legal set, first use the conversion based on UTF-8. Only if the resulting URI does not resolve should they try constructing a URI based on the bytes of the character encoding in which the document was received.
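The fallback order described in this note might look roughly like the sketch below. The `resolves` callback is a hypothetical stand-in for whatever mechanism a user agent uses to test whether a URI can be retrieved, and both function names are assumptions.

```python
def escape_with_encoding(uri, encoding):
    """Escape non-ASCII characters as %HH bytes in the given encoding."""
    parts = []
    for ch in uri:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode(encoding))
    return "".join(parts)


def resolve_with_fallback(uri, document_encoding, resolves):
    """Try the UTF-8 based URI first, then the document-encoding one.

    `resolves` is a caller-supplied function (hypothetical here) that
    returns True when the candidate URI can be retrieved.
    """
    for encoding in ("utf-8", document_encoding):
        try:
            candidate = escape_with_encoding(uri, encoding)
        except UnicodeEncodeError:
            # The character cannot be represented in this encoding.
            continue
        if resolves(candidate):
            return candidate
    return None


# Simulate an older server that only knows the ISO-8859-1 based form.
legacy_server = {"http://foo.org/H%E5kon"}
print(resolve_with_fallback("http://foo.org/Håkon", "iso-8859-1",
                            legacy_server.__contains__))
```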

Note. The same conversion based on UTF-8 should be applied to values of the name attribute for the A element.

B.2.2 Ampersands in URI attribute values

The URI that is constructed when a form is submitted may be used as an anchor-style link (e.g., the href attribute for the A element). Unfortunately, the use of the "&" character to separate form fields interacts with its use in SGML attribute values to delimit character entity references. For example, to use the URI "http://host/?x=1&y=2" as a linking URI, it must be written <A href="http://host/?x=1&#38;y=2"> or <A href="http://host/?x=1&amp;y=2">.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
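Both sides of this recommendation can be illustrated with a short sketch using the Python standard library. The function names are assumptions, and the server-side parser is only one possible way to accept ";" alongside "&".

```python
from html import escape
from urllib.parse import parse_qsl


def href_attribute_value(uri):
    """Author side: escape "&" so the URI is safe inside an attribute value."""
    return escape(uri, quote=True)


def parse_form_fields(query):
    """Server side: accept either ";" or "&" between form fields.

    A literal ";" inside a field value is expected to arrive percent-encoded
    as %3B, so replacing the raw separator before decoding is safe.
    """
    return parse_qsl(query.replace(";", "&"))


print(href_attribute_value("http://host/?x=1&y=2"))
print(parse_form_fields("x=1;y=2"))
print(parse_form_fields("x=1&y=2"))
```

With a server that accepts ";", an author can write the link as "http://host/?x=1;y=2" and needs no escaping at all.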

B.3 SGML implementation notes

B.3.1 Line breaks

SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>
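The line break rule from this section can be approximated with two regular expressions. This is a simplified sketch, not a real SGML parser: it assumes tags contain no ">" inside attribute values, and the function name `drop_sgml_line_breaks` is hypothetical.

```python
import re

# A start tag immediately followed by a line break.
AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*>)\r?\n")
# A line break immediately followed by an end tag.
BEFORE_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^<>]*>)")


def drop_sgml_line_breaks(markup):
    """Remove line breaks right after start tags and right before end tags."""
    markup = AFTER_START_TAG.sub(r"\1", markup)
    markup = BEFORE_END_TAG.sub(r"\1", markup)
    return markup


print(drop_sgml_line_breaks("<P>\nThomas is watching TV.\n</P>"))
```

After this normalization, both forms of each example reduce to the same string, which is why they must render identically.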

B.3.2 Specifying non-HTML data

Script and style data may appear as element content or attribute values. The following sections describe the boundary between HTML markup and foreign data.

Note. The DTD defines script and style data to be CDATA for both element content and attribute values. SGML rules do not allow character references in CDATA element content but do allow them in CDATA attribute values. Authors should pay particular attention when cutting and pasting script and style data between element content and attribute values.

This asymmetry also means that when transcoding from a richer to a poorer character encoding, the transcoder cannot simply replace unconvertible characters in script or style data with the corresponding numeric character references; it must parse the HTML document and know about each script and style language's syntax in order to process the data correctly.

Element content

When script or style data is the content of an element (SCRIPT and STYLE), the data begins immediately after the element start tag and ends at the first ETAGO ("</") delimiter followed by a name start character ([a-zA-Z]); note that this may not be the element's end tag. Authors should therefore escape "</" within the content. Escape mechanisms are specific to each scripting or style sheet language.

ILLEGAL EXAMPLE:
The following script data incorrectly contains a "</" sequence (as part of "</EM>") before the SCRIPT end tag:

<SCRIPT type="text/javascript">
  document.write ("<EM>This won't work</EM>")
</SCRIPT>

In JavaScript, this code can be expressed legally by hiding the ETAGO delimiter before an SGML name start character:

<SCRIPT type="text/javascript">
  document.write ("<EM>This will work<\/EM>")
</SCRIPT>
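To see why the first form fails and the second works, the end-of-data rule can be sketched as a search for the first "</" followed by a letter. The function name `script_data_end` is an assumption, and the input is assumed to begin right after the SCRIPT start tag.

```python
import re

# ETAGO ("</") followed by a name start character ([a-zA-Z]).
ETAGO_BEFORE_NAME = re.compile(r"</[a-zA-Z]")


def script_data_end(text):
    """Return the index at which script or style data ends in `text`."""
    match = ETAGO_BEFORE_NAME.search(text)
    return match.start() if match else len(text)


broken = 'document.write ("<EM>This won\'t work</EM>")\n</SCRIPT>'
escaped = 'document.write ("<EM>This will work<\\/EM>")\n</SCRIPT>'

# The unescaped form is cut off in the middle of the string literal.
print(repr(broken[:script_data_end(broken)]))
# The escaped form runs up to the real SCRIPT end tag.
print(repr(escaped[:script_data_end(escaped)]))
```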

In Tcl, one may accomplish this as follows:

<SCRIPT type="text/tcl">
  document write "<EM>This will work<\/EM>"
</SCRIPT>

In VBScript, the problem may be avoided with the Chr() function:

"<EM>This will work<" & Chr(47) & "EM>"

Attribute values

When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.

'"' should be written as "&quot;" or "&#34;"
'&' should be written as "&amp;" or "&#38;"

Thus, for example, one could write:

<INPUT name="num" value="0" onchange="if (compare(this.value, &quot;help&quot;)) {gethelp()}">
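A tool that generates markup could apply the quoting rules for attribute values automatically. The sketch below uses Python's standard `html.escape`; the helper name `event_attribute` is an assumption. Note that `html.escape` also escapes "<" and ">" and writes the single quotation mark as a numeric character reference, which goes slightly beyond what the rules above require but is harmless in an attribute value.

```python
from html import escape


def event_attribute(name, script):
    """Build a double-quoted attribute holding script data."""
    # quote=True escapes the delimiting quotation marks as well as "&".
    return f'{name}="{escape(script, quote=True)}"'


print(event_attribute("onchange",
                      'if (compare(this.value, "help")) {gethelp()}'))
print(event_attribute("onclick", "go(a && b)"))
```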

B.3.3 SGML features with limited support

SGML systems conforming to [ISO8879] are expected to recognize a number of features that aren't widely supported by HTML user agents. We recommend that authors avoid using all of these features.

B.3.4 Boolean attributes

Authors should be aware that many user agents only recognize the minimized form of boolean attributes and not the full form.

For instance, authors may want to specify: