RFC 1521, Section 7.2 (The Multipart Content-Type)

7.2.1. Multipart: The common syntax

All subtypes of "multipart" share a common syntax, defined in this section. A simple example of a multipart message also appears in this section. An example of a more complex multipart message is given in Appendix C.

The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.

  NOTE: The hyphens are for rough compatibility with the earlier RFC
  934 method of message encapsulation, and for ease of searching for
  the boundaries in some implementations. However, it should be
  noted that multipart messages are NOT completely compatible with
  RFC 934 encapsulations; in particular, they do not obey RFC 934
  quoting conventions for embedded lines that begin with hyphens.
  This mechanism was chosen over the RFC 934 mechanism because the
  latter causes lines to grow with each level of quoting.  The
  combination of this growth with the fact that SMTP implementations
  sometimes wrap long lines made the RFC 934 mechanism unsuitable
  for use in the event that deeply-nested multipart structuring is
  ever desired.

WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-Type field is such that it is often necessary to enclose the boundaries in quotes on the Content-Type line. This is not always necessary, but never hurts. Implementors should be sure to study the grammar carefully in order to avoid producing illegal Content-Type fields.

Thus, a typical multipart Content-Type header field might look like this:

             Content-Type: multipart/mixed;
                  boundary=gc0p4Jq0M2Yt08jU534c0p

But the following is illegal:

             Content-Type: multipart/mixed;
                  boundary=gc0p4Jq0M:2Yt08jU534c0p

(because of the colon) and must instead be represented as

             Content-Type: multipart/mixed;
                  boundary="gc0p4Jq0M:2Yt08jU534c0p"

This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line

             --gc0p4Jq0M:2Yt08jU534c0p
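
The quoting rule from the warning above can be sketched in Python. This is a minimal illustration, not part of the specification: the helper names `needs_quoting` and `content_type_header` are hypothetical, and the set of special characters is the "tspecials" list from the RFC 1521 parameter grammar.

```python
# Characters that RFC 1521 calls "tspecials"; a parameter value containing
# any of them (or a space or control character) must be a quoted-string.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def needs_quoting(value: str) -> bool:
    """Return True if value cannot be written as a bare token."""
    return value == "" or any(
        ch in TSPECIALS or ch == " " or ord(ch) < 32 or ord(ch) >= 127
        for ch in value
    )


def content_type_header(subtype: str, boundary: str) -> str:
    """Build a multipart Content-Type field (hypothetical helper)."""
    if needs_quoting(boundary):
        # Legal boundary characters never include '"' or '\', so the value
        # can be wrapped in quotes without any escaping.
        boundary = '"' + boundary + '"'
    return f"Content-Type: multipart/{subtype}; boundary={boundary}"


print(content_type_header("mixed", "gc0p4Jq0M2Yt08jU534c0p"))
print(content_type_header("mixed", "gc0p4Jq0M:2Yt08jU534c0p"))
```

Since quoting "never hurts", a composer could also skip the check and always quote the boundary.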

Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that the initial CRLF is considered to be attached to the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

  NOTE: The CRLF preceding the encapsulation line is conceptually
  attached to the boundary so that it is possible to have a part
  that does not end with a CRLF (line break). Body parts that must
  be considered to end with line breaks, therefore, must have two
  CRLFs preceding the encapsulation line, the first of which is part
  of the preceding body part, and the second of which is part of the
  encapsulation boundary.
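
The ownership of the CRLF can be illustrated from the composing side. The following Python sketch assumes each part has already been formatted as header area, blank line, and body; the function name `encapsulate` is hypothetical, and no preamble or epilogue is written.

```python
CRLF = "\r\n"


def encapsulate(parts, boundary):
    """Join formatted body parts into a multipart body.

    The CRLF written after each part belongs to the delimiter that
    follows it, not to the part itself.
    """
    out = []
    for part in parts:
        out.append("--" + boundary + CRLF)  # delimiter line
        out.append(part)                    # header area, blank line, body
        out.append(CRLF)                    # owned by the next delimiter
    out.append("--" + boundary + "--" + CRLF)
    return "".join(out)


# The leading CRLF in each part is the blank line after an empty header area.
no_break = CRLF + "ends without a line break"
with_break = CRLF + "ends with a line break" + CRLF

body = encapsulate([no_break, with_break], "simple boundary")
print(body)
```

The second part is followed by two CRLFs in the output: the first is the part's own trailing line break, the second belongs to the closing delimiter.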

Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.

The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:

             --gc0p4Jq0M:2Yt08jU534c0p--

The syntax leaves room for additional information before the first encapsulation boundary and after the final boundary. These areas should generally be left blank, and implementations must ignore anything that appears before the first boundary or after the last one.

  NOTE: These "preamble" and "epilogue" areas are generally not used
  because of the lack of proper typing of these parts and the lack
  of clear semantics for handling these areas at gateways,
  particularly X.400 gateways.  However, rather than leaving the
  preamble area blank, many MIME implementations have found this to
  be a convenient place to insert an explanatory note for recipients
  who read the message with pre-MIME software, since such notes will
  be ignored by MIME-compliant software.
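
A receiver that separates the preamble, the body parts, and the epilogue might look like the following Python sketch. It is a simplified illustration under stated assumptions: the function name `split_multipart` is hypothetical, line endings are CRLF, and delimiter lines carry no trailing white space.

```python
CRLF = "\r\n"


def split_multipart(body, boundary):
    """Return (preamble, parts, epilogue) for a multipart body.

    Splitting on CRLF and re-joining each section drops exactly one CRLF
    before every delimiter line, which is the CRLF that belongs to it.
    """
    delimiter = "--" + boundary
    close_delimiter = delimiter + "--"
    preamble, parts, epilogue = [], [], []
    current = preamble
    closed = False
    for line in body.split(CRLF):
        if closed:
            epilogue.append(line)       # everything after the last boundary
        elif line == delimiter:
            current = []                # start collecting a new part
            parts.append(current)
        elif line == close_delimiter:
            closed = True
        else:
            current.append(line)        # preamble until the first delimiter
    if not closed:
        raise ValueError("closing delimiter not found")
    return (
        CRLF.join(preamble),
        [CRLF.join(part) for part in parts],
        CRLF.join(epilogue),
    )


sample = (
    "note for old readers\r\n"
    "--b1\r\n\r\nfirst\r\n"
    "--b1\r\nContent-Type: text/plain\r\n\r\nsecond\r\n\r\n"
    "--b1--\r\ntrailer\r\n"
)
preamble, parts, epilogue = split_multipart(sample, "b1")
print(len(parts))
```

A conforming reader would then discard `preamble` and `epilogue` and process only `parts`.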

  NOTE: Because encapsulation boundaries must not appear in the body
  parts being encapsulated, a user agent must exercise care to
  choose a unique boundary.  The boundary in the example above could
  have been the result of an algorithm designed to produce
  boundaries with a very low probability of already existing in the
  data to be encapsulated without having to prescan the data.
  Alternate algorithms might result in more 'readable' boundaries
  for a recipient with an old user agent, but would require more
  attention to the possibility that the boundary might appear in the
  encapsulated part.  The simplest boundary possible is something
  like "---", with a closing boundary of "-----".
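
One way to choose a boundary is sketched below in Python. It combines both approaches mentioned in the note: a random value that is unlikely to collide, plus a prescan as a safety net. The function name `make_boundary`, the alphabet, and the default length are illustrative assumptions; the only constraint taken from the text is the 70-character limit.

```python
import secrets
import string

# Letters and digits only, so the boundary never needs quoting.
BOUNDARY_ALPHABET = string.ascii_letters + string.digits


def make_boundary(parts, length=24):
    """Pick a random boundary whose delimiter form occurs in no part.

    Hypothetical helper: length and alphabet are arbitrary choices within
    the 70-character limit.
    """
    if not 1 <= length <= 70:
        raise ValueError("boundary length must be between 1 and 70")
    while True:
        boundary = "".join(
            secrets.choice(BOUNDARY_ALPHABET) for _ in range(length)
        )
        # Prescan: retry in the unlikely event that the data contains it.
        if not any("--" + boundary in part for part in parts):
            return boundary


boundary = make_boundary(["first part", "second part with --dashes"])
print(boundary)
```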

As a very simple example, the following multipart message has two parts, both of them plain text, one of them explicitly typed and one of them implicitly typed:

  From: Nathaniel Borenstein <nsb@bellcore.com>
  To:  Ned Freed <ned@innosoft.com>
  Subject: Sample message
  MIME-Version: 1.0
  Content-type: multipart/mixed; boundary="simple boundary"

  This is the preamble.  It is to be ignored, though it
  is a handy place for mail composers to include an
  explanatory note to non-MIME conformant readers.
  --simple boundary

  This is implicitly typed plain ASCII text.
  It does NOT end with a linebreak.
  --simple boundary
  Content-type: text/plain; charset=us-ascii

  This is explicitly typed plain ASCII text.
  It DOES end with a linebreak.

  --simple boundary--
  This is the epilogue.  It is also to be ignored.
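
As a cross-check, the sample message can be fed to an existing MIME parser. The sketch below uses the `email` package from the Python standard library, which follows the later MIME specifications rather than this text specifically; the sample is reproduced with the boundary parameter on a single line and with plain newlines instead of CRLF, which that parser accepts.

```python
from email import message_from_string

SAMPLE = """\
From: Nathaniel Borenstein <nsb@bellcore.com>
To:  Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"

This is the preamble.  It is to be ignored, though it
is a handy place for mail composers to include an
explanatory note to non-MIME conformant readers.
--simple boundary

This is implicitly typed plain ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii

This is explicitly typed plain ASCII text.
It DOES end with a linebreak.

--simple boundary--
This is the epilogue.  It is also to be ignored.
"""

msg = message_from_string(SAMPLE)
parts = msg.get_payload()  # a list of sub-messages for a multipart entity

for index, part in enumerate(parts, start=1):
    # get_content_type() falls back to text/plain when no header is present.
    print(index, part.get_content_type(), repr(part.get_payload()))
```

Printing the payloads with `repr` makes it visible whether each part ends with a line break.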

The use of a Content-Type of multipart in a body part within another multipart entity is explicitly allowed. In such cases, care must be taken to ensure that each nested multipart entity uses a different boundary delimiter; otherwise a delimiter of the inner entity would be mistaken for one of the outer entity. See Appendix C for an example of nested multipart entities.

The use of the multipart Content-Type with only a single body part may be useful in certain contexts, and is explicitly permitted.

The only mandatory parameter for the multipart Content-Type is the boundary parameter, which consists of 1 to 70 characters from a set of characters known to be very robust through email gateways, and NOT ending with white space. (If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.) It is formally specified by the following BNF:

boundary := 0*69<bchars> bcharsnospace

bchars := bcharsnospace / " "

bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / "," / "-" / "." / "/" / ":" / "=" / "?"
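
The rule stated in the prose (1 to 70 characters from the robust set, not ending in white space) can be checked with a regular expression. This Python sketch is one possible reading: up to 69 characters that may include spaces, followed by one final character that may not. The names `BOUNDARY_RE` and `is_valid_boundary` are hypothetical.

```python
import re

# bcharsnospace from the grammar; bchars additionally allows a space.
_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
BOUNDARY_RE = re.compile(rf"[{_NOSPACE} ]{{0,69}}[{_NOSPACE}]")


def is_valid_boundary(value: str) -> bool:
    """Return True if value is a legal boundary parameter value."""
    return BOUNDARY_RE.fullmatch(value) is not None


for candidate in ["simple boundary", "gc0p4Jq0M:2Yt08jU534c0p", "trailing space "]:
    print(repr(candidate), is_valid_boundary(candidate))
```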

Overall, the body of a multipart entity may be specified as follows:

multipart-body := preamble 1*encapsulation close-delimiter epilogue

encapsulation := delimiter body-part CRLF

delimiter := "--" boundary CRLF ; taken from Content-Type field.
                                ; There must be no space
                                ; between "--" and boundary.

close-delimiter := "--" boundary "--" CRLF ; Again, no space next to either "--".

preamble := discard-text ; to be ignored upon receipt.

epilogue := discard-text ; to be ignored upon receipt.

discard-text := *(*text CRLF)

body-part := <"message" as defined in RFC 822, with all header fields optional, and with the specified delimiter not occurring anywhere in the message body, either on a line by itself or as a substring anywhere. Note that the semantics of a part differ from the semantics of a message, as described in the text.>
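
Because all header fields of a body part are optional, a reader has to handle a part that begins directly with the blank line. The Python sketch below shows one way to split a part into header fields and body. It assumes CRLF line endings and unfolded header lines, the function name `parse_body_part` is hypothetical, and applying the text/plain default whenever Content-Type is absent (not only when the header area is empty) is an assumption of this sketch.

```python
CRLF = "\r\n"


def parse_body_part(raw):
    """Split one body part into (headers, body).

    Header names are lower-cased for lookup. A part that starts with
    CRLF has an empty header area.
    """
    if raw.startswith(CRLF):
        head, body = "", raw[len(CRLF):]
    else:
        head, _, body = raw.partition(CRLF + CRLF)
    headers = {}
    for line in (head.split(CRLF) if head else []):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Assumed default when no Content-Type field is given.
    headers.setdefault("content-type", "text/plain")
    return headers, body


print(parse_body_part(CRLF + "implicitly typed text"))
print(parse_body_part("Content-type: text/plain; charset=us-ascii"
                      + CRLF + CRLF + "explicitly typed text" + CRLF))
```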

  NOTE: In certain transport enclaves, RFC 822 restrictions such as
  the one that limits bodies to printable ASCII characters may not
  be in force.  (That is, the transport domains may resemble
  standard Internet mail transport as specified in RFC 821 and
  assumed by RFC 822, but without certain restrictions.)  The
  relaxation of these restrictions should be construed as locally
  extending the definition of bodies, for example to include octets
  outside of the ASCII range, as long as these extensions are
  supported by the transport and adequately documented in the
  Content-Transfer-Encoding header field. However, in no event are
  headers (either message headers or body-part headers) allowed to
  contain anything other than ASCII characters.

  NOTE: Conspicuously missing from the multipart type is a notion of
  structured, related body parts.  In general, it seems premature to
  try to standardize interpart structure yet.  It is recommended
  that those wishing to provide a more structured or integrated
  multipart messaging facility define a subtype of multipart
  that is syntactically identical, but that always expects the
  inclusion of a distinguished part that can be used to specify the
  structure and integration of the other parts, probably referring
  to them by their Content-ID field.  If this approach is used,
  other implementations will not recognize the new subtype, but will
  treat it as the primary subtype (multipart/mixed) and will thus be
  able to show the user the parts that are recognized.
