PDF, Version 1.7 (ISO 32000-1:2008)

General

The version of PDF to which a file conforms can be identified in two places in the file. All PDF files should identify a version in the header, with the 5 characters %PDF- followed by a version number. For PDF files conforming to ISO 32000-1:2008 or earlier specifications (i.e., prior to ISO 32000-2:2017), the version number has the form 1.N, where N is a digit between 0 and 7. For example, PDF 1.7 is identified by %PDF-1.7. However, beginning with PDF 1.4, a conforming PDF writer may use the Version entry in the document Catalog to override the version specified in the header. The location of the Catalog within the file is indicated by the Root entry of the file trailer. This override feature was introduced to facilitate incremental updating of a PDF by simply appending to the end of the file.

As a result, it is necessary to locate the Catalog within the file to get the correct version number. Unless the PDF is "linearized," in which case the Catalog is near the beginning of the file, this requires reading the trailer and then using the reference there to locate the Catalog, which will typically be compressed. This has practical implications because format identification tools, including DROID, typically look for particular characters at the beginning of a file (i.e., in the header) to permit identification with minimal effort. DROID can look for characters at the end of the file, but is not able to follow an indirect reference or decompress file contents. When the version number is not the same in the header and the Catalog, there is potential for format identification errors.

The JHOVE PDF module does take account of the situation, stating that for PDF 1.0 - 1.6, "The PDF version is determined by the data specified in the PDF header and the Version key of the document catalog dictionary. In the event that these two values do not match, the Version key is taken as the authoritative value."
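The relationship between the two version locations can be illustrated with a minimal Python sketch. It is not how JHOVE or DROID work internally. The function names are hypothetical, and the Catalog lookup is a deliberately naive byte scan: it assumes the Catalog is stored uncompressed, does not follow the Root reference in the trailer, and would miss a Version entry held in a compressed object stream. Following the JHOVE rule quoted above, the Version key is preferred when one is found.

```python
import re

# The header is "%PDF-" followed by a version of the form 1.N.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# The Catalog entry is a name object, e.g. "/Version /1.7".
CATALOG_VERSION_RE = re.compile(rb"/Version\s*/(\d\.\d)")


def header_version(data: bytes):
    """Return the version declared in the file header, or None."""
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def catalog_version_naive(data: bytes):
    """Return the last uncompressed /Version entry found, or None.

    Assumption: incremental updates append to the file, so the last
    occurrence is taken to be the current one. A real parser would
    locate the Catalog through the trailer's Root entry instead.
    """
    matches = CATALOG_VERSION_RE.findall(data)
    return matches[-1].decode("ascii") if matches else None


def effective_version(data: bytes):
    """Prefer the Catalog's Version entry; fall back to the header."""
    return catalog_version_naive(data) or header_version(data)
```

A file whose header says 1.4 but whose Catalog carries `/Version /1.7` would be reported as 1.7 by `effective_version`, while a tool that reads only the header would report 1.4.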

Extension mechanism for PDF format: PDF 1.7 introduced an extension mechanism based on an Extensions Dictionary. Adobe used this mechanism to specify features introduced with Acrobat 9.0 (June 2008) and 9.1 (June 2009). See PDF_1_7_ext03 and PDF_1_7_ext05. Vendors developing extensions were expected to choose 4-character identifiers and be listed in a registry. Adobe uses the identifier ADBE. As of early 2019, http://adobe.com/go/ISO32000Registry (the proposed registry location) did not lead to a registry, but to a PDF with a form for submitting applications; as of October 2020, the URL was a broken link. Meanwhile, a PDF Name List is available as a spreadsheet on GitHub at https://github.com/adobe/pdf-names-list. The plan for a registry was one of a small set of functional differences between Adobe's original specification for PDF 1.7 and the final ISO 32000-1:2008. For more detail about the mechanism for extending the PDF standard, see 7.12.2 Developer Extensions Dictionary and Annex E in ISO 32000-1:2008.
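As an illustration of the mechanism, a Catalog that declares an Adobe extension typically contains an entry of the form `/Extensions << /ADBE << /BaseVersion /1.7 /ExtensionLevel 3 >> >>`. The Python sketch below pulls such declarations out of raw bytes. The function name `find_extensions` is hypothetical, and the approach is an assumption-laden shortcut: it relies on regular expressions rather than a PDF parser, expects 4-character prefixes, and only sees dictionaries that are stored uncompressed and contain no nested dictionaries.

```python
import re

# A 4-character developer prefix followed by a flat dictionary body.
DEV_EXT_RE = re.compile(rb"/(\w{4})\s*<<([^<>]*)>>")
BASE_RE = re.compile(rb"/BaseVersion\s*/(\d\.\d)")
LEVEL_RE = re.compile(rb"/ExtensionLevel\s+(\d+)")


def find_extensions(data: bytes):
    """Map each developer prefix to (base version, extension level)."""
    found = {}
    for prefix, body in DEV_EXT_RE.findall(data):
        base = BASE_RE.search(body)
        level = LEVEL_RE.search(body)
        # Keep only dictionaries that carry both required entries.
        if base and level:
            found[prefix.decode("ascii")] = (
                base.group(1).decode("ascii"),
                int(level.group(1)),
            )
    return found
```

For the example entry above, the function returns a dictionary mapping `ADBE` to base version 1.7 and extension level 3.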

Recommended practice to facilitate recognition of a PDF document as a binary file: Both Adobe's PDF Reference for version 1.7 and ISO 32000-1:2008 recommend that "If a PDF file contains binary data, as most do ..., it is recommended that the header line be immediately followed by a comment line containing at least four binary characters—that is, characters whose codes are 128 or greater. This ensures proper behavior of file transfer applications that inspect data near the beginning of a file to determine whether to treat the file’s contents as text or as binary." This practice is required in PDF documents conforming to any version of PDF/A.
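The recommendation can be checked mechanically. The Python sketch below, with the hypothetical function name `has_binary_marker`, tests whether the line immediately after the header is a comment containing at least four bytes with values of 128 or greater. It assumes the header is the very first line of the file and inspects only the first 1024 bytes; it is not a PDF/A validator.

```python
def has_binary_marker(data: bytes) -> bool:
    """Check for the recommended binary comment line after the header."""
    # bytes.splitlines() breaks on \n, \r, and \r\n.
    lines = data[:1024].splitlines()
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    # A PDF comment line starts with the percent sign.
    if not second.startswith(b"%"):
        return False
    # Count bytes whose codes are 128 or greater.
    high_bytes = sum(1 for byte in second[1:] if byte >= 128)
    return high_bytes >= 4
```

A header such as `%PDF-1.7` followed by a line like `%\xe2\xe3\xcf\xd3` passes the check, while a file whose second line is the first object definition does not.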

History

PDF 1.7 was released in November 2006 in association with version 8 of Acrobat and Adobe Reader. In January 2007, Adobe announced the intention to pursue standardization through TC 171/SC 2 of ISO. This process led to publication as ISO 32000-1 in July 2008. There are substantial editorial differences between the two specification documents, particularly in the order of material. Small functional differences may reflect asynchrony between the Adobe product development cycle and the ISO standardization process, but Adobe describes the specifications as "matching."

To quote from ISO 32000-1:2008, "The first version of PDF was designated PDF 1.0 and was specified by Adobe Systems Incorporated in the PDF Reference 1.0 document published by Adobe and Addison Wesley. Since then, PDF has gone through seven revisions designated as: PDF 1.1, PDF 1.2, PDF 1.3, PDF 1.4, PDF 1.5, PDF 1.6 and PDF 1.7. All non-deprecated features defined in a previous PDF version were also included in the subsequent PDF version. Since ISO 32000-1 is a PDF version matching PDF 1.7, it is also suitable for interpretation of files made to conform with any of the PDF specifications 1.0 through 1.7. Throughout this specification in order to indicate at which point in the sequence of versions a feature was introduced, a notation with a PDF version number in parenthesis (e.g., (PDF 1.3)) is used. Thus if a feature is labelled with (PDF 1.3) it means that PDF 1.0, PDF 1.1 and PDF 1.2 were not specified to support this feature whereas all versions of PDF 1.3 and greater were defined to support it."