Lightweight Packaging Format (LPF)
Abstract
This section is non-normative.
This specification defines a file format and processing model for packaging into a single-file container the set of related resources and associated metadata that comprise a digital publication.
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Publishing Working Group as a Working Group Note.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-publ-wg@w3.org (archives).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy.
The group does not expect this document to become a W3C Recommendation.
This document is governed by the 1 March 2019 W3C Process Document.
Table of Contents
- 1. Introduction
- 2. Terminology
- 3. Conformance
- 4. Packaging format
- 5. Compression of resources
- 6. File and Directory Structure
- 7. Obtaining a Publication Manifest
- A. The application/lpf+zip Media Type
- B. Acknowledgements
- C. References
- C.1 Normative references
- C.2 Informative references
1. Introduction
This section is non-normative.
A digital publication Package is used:
- To exchange in-progress packaged publications between different individuals and/or different organizations;
- To provide finalized packaged publications from a publisher or conversion house to different distribution or sales channels; and
- To deliver packaged publications to users or user agents.
This specification is based on proven technologies and allows digital publications to be packaged in an easy way, hence the term "lightweight" used in its name.
2. Terminology
This section is non-normative.
This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [publishing-linking], including, in particular, user and user agent.
In addition, the following terminology is defined for use in this specification:
Codec content types
Content types that have intrinsic binary format qualities, such as video and audio media types which are already designed for optimum compression, or which provide optimized streaming capabilities.
Non-Codec content types
Content types that benefit from compression due to the nature of their internal data structure, such as file formats based on character strings (for example, HTML, CSS, etc.).
Package
Single-file container for the set of constituent resources and associated metadata that comprise a digital publication.
Primary Entry Page
Preferred starting resource for a digital publication, enabling in some cases the discovery of its Publication Manifest.
Digital Publication
Set of constituent resources and associated metadata, organized together in a uniquely identifiable grouping.
Publication Manifest
[JSON-LD] representation of a digital publication as defined in [pub-manifest].
Root Directory
Base directory of the Package file system.
Only the first instance of a term in a section is linked to its definition.
3. Conformance
This section is non-normative.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
4. Packaging format
This section is non-normative.
For packaging the set of constituent resources and associated metadata that comprise a digital publication, this specification uses the ZIP format as specified in ISO/IEC 21320-1:2015 ([ISO21320] and [zip]).
5. Compression of resources
This section is non-normative.
When stored in a Package, resources with Non-Codec content types SHOULD be compressed and the Deflate compression algorithm MUST be used. This practice ensures that file entries stored in the Package have a smaller size.
Resources with Codec content types SHOULD be stored without compression. In such cases, compression would introduce unnecessary processing overhead at production time (especially with large resource files) and would impact audio/video playback performance at consumption time.
Note
In some cases, the combination of compression with some encryption schemes might even hinder the ability of user agents to handle partial content requests (e.g. HTTP byte ranges), because it is technically difficult to determine the length of the full resource ahead of media playback (e.g. HTTP Content-Length header).
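The compression rules in this section can be illustrated with a minimal Python sketch based on the standard zipfile module. The names CODEC_PREFIXES, compression_for and build_package are hypothetical and not part of this specification. The sketch assumes that a resource's media type can be guessed from its file name, that only audio/ and video/ media types count as Codec content types (the two examples given in the Terminology section), and that resources of unknown type are treated as Non-Codec.

```python
import io
import mimetypes
import zipfile
from typing import Mapping

# Assumption: only audio/* and video/* are treated as Codec content types here.
CODEC_PREFIXES = ("audio/", "video/")


def compression_for(name: str) -> int:
    """Pick a ZIP compression method for a resource, based on its guessed media type."""
    media_type, _ = mimetypes.guess_type(name)
    if media_type is not None and media_type.startswith(CODEC_PREFIXES):
        # Codec content types are stored without compression.
        return zipfile.ZIP_STORED
    # Non-Codec content types (and, in this sketch, unknown types) use Deflate.
    return zipfile.ZIP_DEFLATED


def build_package(resources: Mapping[str, bytes]) -> bytes:
    """Write the given resources into an in-memory ZIP container and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, data in resources.items():
            package.writestr(name, data, compress_type=compression_for(name))
    return buffer.getvalue()
```

A production tool would more likely take the media type from the Publication Manifest than guess it from the file extension.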
6. File and Directory Structure
This section is non-normative.
A Package MUST include at least one of the following files in its Root Directory:
- A file named publication.json, which MUST be in the format defined for Publication Manifests.
- A file named index.html, which MUST follow the requirements of the Primary Entry Page of a digital publication.
The Root Directory is virtual in nature: a user agent might or might not generate a physical root directory for the contents of the Package if such contents are unpackaged.
The contents of both files MUST NOT be encrypted.
A Package MUST also include all resources within the bounds of the digital publication, i.e. the finite set of resources obtained from the union of resources listed in the default reading order and resource list of the Publication Manifest.
These resource files MAY be in any location descendant from the Root Directory, or in the Root Directory itself.
Contents within the Package MUST reference these resources via relative-URL strings [url].
Note
The [zip] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors must be careful to use characters which allow a broad interoperability among operating systems.
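As an illustration of the structural requirements in this section, the following Python sketch checks an opened Package. It is not a conformance checker. The function name check_package and its helpers are hypothetical. The property names readingOrder, resources and url are assumed from [pub-manifest] and are not defined in this document. The relative URL test is a simplification of the relative-URL string definition in [url]: it only rejects URLs that carry a scheme or an authority.

```python
import io
import json
import posixpath
import zipfile
from urllib.parse import unquote, urlsplit


def _as_list(value):
    """Normalize a manifest value to a list (a single entry may appear on its own)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _entry_url(entry):
    """Entries are assumed to be either URL strings or objects with a 'url' member."""
    return entry if isinstance(entry, str) else entry.get("url", "")


def check_package(package):
    """Return a list of human-readable problems found in an opened zipfile.ZipFile."""
    problems = []
    names = set(package.namelist())
    has_manifest = "publication.json" in names
    if not has_manifest and "index.html" not in names:
        problems.append("Root Directory has neither publication.json nor index.html")
    if not has_manifest:
        return problems  # this sketch only inspects resources listed in publication.json
    manifest = json.loads(package.read("publication.json"))
    for key in ("readingOrder", "resources"):
        for entry in _as_list(manifest.get(key)):
            url = _entry_url(entry)
            parts = urlsplit(url)
            if parts.scheme or parts.netloc:
                problems.append(f"{url}: not a relative URL")
                continue
            # Resolve against the Root Directory, ignoring any query or fragment.
            path = posixpath.normpath(unquote(parts.path)).lstrip("/")
            if path not in names:
                problems.append(f"{url}: missing from the Package")
    return problems
```

An empty result only means that none of these particular checks failed.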
7. Obtaining a Publication Manifest
This section is non-normative.
If the Package contains a publication.json file located in the Root Directory, the Publication Manifest is obtained by opening and parsing this file.
Otherwise, if the Package contains an index.html file located in the Root Directory, the Publication Manifest is obtained through the following steps:
- Let document be the result of the extraction of the index.html file from the Package.
- If it does not have the media type text/html or application/xhtml+xml, terminate this algorithm.
- Let manifest link be the first link element in tree order in document whose rel attribute contains the publication token.
- If manifest link is null, terminate this algorithm.
- If manifest link's href attribute's value is the empty string, terminate this algorithm.
- If the href attribute value of manifest link has a non-null fragment identifying an identifier id in document:
  - Let embedded manifest script be the first script element in tree order, whose id attribute is equal to id and whose type attribute is equal to application/ld+json.
  - If embedded manifest script is null, terminate this algorithm.
  - Let manifest text be the child text content of embedded manifest script.
  Explanation
  This branch is in use when the manifest is embedded in the primary entry page. The algorithm locates the script element and extracts the manifest itself.
- Otherwise:
  - Let manifest URL be the value of the href attribute.
  - If manifest URL is not a relative URL string, then abort these steps.
  - Extract the Manifest from the Package using manifest URL.
  - Open and read the Manifest file, letting manifest text be the result.
  Explanation
  This branch is in use when the manifest is in a separate file. It performs the standard operations to retrieve the manifest from the Package.
If both index.html and publication.json are present in the Package, then the Primary Entry Page SHOULD contain a reference to the publication.json file, following the rules defined in this section.
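The steps of this section can be sketched in Python with the standard zipfile and html.parser modules. This is an approximation under stated assumptions, not a reference implementation. The names _ManifestFinder and obtain_manifest_text are hypothetical. The media type check is skipped, because a ZIP entry carries no media type. Files are assumed to be UTF-8 encoded. Only an href of the form #id is treated as pointing into the document. The relative URL test only rejects URLs that carry a scheme or an authority.

```python
import io
import posixpath
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class _ManifestFinder(HTMLParser):
    """Collects the first rel="publication" link and any application/ld+json scripts."""

    def __init__(self):
        super().__init__()
        self.found_link = False
        self.href = None       # href of the first matching link element
        self.scripts = {}      # id -> text content of the first script with that id
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and not self.found_link:
            if "publication" in (attributes.get("rel") or "").lower().split():
                self.found_link = True
                self.href = attributes.get("href") or ""
        elif tag == "script" and attributes.get("type") == "application/ld+json":
            script_id = attributes.get("id")
            if script_id and script_id not in self.scripts:
                self.scripts[script_id] = ""
                self._current_id = script_id

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data


def obtain_manifest_text(package):
    """Return the manifest text from an opened zipfile.ZipFile, or None if not found."""
    names = set(package.namelist())
    if "publication.json" in names:
        return package.read("publication.json").decode("utf-8")
    if "index.html" not in names:
        return None
    finder = _ManifestFinder()
    finder.feed(package.read("index.html").decode("utf-8"))
    finder.close()
    if not finder.href:  # no manifest link, or an empty href value
        return None
    parts = urlsplit(finder.href)
    if parts.scheme or parts.netloc:  # simplified "not a relative URL" test
        return None
    if parts.fragment and not parts.path:
        # Embedded branch: the manifest is in a script element of index.html.
        return finder.scripts.get(unquote(parts.fragment))
    # Separate-file branch: resolve the URL against the Root Directory.
    path = posixpath.normpath(unquote(parts.path)).lstrip("/")
    if path not in names:
        return None
    return package.read(path).decode("utf-8")
```

The returned text would then be parsed as JSON-LD and processed as described in [pub-manifest].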
A. The application/lpf+zip Media Type
This section is non-normative.
This appendix registers the media type application/lpf+zip for the Lightweight Packaging Format (LPF).
Lightweight Packaging Format (or LPF) is a container technology based on the [zip] archive format, used for packaging into a single-file container the set of related resources and associated metadata that comprise a digital publication. LPF and its related standards are maintained and defined by the World Wide Web Consortium (W3C).
MIME media type name:
application
MIME subtype name:
lpf+zip
Required parameters:
N/A
Optional parameters:
N/A
Encoding considerations:
LPF files are binary files in ZIP format.
Security considerations:
Security considerations that apply to application/zip also apply to LPF files. For instance, an archive could contain compressed files that expand to fill all available disk space on a hard drive. In consequence, user agents that read LPF files should rigorously check the size and validity of data retrieved.
In addition, because of the various content types that can be embedded in LPF files, application/lpf+zip may describe content that poses security issues, e.g. malicious executable content deliberately included in the package. However, only in cases where the user agent recognizes and processes the additional content, or where further processing of that content is dispatched to other user agents, would security issues potentially arise. In such cases, matters of security would fall outside the domain of this registration document.
Interoperability considerations:
Any format based on LPF, if using content encryption, MUST choose a different MIME media type and file extension than those defined in this specification.
Published specification:
This media type registration is for the Lightweight Packaging Format (LPF), as described by the Lightweight Packaging Format (LPF) specification located at https://www.w3.org/TR/lpf.
Applications that use this media type:
This media type is intended to be used by multiple interoperable applications for the distribution and consumption of ebooks, audiobooks, digital visual narratives and other types of digital publications.
Additional information:
Magic number(s):
0: PK 0x03 0x04
File extension(s):
LPF files are most often identified with the extension .lpf.
Macintosh file type code(s):
ZIP
Fragment identifiers:
None
Person & email address to contact for further information:
Ivan Herman (ivan@w3.org)
Intended usage:
COMMON
Author/change controller:
The published specification is a work product of the World Wide Web Consortium (W3C)’s Publishing Working Group. The W3C has change control over this specification.
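The magic number and the size-related security consideration in this registration can be illustrated with a short Python sketch. The names LPF_MAGIC, looks_like_lpf and is_safe_to_extract, and the max_total_bytes limit, are hypothetical. The sketch relies on the uncompressed sizes declared in the ZIP directory, which a hostile archive could misreport, so it is a first-line check only.

```python
import io
import zipfile

# Magic number from this registration: "PK" followed by 0x03 0x04 at offset 0.
LPF_MAGIC = b"PK\x03\x04"


def looks_like_lpf(data: bytes) -> bool:
    """Check the first four bytes against the ZIP local file header signature."""
    return data[:4] == LPF_MAGIC


def is_safe_to_extract(data: bytes, max_total_bytes: int) -> bool:
    """Reject data that is not a readable ZIP or whose declared size exceeds the limit."""
    if not looks_like_lpf(data):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            # Sum of the uncompressed sizes declared for every entry.
            declared = sum(info.file_size for info in package.infolist())
    except zipfile.BadZipFile:
        return False
    return declared <= max_total_bytes
```

A user agent following the advice above would also cap the number of bytes it actually writes while extracting each entry.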
B. Acknowledgements
This section is non-normative.
The editor would like to thank the members of the Publishing Working Group for their contributions to this specification:
- Greg Albers (J. Paul Getty Trust)
- Franco Alvarado (Macmillan Learning)
- Boris Anthony (The Rebus Foundation)
- Luc Audrain (Hachette Livre)
- Baldur Bjarnason (The Rebus Foundation)
- Laura Brady (W3C Invited Expert)
- Steve Breault (Scenarex Inc.)
- Don Brutzman (Web3D Consortium)
- Kaylin Bugbee (Earth Science Data Systems Program)
- Yu-Wei Chang (Taiwan Digital Publishing Forum)
- Fred Chasen (W3C Invited Expert)
- Timothy Cole (University of Illinois at Urbana-Champaign)
- Simon Collinson (Rakuten, Inc.)
- Rachel Comerford (Macmillan Learning)
- Garth Conboy (Google, Inc., chair)
- Juan Corona (Evident Point Software)
- Christopher Cosner (Stanford University)
- Dave Cramer (Hachette Livre)
- Greg Davis (Pearson plc)
- Romain Deltour (DAISY Consortium)
- Marisa DeMeglio (DAISY Consortium)
- Vagner Diniz (NIC.br - Brazilian Network Information Center)
- Kenneth Dougherty (Pearson plc)
- Brady Duga (Google, Inc.)
- Ben Dugas (Rakuten, Inc.)
- Roger Espinosa (University of Michigan)
- Reinaldo Ferraz (NIC.br - Brazilian Network Information Center)
- Teenya Franklin (Pearson plc)
- Jun Gamo (Voyager Japan, Inc.)
- Matt Garrish (DAISY Consortium)
- Michael Goodman (Wiley)
- Markku Hakkinen (Educational Testing Service)
- Katie Haritos-Shea (Knowbility)
- Ivan Herman (W3C Staff)
- Geoff Jukes (Blackstone Audio, Inc.)
- Deborah Kaplan (W3C Invited Expert)
- Bill Kasdorf (Book Industry Study Group)
- George Kerscher (DAISY Consortium)
- Yuri Khramov (Evident Point Software)
- Masakazu Kitahara (Voyager Japan, Inc.)
- Toshiaki Koike (Voyager Japan, Inc.)
- Charles LaPierre (Benetech)
- Mustapha Lazrek (Microsoft Corporation)
- Vladimir Levantovsky (Monotype)
- Mia Lipner (Pearson plc)
- Phil Madans (Hachette Livre)
- Christopher Maden (University of Illinois at Urbana-Champaign)
- Dmitry Markushevich (Evident Point Software)
- keith mcfarland (Blackstone Audio, Inc.)
- Jonathan McGlone (University of Michigan)
- Hugh McGuire (The Rebus Foundation)
- Nellie McKesson (W3C Invited Expert)
- Selma Morais (NIC.br - Brazilian Network Information Center)
- Jasmine Mulliken (Stanford University)
- Cristina Mussinelli (Fondazione LIA)
- Christos Nikolakakos (Wiley)
- Gregorio Pellegrino (Fondazione LIA)
- Fernando Pinto da Silva (EDRLab)
- Nicholas Polys (Web3D Consortium)
- Chris Powell (University of Michigan)
- Jeff Printy (Macmillan Learning)
- Ryan Pugatch (Hachette Livre)
- Joshua Pyle (Wiley)
- Wendy Reid (Rakuten, Inc., chair)
- Florian Rivoal (W3C Invited Expert)
- Leonard Rosenthol (Adobe)
- Robert Sanderson (J. Paul Getty Trust)
- Jodi Schneider (University of Illinois at Urbana-Champaign)
- Ben Schroeter (Pearson plc)
- Tzviya Siegman (Wiley, chair)
- Avneesh Singh (DAISY Consortium)
- Adam Sisco (Earth Science Data Systems Program)
- David Stroup (Pearson plc)
- Mateus Teixeira (W. W. Norton & Company)
- Jonathan Thurston (Pearson plc)
- Yukio Tomikura (Kodansha, Publishers, Ltd.)
- Ben Walters (Microsoft Corporation)
- Daniel Weck (EDRLab, DAISY Consortium)
- John Weise (University of Michigan)
- Jason White (Educational Testing Service)
- Richard Wright (EDRLab)
- Jeff Xu (Rakuten, Inc.)
- Evan Yamanishi (W. W. Norton & Company)
- Maurice York (University of Michigan)
- Junichi Yoshii (Kodansha, Publishers, Ltd.)
- Benjamin Young (Wiley)
- Mohamed ZERGAOUI (INNOVIMAX)
C. References
C.1 Normative references
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://tools.ietf.org/html/rfc8174
C.2 Informative references
[audiobooks]
Audiobooks. Wendy Reid; Matt Garrish. W3C. 28 January 2020. W3C Candidate Recommendation. URL: https://www.w3.org/TR/audiobooks/
[dom]
DOM Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[ISO21320]
Document Container File - ISO/IEC 21320. ISO. 2015. International Standard. URL: https://www.iso.org/standard/60101.html
[JSON-LD]
JSON-LD 1.0. Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
[pub-manifest]
Publication Manifest. Matt Garrish; Ivan Herman. W3C. 28 January 2020. W3C Candidate Recommendation. URL: https://www.w3.org/TR/pub-manifest/
[publishing-linking]
Publishing and Linking on the Web. Ashok Malhotra; Larry Masinter; Jeni Tennison; Daniel Appelquist. W3C. 30 April 2013. W3C Note. URL: https://www.w3.org/TR/publishing-linking/
[url]
URL Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://url.spec.whatwg.org/
[zip]
.ZIP File Format Specification. 1 September 2012. Final. URL: https://www.pkware.com/documents/casestudies/APPNOTE.TXT