A System for Long-Term Document Preservation

Critique of Architectures for Long-Term Digital Preservation

2009

Trusted Digital Repositories (TDRs) and Trustworthy Digital Objects (TDOs) seem to be the only generic digital preservation methodologies proposed. Before any preservation method is recommended for wide use, it should be exposed to searching analysis. Evolving technology and fading human memory threaten the long-term intelligibility of many kinds of documents. Furthermore, some records are susceptible to improper alterations that make them untrustworthy. We argue that the TDR approach has shortfalls as a method for long-term digital preservation of sensitive information. For specificity, we discuss a particular implementation. TDO methodology addresses these needs, providing for making digital documents durably intelligible. It uses EDP standards for a few file formats and XML structures for text documents. For other information formats, intelligibility is assured by using a virtual computer. To protect sensitive information content whose inappropriate alteration might mislead its readers, the integrity and authenticity of each TDO are made testable by embedded public-key cryptographic message digests and signatures. The authenticity of the keys is protected recursively in a social hierarchy grounded by publishing keys of well-known institutions. A TDO is a specific kind of OAIS Archival Information Package convenient for sharing among repositories. The content and metadata of properly constructed TDOs are sufficient for creating the usual kinds of catalog records and search indices during repository ingestion. Comparison of TDR and TDO methodologies suggests differentiating near-term preservation measures from what is needed for the long term. The proper focus for long-term preservation technology is signed packages that each combine a record collection with its metadata and that also bind context: Trustworthy Digital Objects. If all that stuff was worth creating, surely some of it is worth saving!

© 2009, H.M. Gladney

[...] more expensive to correct than they are today. This examination should seek opportunities to reduce complexity that might mislead readers. Technology for near-term preservation needs flexibility for software improvements. In contrast, technology for long-term preservation needs to be insensitive to changing technology and infrastructure. It therefore proves helpful to distinguish near-term preservation from long-term preservation.

What Is the Challenge?

What is the meaning of preservation? Does the meaning change when it is applied to electronic rather than paper-based records? ... Will current strategies for preserving electronic records ensure longevity and authenticity? ... Have effective cost models been developed? [2]

The notion of a digital preservation theory [3,4] is recent; before 2007 it was mentioned only in comments about shortfalls. What must a theory offer for people to consider it useful? To be most helpful for engineering, a theory would exhibit at least the following characteristics.

• It would be based on broad fundamental theory that is widely accepted as germane and successful.
• It would differentiate its topic from nearby topics, particularly topics that already have good theories.
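The TDO abstract above says that each package binds a record collection to its metadata and makes integrity testable through embedded message digests. The following is a minimal sketch of that one idea, not the TDO format itself: the package layout and the names `build_package` and `verify_package` are hypothetical, SHA-256 is an assumed digest choice, and the public-key signature over the digest is left out because it would need a cryptography library outside the Python standard library.

```python
import hashlib
import json


def build_package(content: bytes, metadata: dict) -> dict:
    """Bundle content and metadata with one digest covering both (hypothetical layout)."""
    # Serialise the metadata deterministically so the digest is reproducible.
    meta_bytes = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256()
    # Length-prefix the metadata so the metadata/content boundary is unambiguous.
    digest.update(len(meta_bytes).to_bytes(8, "big"))
    digest.update(meta_bytes)
    digest.update(content)
    return {"metadata": metadata, "content": content, "sha256": digest.hexdigest()}


def verify_package(package: dict) -> bool:
    """Recompute the digest from content and metadata and compare with the embedded one."""
    rebuilt = build_package(package["content"], package["metadata"])
    return rebuilt["sha256"] == package["sha256"]


if __name__ == "__main__":
    pkg = build_package(b"scanned deed, page 1", {"title": "Deed", "year": 1999})
    print(verify_package(pkg))                            # unmodified package
    print(verify_package(dict(pkg, content=b"altered")))  # altered content
```

Because the digest covers metadata and content together, changing either one is detected. A digest alone only detects accidental or naive alteration; establishing who produced the package is the job of the signature and key hierarchy that the abstract describes.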

A No-Compromises Architecture for Digital Document Preservation

Lecture Notes in Computer Science, 2005

The Multivalent Document Model offers a practical, proven, no-compromises architecture for preserving digital documents of potentially any data format. We have implemented from scratch such complex and currently important formats as PDF and HTML, as well as older formats including scanned paper, UNIX manual pages, TeX DVI, and Apple II AppleWorks word processing. The architecture, stable since its definition in 1997, extends easily to additional document formats, defines a cross-format document tree data structure that fully captures semantics and layout, supports full expression of a format's often idiosyncratic concepts and behavior, enables sharing of functionality across formats thus reducing implementation effort, can introduce new functionality such as hyperlinks and annotation to older formats that cannot express them, and provides a single interface (API) across all formats. Multivalent contrasts sharply with emulation and conversion, and advances Lorie's Universal Virtual Computer with high-level architecture and extensive implementation.
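The idea of a cross-format document tree behind a single interface can be illustrated with a toy sketch. This is not Multivalent's actual API: the `Node` class, the two adapters, and the `load` function are hypothetical names, and the "formats" are deliberately trivial stand-ins for real parsers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One node of a format-neutral document tree (hypothetical structure)."""
    kind: str                      # e.g. "document", "paragraph", "word"
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Collect leaf text in reading order, whatever the source format was."""
        if not self.children:
            return [self.text] if self.text else []
        collected: List[str] = []
        for child in self.children:
            collected.extend(child.words())
        return collected


def _paragraph(tokens: List[str]) -> Node:
    return Node("paragraph", children=[Node("word", t) for t in tokens])


def parse_plain(data: str) -> Node:
    """Toy adapter: paragraphs separated by blank lines."""
    root = Node("document")
    for block in data.split("\n\n"):
        if block.split():
            root.children.append(_paragraph(block.split()))
    return root


def parse_man(data: str) -> Node:
    """Toy adapter: lines starting with '.' are treated as markup and skipped."""
    root = Node("document")
    for line in data.splitlines():
        if line.strip() and not line.startswith("."):
            root.children.append(_paragraph(line.split()))
    return root


# One entry point for every format: callers only ever see Node trees.
ADAPTERS: Dict[str, Callable[[str], Node]] = {"plain": parse_plain, "man": parse_man}


def load(fmt: str, data: str) -> Node:
    return ADAPTERS[fmt](data)
```

Functionality written once against the tree, here only `words()`, then works for every format that has an adapter, which is the kind of sharing across formats that the abstract claims for the architecture.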

The long-term preservation of authentic electronic records

… OF THE INTERNATIONAL CONFERENCE ON VERY …, 2001

The International Research on Permanent Authentic Records in Electronic Systems, known as the InterPARES project, began in 1999 and is nearing the completion of its first phase. Its goal was to develop the theoretical and methodological knowledge essential to the permanent ...

Evolving Domains, Problems and Solutions for Long Term Digital Preservation

We present, compare and contrast new directions in long-term digital preservation as covered by the four large European Community funded research projects that started in 2011. The new projects widen the domain of digital preservation from the traditional purview of memory institutions preserving documents to include scenarios such as health-care, data with direct commercial value, and web-based data. Some of these projects consider not only how to preserve the programs needed to interpret the data but also how to manage and preserve the related workflows. Considerations such as risk analysis and cost estimation are built into some of them, and more than one of these efforts is examining the use of cloud-based technologies. All projects look into programmatic solutions, while emphasizing different aspects such as data collection, scalability, reconfigurability, and full lifecycle management. These new directions will make digital preservation applicable to a wider domain of users and will give better tools to assist in the process.

Authenticity, Integrity and Proof of Existence for Long-Term Archiving: a Survey

Electronic archives are increasingly being used to store information that needs to be available for a long time such as land register information and medical records. In order for the data in such archives to remain useful, their integrity and authenticity must be protected over their entire life span. Also, in many cases it must be possible to prove that the data existed at a certain point in time. In this paper we survey solutions that provide long-term integrity, authenticity, and proof of existence of archived data. We analyze which trust assumptions they require and compare their efficiency. Based on our analysis, we discuss open problems and promising research directions.
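One simple building block for integrity and proof of existence is a hash-linked log, in which every entry commits to the document digest, a timestamp, and the previous entry. The sketch below illustrates that general technique and is not a scheme taken from the survey. The function names, the `GENESIS` value, and the entry layout are assumptions, the timestamps are self-asserted strings, and the construction is only convincing if the latest link hash is published or witnessed by a party the verifier trusts.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]   # (timestamp, document digest, link hash)
GENESIS = "0" * 64             # assumed starting value for the chain


def entry_hash(prev_hash: str, timestamp: str, doc_digest: str) -> str:
    """Hash that binds this entry to everything logged before it."""
    payload = f"{prev_hash}|{timestamp}|{doc_digest}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: List[Entry], timestamp: str, document: bytes) -> None:
    """Add a document's digest to the log, chained to the previous entry."""
    prev = log[-1][2] if log else GENESIS
    doc_digest = hashlib.sha256(document).hexdigest()
    log.append((timestamp, doc_digest, entry_hash(prev, timestamp, doc_digest)))


def verify(log: List[Entry]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for timestamp, doc_digest, link in log:
        if entry_hash(prev, timestamp, doc_digest) != link:
            return False
        prev = link
    return True
```

Only digests are stored, so the log can be shared without disclosing the archived documents. The sketch does not address the long-term problem of hash functions weakening over decades, which is one reason the trust assumptions of such schemes need the kind of analysis the abstract mentions.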

Towards a Theory of Digital Preservation

International Journal of Digital Curation, 2008

A preservation environment manages communication from the past while communicating with the future. Information generated in the past is sent into the future by the current preservation environment. The proof that the preservation environment preserves authenticity and integrity while performing the communication constitutes a theory of digital preservation. We examine the representation information that is needed about the preservation environment for a theory of digital preservation. The representation information includes descriptions of the preservation management policies, the preservation processes, and the state information that is needed to verify the correct working behavior of the system. We demonstrate rule-based data grids that can verify that prior policies correctly enforced preservation properties, while sending into the future descriptions of the current preservation management policies.

Long term digital document survival using open source applications and operating systems

2004

The history of early materials is one of loss and eventual partial reconstruction through fragments. Today's digital document is not immune from loss, but whereas early material was capable of being partially recovered, a bit stream of zeros and ones cannot be intelligibly recovered from fragments! The research seeks to investigate a contingency of last resort: providing current viewing tools, adaptable to future systems, with which future users can display the stored digital documents when other archiving methods fail. This approach is designed to be non-restrictive, allowing the use of any authoring tool available at the time to create digital documents. The research describes a viable method for archival storage, and a viable approach to dealing with issues of viewing and authoring software obsolescence. The capacity of Open Source applications to transcend the divide between operating platforms is demonstrated by this research. It argues the viability of Open Source applications in...

Long-term archiving of digital data

2010

E-government applications have to archive data or documents for long retention periods of 100 years or more. This requires storing digital data on stable media and ensuring that the file formats can be read by available software. Both applications and media technology have only short life spans. Thus, data has to be migrated at frequent intervals onto new data carriers and to new file formats. However, original file versions usually need to be retained permanently. In terms of cost, stability and technology independence, microfilm storage offers a promising solution for off-line storage. This paper reports on a feasibility study analysing encoding techniques that allow digital data to be saved onto microfilm, testing data recovery as well as cost issues.