A No-Compromises Architecture for Digital Document Preservation
Related papers
Long term digital document survival using open source applications and operating systems
2004
The history of early materials is one of loss and eventual partial reconstruction through fragments. Today's digital document is not immune from loss, but whereas early material could be partially recovered, a bit stream of zeros and ones cannot be intelligibly recovered from fragments! The research investigates a contingency of last resort: providing current viewing tools, adaptable to future systems, so that future users can display the stored digital documents when other archiving methods fail. This approach is designed to be non-restrictive, allowing any authoring tool available at the time to be used to create digital documents. The research describes a viable method for archival storage and a viable approach to dealing with viewing and authoring software obsolescence. It demonstrates the capacity of Open Source applications to transcend the divide between operating platforms. It argues the viability of Open Source applications in...
A System for Long-Term Document Preservation
Archiving Conference
This paper analyzes the requirements and describes a system designed for retaining records and ensuring their legibility, interpretability, availability, and provable authenticity over long periods of time. In general, information preservation is accomplished not by any single technique, but by avoiding all of the many possible events that might cause loss. The system focuses on preservation over a 10- to 100-year time span: long enough that many difficult problems are known and can be addressed, but not unimaginably long in terms of the longevity of computer systems and technology. The general approach focuses on eliminating single points of failure (single elements whose failure would cause information loss), combined with active detection and repair in the event of failure. Techniques employed include secret sharing, aggressive "preemptive" format conversion, metadata acquisition, active monitoring, and using standard Internet storage services in a novel way.
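The "active detection and repair" idea can be illustrated with a minimal sketch. It assumes whole-object replicas held at several sites and repairs divergent copies by majority vote over SHA-256 digests. The site names and the `audit_and_repair` helper are hypothetical; this is not the paper's system, and its secret-sharing layer is not shown.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Detect replicas whose digest disagrees with the majority and restore them.

    `replicas` maps a hypothetical storage-site name to the bytes it holds.
    Returns a new mapping with divergent copies replaced, plus the sorted
    list of repaired sites. Raises ValueError if no digest has a strict
    majority, because then there is no trustworthy copy to repair from.
    """
    digests = {site: sha256_hex(blob) for site, blob in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority copy; manual intervention needed")

    good_copy = next(blob for site, blob in replicas.items() if digests[site] == winner)
    repaired: dict[str, bytes] = {}
    fixed: list[str] = []
    for site, blob in replicas.items():
        if digests[site] == winner:
            repaired[site] = blob
        else:
            repaired[site] = good_copy  # overwrite the damaged copy with a verified one
            fixed.append(site)
    return repaired, sorted(fixed)
```

For example, with three sites where `site-c` holds a corrupted copy, `audit_and_repair` returns `["site-c"]` as the repaired list. Running such an audit periodically, rather than only when a read fails, is what makes the detection "active" in this sketch.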
Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries - JCDL '02, 2002
Digital information in any form is at risk. Software and hardware become obsolete, and versions and file formats change, making data inaccessible. Even data stored in the simplest form are endangered by computer media degradation and obsolescence. Online information such as e-journals and databases is also susceptible: it may become partially or entirely unreadable, and may not be recoverable by the time the problem is detected. Preservation strategies such as emulation (keeping alive the software and hardware needed to access a digital object), migration (converting the digital object to new versions and formats), and other long-term archival methods have been proposed. Models such as the Open Archival Information System (OAIS) provide an architecture for conducting digital preservation research and experimentation. The importance of preservation metadata has been recognized by a number of groups, and efforts to develop and deploy metadata standards are underway.
Critique of Architectures for Long-Term Digital Preservation
2009
Trusted Digital Repositories (TDRs) and Trustworthy Digital Objects (TDOs) seem to be the only generic digital preservation methodologies proposed. Before any preservation method is recommended for wide use, it should be exposed to searching analysis. Evolving technology and fading human memory threaten the long-term intelligibility of many kinds of documents. Furthermore, some records are susceptible to improper alterations that make them untrustworthy. We argue that the TDR approach has shortfalls as a method for long-term digital preservation of sensitive information. For specificity, we discuss a particular implementation. TDO methodology addresses these needs by making digital documents durably intelligible. It uses EDP standards for a few file formats and XML structures for text documents. For other information formats, intelligibility is assured by using a virtual computer. To protect sensitive information (content whose inappropriate alteration might mislead its readers), the integrity and authenticity of each TDO is made testable by embedded public-key cryptographic message digests and signatures. The authenticity of the keys is protected recursively in a social hierarchy grounded by publishing the keys of well-known institutions. A TDO is a specific kind of OAIS Archival Information Package convenient for sharing among repositories. The content and metadata of properly constructed TDOs are sufficient for creating the usual kinds of catalog records and search indices during repository ingestion. Comparison of TDR and TDO methodologies suggests differentiating near-term preservation measures from what is needed for the long term. The proper focus for long-term preservation technology is signed packages that each combine a record collection with its metadata and that also bind context: Trustworthy Digital Objects. If all that stuff was worth creating, surely some of it is worth saving! © 2009, H.M. Gladney
... more expensive to correct than they are today. This examination should seek opportunities to reduce complexity that might mislead readers. Technology for near-term preservation needs flexibility for software improvements. In contrast, technology for long-term preservation needs to be insensitive to changing technology and infrastructure. It therefore proves helpful to distinguish near-term preservation from long-term preservation.
What Is the Challenge?
What is the meaning of preservation? Does the meaning change when it is applied to electronic rather than paper-based records? ... Will current strategies for preserving electronic records ensure longevity and authenticity? ... Have effective cost models been developed? [2]
The notion of a digital preservation theory [3,4] is recent, being mentioned earlier than 2007 only in comments about shortfalls. What do people expect of a theory before they consider it useful? To be most helpful for engineering, a theory would exhibit at least the following characteristics.
• It would be based on broad fundamental theory that is widely accepted as germane and successful.
• It would differentiate its topic from nearby topics, particularly topics that already have good theories.
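The TDO idea of making integrity testable by an embedded message digest can be sketched as follows. This covers only the digest half: the package layout and helper names are hypothetical, and the public-key signature over the digest that TDOs also require is not shown, because Python's standard library has no public-key signature API.

```python
import hashlib
import json


def canonical_digest(content: bytes, metadata: dict) -> str:
    """SHA-256 over the content's own digest followed by canonical JSON metadata.

    Sorting keys and fixing separators makes the metadata serialization
    deterministic, so equal metadata always yields the same digest.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(content).digest())  # fixed 32 bytes, so no ambiguity at the boundary
    h.update(json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    return h.hexdigest()


def make_package(content: bytes, metadata: dict) -> dict:
    """Bundle content and metadata with an embedded digest (hypothetical layout)."""
    meta = dict(metadata)  # shallow copy so later edits to the caller's dict are not silently included
    return {
        "content": content.hex(),
        "metadata": meta,
        "digest": {"alg": "sha-256", "value": canonical_digest(content, meta)},
    }


def verify_package(pkg: dict) -> bool:
    """Recompute the digest; False means content or metadata changed after packaging."""
    content = bytes.fromhex(pkg["content"])
    return pkg["digest"]["value"] == canonical_digest(content, pkg["metadata"])
```

Because the digest binds content and metadata together, altering either one (for instance, changing a record's date in the metadata) makes `verify_package` return `False`. In a full TDO, a signature over this digest would additionally let a reader check who vouched for the package.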
Towards SIRF: Self-contained Information Retention Format
2013
Many organizations are now required to preserve and maintain access to large volumes of digital content for decades. Preservation systems and processes are needed to support such long-term retention requirements and to keep those digital objects usable in the distant future, regardless of changes in technologies and designated communities. A key component in such preservation systems is the storage subsystem, where the digital objects reside for most of their lifecycle. We describe SIRF (Self-contained Information Retention Format), a logical storage container format specialized for long-term retention. SIRF includes a set of digital preservation objects and a catalog with metadata related to the entire contents of the container as well as to the individual objects and their interrelationships. SIRF is being developed by the Storage Networking Industry Association (SNIA) with the intention of creating a standardized, vendor-neutral storage format that will be interpretable by future preservation systems and that will simplify and reduce the costs of digital preservation.
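The role of the SIRF catalog, which holds metadata about the whole container, each object, and their interrelationships, can be pictured with a toy in-memory model. This is not the SNIA SIRF schema: the `Catalog` and `ObjectEntry` classes and all field names are hypothetical, chosen only to show container-level versus object-level metadata, fixity values, and relationships.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ObjectEntry:
    """Per-object metadata (hypothetical fields)."""
    object_id: str
    sha256: str
    size: int
    related_to: List[str] = field(default_factory=list)  # ids of related objects


@dataclass
class Catalog:
    """Container-level catalog describing every object it holds."""
    container_id: str
    objects: dict = field(default_factory=dict)  # object_id -> ObjectEntry

    def add(self, object_id: str, data: bytes, related_to: Optional[List[str]] = None) -> None:
        """Record an object's fixity and relationships; related ids must already exist."""
        related = list(related_to or [])
        for rid in related:
            if rid not in self.objects:
                raise KeyError(f"unknown related object: {rid}")
        digest = hashlib.sha256(data).hexdigest()
        self.objects[object_id] = ObjectEntry(object_id, digest, len(data), related)

    def verify(self, object_id: str, data: bytes) -> bool:
        """Check retrieved bytes against the size and digest stored in the catalog."""
        entry = self.objects[object_id]
        return entry.size == len(data) and entry.sha256 == hashlib.sha256(data).hexdigest()

    def to_json(self) -> str:
        """Serialize the catalog so it can be stored alongside the objects."""
        return json.dumps(
            {
                "container_id": self.container_id,
                "object_count": len(self.objects),
                "objects": [asdict(e) for e in self.objects.values()],
            },
            indent=2,
        )
```

Keeping digests and relationships in the catalog, rather than only inside each object, lets a future system check and interpret the container without first opening every object; in this sketch that is the "self-contained" property in miniature.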
Digital Preservation of Electronic Resources
DESIDOC Journal of Library & Information Technology, 2012
Due to huge advances in information and communication technologies (ICTs), there has been an astronomical growth of e-resources (e-journals, e-books, online databases, and so on). Libraries spend heavily on acquiring these e-resources because students and researchers use them widely. Unfortunately, this growth is accompanied by many threats. The digital content of e-resources is fragile and not durable, and its accessibility and use by future generations depend on technology that evolves and changes very rapidly. Hence, ensuring access to e-resources for future generations of users is a big challenge for libraries. The present paper highlights various problems of digital content and explains why digital preservation is more demanding and challenging than preserving print copies of journals. It also gives a bird's-eye view of various projects initiated for archiving the digital content of scholarly journals.
Using the web infrastructure to preserve web pages
2007
To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation.
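The two archival processes named above differ in whether the bits change. The following is a minimal sketch, assuming files on a local filesystem. "Refreshing" copies the bytes unchanged and verifies the copy by digest, while "migrating" is illustrated by one hypothetical conversion (legacy Latin-1 text re-encoded as UTF-8) that records the before-and-after digests. The function names and the provenance fields are illustrative only.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large objects need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh(src: Path, dst: Path) -> str:
    """Refreshing: copy the bits to new media unchanged, then verify the copy."""
    shutil.copyfile(src, dst)
    digest = file_sha256(src)
    if file_sha256(dst) != digest:
        raise OSError(f"refresh of {src} to {dst} failed verification")
    return digest


def migrate_latin1_to_utf8(src: Path, dst: Path) -> dict:
    """Migration: convert to a new format, here by re-encoding Latin-1 text as UTF-8.

    The bytes change, so a digest match cannot prove fidelity; instead a
    provenance record documents what was done (hypothetical fields).
    """
    text = src.read_text(encoding="latin-1")
    dst.write_text(text, encoding="utf-8")
    return {
        "source": src.name,
        "source_sha256": file_sha256(src),
        "target": dst.name,
        "target_sha256": file_sha256(dst),
        "action": "latin-1 -> utf-8",
    }
```

The contrast is the point of the sketch: a refreshed copy can be checked bit for bit, whereas a migrated copy can only be documented, which is one reason migration demands the "careful processes" the abstract mentions.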
A foundation for automatic digital preservation
Ariadne, 2006
Many cultural heritage institutions are developing efforts to archive large amounts of digital material. Evidence of this can be seen in the numerous initiatives aiming to harvest the Web [1-5], as well as in the impressive growth of institutional repositories [6]. However, getting the material inside the archive is just the beginning for any initiative concerned with the long-term preservation of digital materials.
Document storage and retrieval
ACM SIGIR Forum, 1979
During the past decade, the volume of paper documentation has expanded almost beyond our ability to cope. Modern technology has been the cause of this expansion. High-speed postal service, telecommunications, and the massive increase in development projects have placed a paper burden upon us that is almost too much to bear.