A survey on web archiving initiatives

The evolution of web archiving

International Journal on Digital Libraries, 2016

Web archives preserve information published on the web or digitized from printed publications. Much of this information is unique and historically valuable. However, the lack of knowledge about the global status of web archiving initiatives hampers their improvement and collaboration. To overcome this problem, we conducted two surveys, in 2010 and 2014, which provide a comprehensive characterization of web archiving initiatives and their evolution. We identified several patterns and trends that highlight challenges and opportunities. We discuss these patterns and trends, which make it possible to define strategies, estimate resources and provide guidelines for research and development of better technology. Our results show that in recent years there has been significant growth in the number of initiatives and of countries hosting them, in the volume of data and in the number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort to preserving digital information, other results presented throughout the paper raise concerns, such as the small amount of archived data in comparison with the amount of data that is being published online.

Web Archiving Methods and Approaches: A Comparative Study

Library Trends, 2005

The Web is a virtually infinite information space, and archiving its entirety, all its aspects, is a utopia. The volume of information presents a challenge, but it is neither the only nor the most limiting factor given the continuous drop in storage device costs. Significant challenges lie in the management and technical issues of the location and collection of Web sites. As a consequence of this, archiving the Web is a task that no single institution can carry out alone. This article will present various approaches undertaken today by different institutions; it will discuss their focuses, strengths, and limits, as well as a model for appraisal and identifying potential complementary aspects amongst them. A comparison for discovery accuracy is presented between the snapshot approach done by the Internet Archive (IA) and the event-based collection done by the Bibliothèque Nationale de France (BNF) in 2002 for the presidential and parliamentary elections. The balanced conclusion of this comparison allows for identification of future direction for improvement of the former approach.

Web Archiving in the United States - A 2017 Survey

2014

From October 2 to November 20, 2017, a working group of individuals representing multiple NDSA member institutions and interest groups conducted a survey of organizations in the United States actively involved in, or planning to start, programs to archive content from the Web. This effort builds upon and extends a broader effort begun in three earlier surveys, which the NDSA Web Archiving Survey working group has conducted since 2011. The goal of these surveys is to better understand the landscape of Web archiving activities in the United States by investigating the organizations involved; the history and scope of their Web archiving programs; the types of Web content being preserved; the tools and services being used; access and discovery services being offered; and overall policies related to Web archiving programs. The responses from this survey document the current state of U.S. Web archiving initiatives and the comparison with the results of the 2011, 2013, and 2016 sur...

Web Archives: The Future(s)

2011

Executive summary: This report has been written by researchers at the Oxford Internet Institute for the International Internet Preservation Consortium (IIPC). The aim is to stimulate further discussion among web archivists and researchers about the future ways in which web archives can be used by researchers.

Web Archiving in the UK: Current Developments and Reflections for the Future

2017

This work presents a brief overview of the history of Web archiving projects in some English-speaking countries, paying particular attention to the development of, and main problems faced by, the UK Web Archive Consortium (UKWAC) and the UK Web Archive partnership in Britain. It highlights, in particular, the changeable nature of Web pages through constant content removal and/or alteration and the technological innovations recently brought by Web 2.0 applications, discussing how these factors affect Web archiving projects. It also examines different collecting approaches, harvesting software limitations and how the current copyright and deposit regulations in the UK covering digital content are failing to support Web archive projects in the country. From the perspective of users' access, this dissertation offers an analysis of UK Web archive interfaces, identifying their main drawbacks and suggesting how these could be improved in order to better respond to users' information needs and access to archived Web content.

Web archiving in a Web 2.0 world

The Electronic Library, 2009

The National Library of Australia is the lead institution for digital archiving and preservation in Australia. Its PANDORA Archive has been the repository for archived web resources in Australia for over ten years and is a mature but continually developing system. The archival management system PANDAS, which underpins the Archive, is, as of 2007, in its third major revision. Other web archiving activities now also include annual Australian Domain Harvests and the use of Archive-It, both of which are conducted in conjunction with the Internet Archive. This paper discusses the current state of web archiving in Australia, and how libraries are adapting their services in recognition of the expanding role that online material plays in their collections. For many years it was considered that archiving could only ever completely capture a small, albeit representative, sample of the Internet. Today the gap between what is available and what can be archived is decreasing. But as our archives and our archiving abilities increase, we are still confronted by new technologies and Web 2.0 applications. Using as an example the 2007 Federal Election, in which a large number of interactive sites such as Kevin07, MySpace and YouTube were archived, the paper will show how Australian web archivers continue to adapt to and meet new challenges.