Link rot

For broken links in Wikipedia, see Wikipedia:Linkrot, Wikipedia:Using the Wayback Machine, and Special:BrokenRedirects.

Link rot (or linkrot), also known as link death or link breaking, is an informal term for the process by which increasing numbers of links, either on individual websites or across the Internet in general, point to web pages, servers, or other resources that have become permanently unavailable. The phrase also describes the effects of failing to update out-of-date web pages that clutter search engine results. A link that no longer works is called a broken link, dead link, or dangling link.


Causes

A link may become broken for several reasons. The most common result of a dead link is a 404 error, which indicates that the web server responded but the specific page could not be found.
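The distinction between "the server answered with 404" and "no server answered at all" is what a simple link checker reports. The sketch below uses only Python's standard `urllib`; the function name `check_link` and its return labels are illustrative assumptions, not part of any particular tool. It treats 404 and 410 (Gone) as broken and lumps DNS failures, refused connections, and timeouts together as unreachable.

```python
import urllib.error
import urllib.request


def check_link(url: str, timeout: float = 10.0) -> str:
    """Return 'ok', 'broken' (404/410), 'error <code>', or 'unreachable'."""
    # HEAD avoids downloading the body. Some servers mishandle HEAD, so a
    # real checker might retry with GET when it sees a 405 response.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            # urlopen raises HTTPError for non-2xx codes, so reaching here means success.
            return "ok"
    except urllib.error.HTTPError as err:
        # The server responded, but with an error status.
        return "broken" if err.code in (404, 410) else f"error {err.code}"
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts are
        # all OSError subclasses: no usable HTTP response was received.
        return "unreachable"
```

HTTPError must be caught before the broader OSError, because HTTPError is itself a subclass of URLError and therefore of OSError.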

Some news sites contribute to link rot by keeping only recent articles freely accessible at their original URLs, then removing older ones or moving them to a paid subscription area. This causes a heavy loss of supporting links for sites that discuss newsworthy events and cite news sites as references.[citation needed]

Another type of dead link occurs when the server hosting the target page stops working or moves to a new domain name. In this case the browser may return a DNS error, or it may display a site unrelated to the content sought. The latter can happen when a domain name is allowed to lapse and is subsequently reregistered by another party. Domain names acquired this way are attractive to those who want to exploit the stream of unsuspecting visitors, which inflates hit counters and PageRank.
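The first case, a DNS error, can be detected by asking whether the link's host name still resolves. The second case cannot be detected this way: a lapsed and reregistered domain resolves normally, it just points somewhere else. The following minimal sketch (the helper name `resolves` is hypothetical) shows the DNS check using the standard `socket.getaddrinfo` call.

```python
import socket
from urllib.parse import urlsplit


def resolves(url: str) -> bool:
    """Return True if the URL's host name currently resolves to an address.

    A True result says nothing about who now owns the domain.
    """
    host = urlsplit(url).hostname
    if host is None:
        # Not an absolute URL with a host component.
        return False
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Name lookup failed: the domain may have lapsed or been removed.
        return False
```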

A link might also be broken by some form of blocking, such as content filters or firewalls. Dead links can also originate on the authoring side, when website content is assembled, copied, or deployed without properly verifying the link targets, or is simply not kept up to date.
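Verifying targets before deployment starts with collecting every outgoing link from a page. A minimal sketch using the standard `html.parser` module follows; the class and function names are illustrative, and it assumes that only `<a href>` and `<img src>` attributes matter and that only `http`/`https` links should be checked (so `mailto:` and similar schemes are skipped).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                resolved = urljoin(self.base_url, value)
                if urlsplit(resolved).scheme in ("http", "https"):
                    self.links.append(resolved)


def extract_links(html: str, base_url: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Each collected URL could then be passed to an HTTP checker as part of a build or publishing step, so that broken targets are caught before the content goes live.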

Prevalence

The 404 "Not Found" response is familiar to even the occasional Web user. A number of studies have examined the prevalence of link rot on the Web, in academic literature, and in digital libraries. In a 2003 experiment, Fetterly et al. found that about one link in every 200 disappeared from the Internet each week. McCown et al. (2005) found that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year.
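A weekly loss rate compounds quickly. The arithmetic below extrapolates the 1-in-200 weekly figure under an assumption the study itself does not make: that the rate stays constant and each week's losses are independent. Under that assumption the surviving fraction after n weeks is (1 - 1/200)^n, so the result is an illustration of compounding, not a prediction for any real set of links.

```python
import math


def surviving_fraction(weekly_loss: float, weeks: int) -> float:
    """Fraction of links still alive after `weeks`, assuming a constant,
    independent per-week loss rate."""
    return (1.0 - weekly_loss) ** weeks


def half_life_weeks(weekly_loss: float) -> float:
    """Weeks until half of the links are expected to be gone."""
    return math.log(0.5) / math.log(1.0 - weekly_loss)


rate = 1 / 200  # approximate weekly rate reported by Fetterly et al.
print(f"alive after one year: {surviving_fraction(rate, 52):.2f}")  # 0.77
print(f"half-life: {half_life_weeks(rate):.0f} weeks")  # 138
```

At that rate roughly a quarter of links would be gone within a year, and half within about two and a half years.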

Discovering

Detecting link rot for a given URL is difficult using automated methods. If a URL returns an HTTP 200 (OK) response, it may be considered accessible, yet the contents of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: a page stating that the resource cannot be found, but served with a 200 (OK) status instead of a 404. Bar-Yossef et al. (2004) developed a heuristic for automatically discovering soft 404s.[citation needed]
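One simple probing idea for soft 404s is to request a URL on the same site that almost certainly does not exist. If the server answers that probe with 200 and a body nearly identical to the page being checked, the page is probably an error page in disguise. The sketch below is an illustration of that idea and is not claimed to be the exact method of Bar-Yossef et al.; the function names and the 0.9 similarity threshold are arbitrary assumptions, and the HTTP fetching itself is left out so only the comparison logic is shown.

```python
import difflib
import uuid
from urllib.parse import urljoin


def probe_url(page_url: str) -> str:
    """Build a URL on the same site whose random path should not exist."""
    return urljoin(page_url, "/" + uuid.uuid4().hex)


def looks_like_soft_404(page_body: str, probe_body: str, threshold: float = 0.9) -> bool:
    """Flag a page as a likely soft 404 if it is nearly identical to what the
    server returned (with status 200) for the nonexistent probe URL."""
    similarity = difflib.SequenceMatcher(None, page_body, probe_body).ratio()
    return similarity >= threshold
```

If the probe itself returns a genuine 404, the server reports missing pages honestly and the comparison is unnecessary. The threshold is set below 1.0 because error pages often echo the requested path, so two soft 404 bodies rarely match exactly.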

Combating

Because dead links reflect poorly on both the linking site and the linked-to site, several solutions are available: some aim to prevent dead links in the first place, while others try to repair them after they occur. Several tools have been developed to help combat link rot.

Server side

User side

Web archiving

To combat link rot, web archivists actively collect the Web, or particular portions of it, and preserve the collection in an archive, such as an archive site, for future researchers, historians, and the public. The largest web archiving organization is the Internet Archive, which strives to maintain an archive of the entire Web. It takes periodic snapshots of pages that anyone can access years later, free of charge and without registration, through the Wayback Machine, either by typing in the URL or automatically by using browser extensions.[6] National libraries, national archives, and various consortia of organizations are also involved in archiving culturally important Web content.
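Looking up an archived copy can also be automated. The sketch below assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available`, which, as we understand it, takes a `url` and an optional `timestamp` (YYYYMMDDhhmmss or a prefix) and returns JSON with an `archived_snapshots.closest` object. That response shape is an assumption to verify against the Internet Archive's current documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint; check the Internet Archive's documentation before use.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp is YYYYMMDDhhmmss or a prefix of it."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response_json: str) -> Optional[str]:
    """Return the archived copy's URL from a response body, or None."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Network call: ask the availability endpoint for the nearest snapshot."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Separating query construction and response parsing from the network call keeps the parsing testable offline and makes it easy to adapt if the response format differs from the one assumed here.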

Individuals may also use a number of tools to archive web resources that may go missing in the future.
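At its simplest, a personal archiving tool stores the fetched bytes together with when and where they came from. The following sketch is a hypothetical helper, not any existing tool: it writes the content under a UTC timestamp and a SHA-256 prefix, and records the source URL and full hash in a small sidecar file. Comparing the stored hash with a later download is one way to notice that a page still answering 200 has changed.

```python
import hashlib
import pathlib
from datetime import datetime, timezone


def save_snapshot(content: bytes, url: str, archive_dir: str) -> pathlib.Path:
    """Store a copy of a fetched resource under a timestamped, hashed name
    and record its source URL alongside it."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    digest = hashlib.sha256(content).hexdigest()
    folder = pathlib.Path(archive_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{stamp}-{digest[:16]}.bin"
    path.write_bytes(content)
    # Sidecar file: original URL on the first line, full hash on the second.
    path.with_suffix(".txt").write_text(f"{url}\n{digest}\n", encoding="utf-8")
    return path
```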

A number of studies have shown how widespread link rot is in academic literature (see below). Authors of scholarly publications have also developed best practices for combating link rot in their own work.

See also

Further reading

In academic literature

In digital libraries

References

  1. Rønn-Jensen, Jesper (2007-10-05). "Software Eliminates User Errors And Linkrot". Justaddwater.dk. http://justaddwater.dk/2007/10/05/blog-software-eliminates-user-errors-and-linkrot/. Retrieved 2007-10-05.
  2. Mueller, John (2007-12-14). "FYI on Google Toolbar's Latest Features". Google Webmaster Central Blog. http://googlewebmastercentral.blogspot.com/2007/12/fyi-on-google-toolbars-latest-features.html. Retrieved 2008-07-09.
  3. "deadurl.com".
  4. "DeadURL.com". http://deadurl.com/. Retrieved 2011-03-17. "DeadURL.com gathers as many backup links as possible for each dead url, via Google cache, Archive.org, and user submissions."
  5. "DeadURL.com". http://deadurl.com/. Retrieved 2011-03-17. "Just type deadurl.com/ in front of a link that doesn't work, and hit Enter."
  6. "404-Error ? :: Add-ons for Firefox".
  7. "archive.org".