Automatic hypertext construction

Automatic hypertext link typing

1996

ABSTRACT We present entirely automatic methods for gathering documents for a hypertext, linking the set, and annotating those connections with a description of the type (i.e., nature) of the link. Document linking is based upon high-quality information retrieval techniques developed using the Smart system.

Design and Implementation of a Tool for the Automatic Construction of Hypertexts for Information Retrieval

Information Processing and Management, 1996

The paper describes the design and implementation of TACHIR, a tool for the automatic construction of hypertexts for Information Retrieval. Through the use of an authoring methodology employing a set of well-known Information Retrieval techniques, TACHIR automatically builds up a hypertext from a document collection. The structure of the hypertext reflects a three-level conceptual model that has proved to be quite effective for Information Retrieval. Using this model it is possible to navigate among documents, index terms, and concepts using automatically determined links. The hypertext is implemented in HTML, the markup language of the World Wide Web project. It can be distributed on different sites and different machines over the Internet, and it can be navigated using any of the interfaces developed in the framework of the World Wide Web project, for example Netscape.
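The abstract names a three-level model (documents, index terms, concepts) but not its data structures. As a rough illustration only, assuming the document/term levels are connected through an inverted index, the sketch below builds term-to-document links and renders one index-term node as a minimal HTML page; the file-naming scheme is invented for the example.

```python
from collections import defaultdict

def build_term_document_links(docs):
    """Map each index term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def term_page_html(term, doc_ids):
    """Render a minimal HTML node for an index term, linking to its documents."""
    items = "".join(f'<li><a href="{d}.html">{d}</a></li>' for d in sorted(doc_ids))
    return f"<html><body><h1>{term}</h1><ul>{items}</ul></body></html>"

docs = {"doc1": "hypertext retrieval model", "doc2": "retrieval of index terms"}
index = build_term_document_links(docs)
print(term_page_html("retrieval", index["retrieval"]))
```

Because every node is a plain HTML file, the resulting hypertext can be served from any web server and split across machines, which is the distribution property the abstract highlights.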

A text mining approach for automatic construction of hypertexts

Expert Systems with Applications, 2005

Research on automatic hypertext construction has emerged rapidly in the last decade because there exists an urgent need to translate the gigantic amount of legacy documents into web pages. Unlike traditional 'flat' texts, a hypertext contains a number of navigational hyperlinks that point to some related hypertexts or locations of the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages with or without the help of some authoring tools. However, the gigantic amount of documents produced each day prevents such manual construction. Thus an automatic hypertext construction method is necessary for content providers to efficiently produce adequate information that can be used by web surfers. Although most web pages contain a number of non-textual data such as images, sounds, and video clips, text data still contribute the major part of information about the pages. Therefore, it is not surprising that most automatic hypertext construction methods inherit from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster the flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of some important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to translate them into hypertext form. Such translated documents will form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we only use Chinese text documents, our approach can be applied to any documents that can be transformed to a set of index terms.
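The core building block named in the abstract is the self-organizing map. The sketch below is a deliberately tiny one-dimensional SOM over dense term vectors, meant only to show the clustering step; the paper's actual two-map scheme, training data, and hyperlink-selection rules are not reproduced here.

```python
import math
import random

def train_som(vectors, map_size=4, epochs=50, lr0=0.5, seed=0):
    """Train a tiny one-dimensional self-organizing map over dense vectors."""
    rnd = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rnd.random() for _ in range(dim)] for _ in range(map_size)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                      # decaying learning rate
        radius = max(1.0, map_size / 2 * (1 - epoch / epochs))  # shrinking neighborhood
        for v in vectors:
            # best-matching unit: the node closest to the input vector
            bmu = min(range(map_size),
                      key=lambda k: sum((nodes[k][d] - v[d]) ** 2 for d in range(dim)))
            for k in range(map_size):
                dist = abs(k - bmu)
                if dist <= radius:
                    # Gaussian neighborhood: nearby nodes move toward the input too
                    h = math.exp(-dist * dist / (2 * radius * radius))
                    for d in range(dim):
                        nodes[k][d] += lr * h * (v[d] - nodes[k][d])
    return nodes

def assign(vectors, nodes):
    """Map each document vector to its best-matching node (cluster id)."""
    return [min(range(len(nodes)),
                key=lambda k: sum((nodes[k][d] - v[d]) ** 2 for d in range(len(v))))
            for v in vectors]

# toy binary term vectors: the first two documents share terms, the last two share others
vecs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
nodes = train_som(vecs)
print(assign(vecs, nodes))
```

Documents that land on the same (or nearby) map nodes are judged related, which is what makes the trained map usable for picking hyperlink sources and destinations.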

Automatic resource compilation by analyzing hyperlink structure and associated text

Computer Networks and …, 1998

We describe the design, prototyping and evaluation of ARC, a system for automatically compiling a list of authoritative web resources on any (sufficiently broad) topic. The goal of ARC is to compile resource lists similar to those provided by Yahoo! or Infoseek. The fundamental difference is that these services construct lists either manually or through a combination of human and automated effort, while ARC operates fully automatically. We describe the evaluation of ARC, Yahoo!, and Infoseek resource lists by a panel of human users. This evaluation suggests that the resources found by ARC frequently fare almost as well as, and sometimes better than, lists of resources that are manually compiled or classified into a topic. We also provide examples of ARC resource lists for the reader to examine.
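ARC's compilation step builds on hub/authority link analysis (ARC additionally weights links by anchor text, which is omitted here). As a hedged illustration of just the iterative scoring on a toy graph:

```python
import math

def hits(graph, iterations=30):
    """Iteratively compute hub and authority scores over a directed link graph."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority: sum of hub scores of the pages linking to a node
        auth = {n: sum(hub[u] for u, ts in graph.items() if n in ts) for n in nodes}
        # hub: sum of authority scores of the pages a node links to
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        # normalize so the scores stay bounded across iterations
        na = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        nh = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {n: x / na for n, x in auth.items()}
        hub = {n: x / nh for n, x in hub.items()}
    return auth, hub

# toy web graph: h1 is a directory-style hub, a1 the most-cited authority
graph = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
auth, hub = hits(graph)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

A resource list in the ARC sense is then simply the top-scoring authorities (and hubs) for a topic-focused subgraph of the web.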

Automatic generation of hypertext knowledge bases

ACM SIGOIS Bulletin, 1988

A model of knowledge-based text condensation is presented which has been implemented as part of the text analysis system TOPIC. The condensation process transforms the text representation structures resulting from the text parse into a more abstract thematic description of what the text is about, filtering out irrelevant knowledge structures and preserving only the most salient concepts. The topical structure of a text, finally, is represented in a hierarchical text graph which supports variable degrees of abstraction for text summarization as well as content-oriented retrieval of text knowledge. Due to their non-linear organization, text graphs share a lot of similarities with hypertexts. Their contribution to this field incorporates a methodology for the automatic generation of hypertexts from given full-text files, a close coupling of basic hypertext notions (links, nodes) to the formal specifications of a frame representation model, and conceptual navigation and filtering facilities which allow a user-defined level of information granularity when accessing hypertext knowledge bases. * Acknowledgment. The work reported in this paper was supported by Bundesministerium für Forschung und Technologie (BMFT) / Gesellschaft für Information und Dokumentation (GID) under contract 1020016 0.

A step forward to hypertext

2000

In this paper, after a critical review of how hypertext has been understood over the past few years, we argue against the distinction between total and partial hypertext, and we provide a brief description of a dynamic system that allows the automatic highlighting of those textual elements related to a certain topic. The outcome of our approach is ESQUITX, an

Context-sensitive hypertext generation

1997

This position paper claims that the role of Natural Language in a hypermedia information system is to provide, at each moment, a context-sensitive navigation point, i.e., a hypertext node in which the relevance of hyperlinks is justified with respect to the context of the interaction. It acts as the primary entry point for the user to the various pages that constitute an information service. We call context the collection of features that determine the desirable content and form of the information. We describe an experiment based on an existing information server showing how to capture contextual parameters and how to render them in a context-sensitive entry point to information. The key to our approach is a model of competition for attention between software agents, the outcome of which is reflected in a weighted topic structure, annotated with text templates. The annotated topic structure is the basis for generating a context-sensitive navigation node by a process of template expansion and aggregation.

Automatic Authoring and Construction of Hypermedia for Information Retrieval

Multimedia Systems, 1995

This paper describes the complete process and a tool for the automatic construction of a multimedia hypertext starting from a large collection of multimedia documents. Through the use of an authoring methodology, the document collection is automatically authored, and the result is a multimedia hypertext, also called a hypermedia, written in hypertext mark-up language (HTML), almost a standard among hypermedia mark-up languages. The resulting hypermedia can be browsed and queried with Mosaic, an interface developed in the framework of the World Wide Web Project. In particular, the set of methods and techniques used for the automatic construction of hypermedia is described in this paper, and their relevance in the context of multimedia information retrieval is highlighted.