Toward an Enhancement of Textual Database Retrieval Using NLP Techniques

Lecture Notes in Computer Science, 2001

Improvements in hardware, communication technology, and databases have led to an explosion of multimedia information repositories. To ensure the quality of both information retrieval and services, it is necessary to consider both retrieval techniques and database architecture.

Information Retrieval on the World Wide Web

IEEE Internet Computing, 1997

The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.

A Hypertext Environment for Interacting with Large Textual Databases

Information Processing and Management, 1992

This paper presents a design and implementation project based on a two-level conceptual architecture for the construction of a hypertext environment for interacting with large textual databases. The conceptual architecture has been proposed for the semantic representation of the informative content of a collection of documents and for the organisation of the document collection itself. The hypertext environment is based on a set of functions that permits one to exploit the potential capabilities of the two-level architecture. Those functions are presented in detail. The paper reports some results of a more general project whose final goal is the definition of a new model for information retrieval: a model with information retrieval capabilities embedded within a hypertext environment. Finally, an outline is presented of the characteristics of a prototype of the hypertext environment, named HYPERLINE. This prototype has been developed by the Information Retrieval Service of the European Space Agency (ESA-IRS).

Information Retrieval System and challenges with Dataspace

International Journal of Computer Applications, 2016

Advances in technology have brought an increase in applications that integrate new kinds of information, such as multimedia and scientific data. The exploding volume of unstructured, semi-structured, structured, and heterogeneous data being created and stored is collectively called a "Dataspace". Data is generated from heterogeneous sources such as digital images, audio, video, online transactions, online social media, sensor nodes, and click streams, across domains including retail, medicine, healthcare, energy, and day-to-day utilities. Information retrieval from heterogeneous information systems is required but challenging, because data is stored and represented in different data models in different information systems. Integrating information from heterogeneous data sources into a single data source faces the major challenge of information transformation: different formats and constraints are involved in transforming data for integration, and the process is not cost effective. Information retrieval from heterogeneous data sources remains a challenging issue; as the number of data sources increases, more intelligent retrieval techniques, focusing on information content and semantics, are required. This paper describes the idea of an information retrieval system and of information integration that can be applied to the Dataspace and to heterogeneous data problems over the web.

Information Retrieval Technique for Web Using NLP

International Journal on Natural Language Computing, 2017

Information retrieval is becoming an integral part of every domain, whether in acquiring data from various sources to form a single unit or in presenting data so that anyone can extract useful information for use in data analysis, data mining, and similar tasks. The area has gained much importance in recent years because we are now flooded with many kinds of information from the real world. The growing importance of research data and the retrieval of intelligent data are a main focus for businesses today, so in the coming years this is a field where major work needs to be done. We implement a system for information retrieval from webpages using Natural Language Processing (NLP) and show that it gets better results than the existing system. Webpages are home to a huge amount of information about various real-world entities. The system combines Hierarchical Conditional Random Fields (HCRF) and extended Semi-Markov Conditional Random Fields (Semi-CRF) with Visual Page Segmentation to get accurate results. Parallel processing is used to achieve the results in the desired time frame. The system further improves the decision making between HCRF and Semi-CRF by using a bidirectional approach rather than a top-down approach, which enables better understanding of the content and page structure.

Information retrieval in distributed hypertexts

Proceedings of the 4th RIAO …, 1994

Hypertext is a generalization of the conventional linear text into a non-linear text formed by adding cross-reference and structural links between different pieces of text. A hypertext can be regarded as an extension of a textual database by adding a link structure among the different text objects it stores. We present a tool for finding information in a distributed hypertext such as the World-Wide Web (WWW). Such a hypertext is a distributed textual database in which text objects residing at (the same and) different sites have links to each other. In such a database retrieval is limited to the transfer of documents with a known name. Names of documents serve as links between different documents, and finding such reference names is only possible by parsing documents that have embedded links to other documents.
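
The point that document names can only be discovered by parsing documents with embedded links can be illustrated with a minimal sketch. It assumes the documents are HTML and uses only the Python standard library; the class name `LinkExtractor`, the function `extract_links`, and the example addresses are hypothetical and are not taken from the tool the abstract describes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the targets of <a href="..."> links found in one document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry references to other documents here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative names against the document's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the document names referenced by html_text, in order of appearance."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling `extract_links` on each fetched document and then fetching the returned names is one simple way to walk such a hypertext; a real tool would also need to track visited names and handle fetch errors.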

WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

1996

We describe our experience in developing Web Search Systems using Oracle's SQL*TextRetrieval. In the prototype system we store on-line books in HTML and the HTML documents of a web site; SQL*TextRetrieval is used to index full text and other structured data in the 'web space' and to provide an efficient search engine for free-text search. The Web enables global access to and maximum information sharing with a hypertext-based text retrieval system. Using Oracle's Free Text Retrieval technology, various search options are implemented, including basic word stemming, phrase, fuzzy, and soundex searching, as well as more advanced proximity search and concept search. For the concept search option, we have integrated a public domain "Roget Thesaurus" into our text search system to support synonym expansions. An advanced search mechanism to recursively refine the search domain via the web is also described.
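
Two of the search options named above, concept search through synonym expansion and proximity search, can be sketched in a few lines. This is an illustrative sketch only: it does not use or imitate the SQL*TextRetrieval interface, the `THESAURUS` entries are invented for the example rather than drawn from Roget's Thesaurus, and all function names are hypothetical.

```python
import re

# Hypothetical thesaurus entries, invented for illustration only.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "fast": {"quick", "rapid"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(term, thesaurus=THESAURUS):
    """Return the term together with its synonyms, if any are listed."""
    term = term.lower()
    return {term} | thesaurus.get(term, set())


def concept_search(query, documents, thesaurus=THESAURUS):
    """Return ids of documents that contain, for every query term,
    either the term itself or one of its synonyms."""
    groups = [expand(term, thesaurus) for term in tokenize(query)]
    hits = []
    for doc_id, text in documents.items():
        words = set(tokenize(text))
        if all(words & group for group in groups):
            hits.append(doc_id)
    return hits


def within_proximity(text, first, second, max_distance):
    """True if the two words occur at most max_distance word positions apart."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)
```

With this sketch, a query such as "fast car" also matches a document that says "rapid automobile", which is the effect synonym expansion is meant to achieve. A production system would work from an inverted index instead of rescanning every document.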