Signature files: an access method for documents and its analytical performance evaluation
Related papers
The Design of Text Signatures for Text Retrieval Systems
Signature files are one technique for indexing documents for full-text retrieval systems. This paper discusses two methods for generating text signatures: the word fragmentation and the pseudo-random generation techniques. The paper evaluates the effectiveness and efficiency of generating text signatures using these techniques. It also determines the optimal set of characteristics that define a text signature that is to be used for superimposed signature file indexes. The optimal set of characteristics can be used to create text signatures that minimise the number of false drops retrieved from the information system.

Keywords: Full-text retrieval; Searching; Signature Files; Superimposed coding; Text retrieval systems; Text signatures.

1. Introduction

A text retrieval system is characterised by two components. The text database consists of a collection of text documents. The documents can either be unstructured (that is, devoid of any of the traditional database field str...
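The trade-off behind "minimising false drops" in the abstract above can be illustrated with the standard textbook approximation for superimposed coding. The sketch below is not taken from the paper. It assumes a single-word query and that each word sets its bits at independent, uniformly random positions. The function and parameter names are hypothetical.

```python
import math


def false_drop_probability(sig_bits: int, bits_per_word: int, words_per_block: int) -> float:
    """Estimate the chance that a block signature matches a word it does not contain.

    Assumes (for illustration only) that every word sets `bits_per_word` bit
    positions chosen independently and uniformly among `sig_bits` positions.
    """
    # Probability that one given bit position is set after all words are superimposed.
    p_bit_set = 1.0 - (1.0 - 1.0 / sig_bits) ** (bits_per_word * words_per_block)
    # A false drop needs every bit of the query word's signature to be set by chance.
    return p_bit_set ** bits_per_word


def best_bits_per_word(sig_bits: int, words_per_block: int) -> int:
    """Bits per word that roughly minimises false drops (about half the bits end up set)."""
    return max(1, round(sig_bits * math.log(2) / words_per_block))


if __name__ == "__main__":
    m = best_bits_per_word(512, 40)
    print(m, false_drop_probability(512, m, 40))
```

Under these assumptions the estimate is lowest when roughly half of the signature bits are set, which is why the signature width and the number of bits per word are usually tuned together.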
Use of text signatures for document retrieval in a highly parallel environment
Parallel Computing, 1987
This paper considers the use of text signatures, fixed-length bit string representations of document content, in an experimental information retrieval system: such signatures may be generated from the list of keywords characterising a document or a query. A file of documents may be searched in a bit-serial parallel computer, such as the ICL Distributed Array Processor, using a two-level retrieval strategy in which a comparison of a query signature with the file of document signatures provides a simple and efficient means of identifying those few documents that need to undergo a computationally demanding, character matching search. Text retrieval experiments using three large collections of documents and queries demonstrate the efficiency of the suggested approach.
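The two-level strategy described in this abstract can be sketched sequentially as follows. This is a minimal illustration, not the paper's parallel implementation. The signature width, the number of bits per keyword, the SHA-256-based hashing and all function names are assumptions made for the example.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (assumed value)
BITS_PER_WORD = 3   # bit positions set per keyword (assumed value)


def word_signature(word: str) -> int:
    """Map one keyword to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def text_signature(words) -> int:
    """Superimpose (bitwise OR) the signatures of all keywords."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def search(query_words, documents):
    """Return (candidate indices, confirmed hit indices) for a conjunctive query."""
    query_sig = text_signature(query_words)
    # In a real system the document signatures would be precomputed and stored.
    doc_sigs = [text_signature(doc.split()) for doc in documents]

    # Level 1: cheap bitwise filter. A document qualifies only if it has
    # every bit of the query signature set.
    candidates = [i for i, s in enumerate(doc_sigs) if s & query_sig == query_sig]

    # Level 2: exact word matching on the candidates removes false drops.
    hits = []
    for i in candidates:
        doc_words = {w.lower() for w in documents[i].split()}
        if all(q.lower() in doc_words for q in query_words):
            hits.append(i)
    return candidates, hits


if __name__ == "__main__":
    docs = [
        "signature files index text",
        "parallel computers search fast",
        "inverted files index text",
    ]
    print(search(["signature", "files"], docs))
```

The first level can produce false drops but never misses a true match, so the expensive second level only has to examine the surviving candidates.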
Perfect Encoding: a Signature Method for Text Retrieval
Proceedings, International Workshop on Advances in Databases and Information Systems (ADBIS), 1996
A new methodology is introduced, where blocks of text are replaced by a compressed, fully reversible, signature pattern. Full reversibility implies zero information loss, thus the new method is termed Perfect Encoding. The method's analytical model is produced and, where applicable, contrasted with the current practice in signature file organizations. Analysis results indicate that it is a potential candidate for information retrieval implementations. In particular, perfect encoding has the potential to ...
Combining Pat-Trees and Signature Files for Query Evaluation in Document Databases
Lecture Notes in Computer Science, 1999
In this paper, a new indexing technique to support the query evaluation in document databases is proposed. The key idea of the method is the combination of the technique of pat-trees with signature files. While the signature files are built to expedite the traversal of object hierarchies, the pat-trees are constructed to speed up both the signature file searching and the text scanning. In this way, high performance can be achieved.
Searching the World-Wide Web Using Signature Files
Computer Science and EE Dept., University of …, 1995
A problem commonly faced by users of the World-Wide Web (WWW) is forgetting the path traversed to reach a previously read document. SWISS (Seeking World-Wide Web Information Using a Signature File Search) is a system designed to alleviate this 'lost ...
Signature-based document retrieval
2003
This paper presents a new approach for document image decomposition and retrieval based on connected component analysis and geometric properties of the labelled regions. The database contains document images with Arabic/Persian text combined with English text, headlines, ruling lines, trade marks and signatures. In particular, Arabic/Persian signature extraction is investigated using special characteristics of the signature that are fairly different from those of English signatures. A set of efficient, invariant and compact features is extracted for validation purposes using angular-radial partitioning of the signature region. Experimental results show the robustness of the proposed method.
Signature files and signature trees
Information Processing Letters, 2002
The signature file method is a popular indexing technique used in information retrieval and databases. It excels in efficient index maintenance and lower space overhead. However, it suffers from inefficiency in query processing due to the fact that for each query processed the entire signature file needs to be scanned. In this paper, we introduce a tree structure, called a signature tree, established over a signature file, which can be used to expedite the signature file scanning by one order of magnitude or more.
Comparison Between Inverted and Signature Files Based on Arabic Documents
2007
The purpose of this research is to give an overview of inverted files and signature files based on a collection of Arabic documents, and to compare the two techniques and their performance on each comparison point. The most common measures of system performance used to compare information retrieval mechanisms are time, space, and recall/precision. The shorter the response time and the smaller the space used, the better the system is considered to be [1], so our comparison points will include space overhead, search time, and average recall/precision. In this research, two indices will be built, an inverted file and a signature file. However, to measure the performance of each one, a retrieval system must be built to compare the results of using these indices. A collection of 242 Arabic abstracts from the proceedings of the Saudi Arabian National Computer Conferences has been used in the two systems, and a collection...