A fast retrieval method for local or distributed data

A Fast Retrieval Method for Local or Distributed Data

2016

Abstract—In this paper, we propose an improvement to an approach to data retrieval that requires only one access to a bucket hash table or file. The idea behind it is to let the system assign one digit to the record key so that the hashed new record key is "forced" to fall into a bucket according to some practical criteria. From a user's point of view, this forced hash procedure can be thought of as a "user-system cooperating code assignment", since the user is free to code an object to be retrieved but the system may append a digit to that code. For one-access retrieval, the new key-digit code is used to find its address. However, should the digit not be known, the retrieval process will find the key in its surroundings, provided it exists. In this approach no bucket overflow area of any kind is needed, since this method allows a high load factor for practical use. In the event that the hash table is nearly full, a simple procedure could be run to ex...
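
The forced-hash idea can be sketched as follows. This is a minimal sketch, not the paper's algorithm: the SHA-256-based `bucket_of` hash, the use of a single decimal digit, and the selection criterion (append the digit whose candidate bucket is least loaded and not yet full) are all assumptions, since the abstract leaves the "practical criteria" unspecified. `ForcedHashTable` and its method names are hypothetical.

```python
import hashlib

DIGITS = "0123456789"  # assumption: the system appends one decimal digit


def bucket_of(code: str, n_buckets: int) -> int:
    """Hypothetical hash: map a full code (user key + system digit) to a bucket."""
    digest = hashlib.sha256(code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


class ForcedHashTable:
    """Sketch of a 'user-system cooperating code assignment' hash table."""

    def __init__(self, n_buckets: int, capacity: int):
        self.n_buckets = n_buckets
        self.capacity = capacity
        self.buckets = [{} for _ in range(n_buckets)]

    def _bucket(self, code: str) -> dict:
        return self.buckets[bucket_of(code, self.n_buckets)]

    def insert(self, key: str, value) -> str:
        # Assumed criterion: among the ten candidate buckets that still have room,
        # "force" the record into the least-loaded one (ties go to the smaller digit).
        candidates = [(len(self._bucket(key + d)), d) for d in DIGITS
                      if len(self._bucket(key + d)) < self.capacity]
        if not candidates:
            raise OverflowError("all candidate buckets for this key are full")
        _, digit = min(candidates)
        code = key + digit
        self._bucket(code)[code] = value
        return code  # the user keeps the extended code for one-access retrieval

    def get_by_code(self, code: str):
        # Digit known: exactly one bucket is read.
        return self._bucket(code).get(code)

    def get_by_key(self, key: str):
        # Digit unknown: search the key's ten candidate buckets (its "surroundings").
        for d in DIGITS:
            code = key + d
            if code in self._bucket(code):
                return code, self._bucket(code)[code]
        return None
```

Storing the returned code gives one-access retrieval; a caller who only remembers the original key pays at most ten bucket reads instead. Picking the least-loaded candidate is what keeps buckets evenly filled, which is the property that lets such a scheme run at a high load factor without an overflow area.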

Enhanced Distributed Hash Tables for Complex Queries

2006 1st International Conference on Communication Systems Software & Middleware, 2006

Peer-to-peer file sharing systems have become a very popular way of sharing large numbers of files over a distributed environment. One of the principal ingredients of such systems is a lookup service which maps a key denoting a file to a location storing the file. Distributed hash tables (DHTs) were recently proposed as a means of supporting such a lookup service in a completely distributed manner. They have many desirable properties but suffer from one serious drawback: in order to locate a file, we must have precise knowledge of the key representing it. In this paper, we propose a lookup service which supports complex queries and has all the advantages of DHTs. We also compare our proposed method with PIER [8], another recently proposed peer-to-peer system for answering complex queries. Our experiments show that our method results in better utilization of the network than PIER.
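
The exact-match lookup service that a DHT provides can be illustrated with consistent hashing, the placement rule used by Chord-style DHTs. This is a single-process sketch under stated assumptions (SHA-1 ring positions on a 32-bit ring, hypothetical names `HashRing` and `ring_position`), not the paper's complex-query method. It shows why a lookup needs the precise key: the hash of the key, not its content, decides which node is responsible.

```python
import bisect
import hashlib


def ring_position(name: str, bits: int = 32) -> int:
    """Hash a node id or file key onto a 2**bits identifier ring (SHA-1 based)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Exact-match lookup: each key belongs to its successor node on the ring."""

    def __init__(self, nodes):
        self.points = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        # First node at or after the key's position, wrapping around past the top.
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.points[i % len(self.points)][1]
```

Adding a node only reassigns the keys that now fall between the new node and its predecessor; every other key keeps its owner. A query such as "all files whose name contains `live`" has no single key to hash, which is the limitation the paper addresses.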

Extendible hashing---a fast access method for dynamic files

ACM Transactions on Database Systems, 1979

Extendible hashing is a new access technique, in which the user is guaranteed no more than two page faults to locate the data associated with a given unique identifier, or key. Unlike conventional hashing, extendible hashing has a dynamic structure that grows and shrinks gracefully as the database grows and shrinks. This approach simultaneously solves the problem of making hash tables that are extendible and of making radix search trees that are balanced. We study, by analysis and simulation, the performance of extendible hashing. The results indicate that extendible hashing provides an attractive alternative to other access methods, such as balanced trees.
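
A compact in-memory sketch of extendible hashing follows. It indexes the directory with the low-order bits of a hash (an assumption; descriptions differ on low- vs high-order bits), doubles the directory when a bucket whose local depth equals the global depth overflows, and otherwise splits only that bucket. `ExtendibleHash`, `Bucket`, and the MD5-based `h` are illustrative names. A lookup touches one directory entry and one bucket, which is where a two-page-fault bound comes from when the directory itself lives on disk.

```python
import hashlib


def h(key) -> int:
    """Hypothetical 32-bit hash; the directory is indexed by its low-order bits."""
    return int.from_bytes(hashlib.md5(repr(key).encode()).digest()[:4], "little")


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth  # local depth: number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _bucket(self, key) -> Bucket:
        # Directory entry selected by the low global_depth bits of the hash.
        return self.directory[h(key) & ((1 << self.global_depth) - 1)]

    def get(self, key):
        return self._bucket(key).items.get(key)

    def put(self, key, value) -> None:
        while True:
            bucket = self._bucket(key)
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket full: split it, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == self.global_depth:
            if self.global_depth == 32:
                raise OverflowError("too many keys share all 32 hash bits")
            # Double the directory; entries i and i + 2**d point to the same bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        new_bit = 1 << (bucket.depth - 1)
        sibling = Bucket(bucket.depth)
        # Records whose hash has the new bit set move to the sibling bucket.
        for k in [k for k in bucket.items if h(k) & new_bit]:
            sibling.items[k] = bucket.items.pop(k)
        # Repoint the half of this bucket's directory entries that have the new bit set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
```

Because only the overflowing bucket is split and the directory doubles only when necessary, the structure grows gracefully with the data rather than being rebuilt all at once.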

An Improved Technique for Data Retrieval in Distributed Systems

2019

Today the world is moving toward distributed systems, which are built around reliability, availability, and performance because data is stored on multiple sites in a distributed manner. If the system at one location fails, the data can be accessed from another storage location, so data remains readily available in a distributed system. In such a scenario, information retrieval is a major issue: data might be retrieved by an inappropriate user, which can degrade system performance by increasing data retrieval time. In this research work, an improved technique will be designed using data replication, which is widely used to manage large volumes of data in a distributed manner. It speeds up data retrieval, reduces data retrieval time, and increases data availability, addressing the issue of data retrieval time. The performance of the developed system will be analyzed by issuing multiple queries at a time from different systems, which together make a distributed e...
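
The availability argument for replication can be sketched as a read that falls back to another replica when a site is down. This is a hypothetical illustration, not the paper's technique: placement on consecutive sites chosen by CRC-32, synchronous writes to all replicas, and the `Site`/`ReplicatedStore` names are all assumptions.

```python
import zlib


class Site:
    """Hypothetical storage site that can fail."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    def __init__(self, sites, replicas: int = 2):
        self.sites = sites
        self.replicas = replicas

    def _replica_sites(self, key: str):
        # Assumed placement: `replicas` consecutive sites from a CRC-32-chosen start.
        start = zlib.crc32(key.encode()) % len(self.sites)
        return [self.sites[(start + i) % len(self.sites)] for i in range(self.replicas)]

    def write(self, key: str, value) -> None:
        # Simplification: synchronous write to every replica (all sites assumed up).
        for site in self._replica_sites(key):
            site.data[key] = value

    def read(self, key: str):
        # Try replicas in order; a failed site is skipped, not fatal.
        for site in self._replica_sites(key):
            if site.up and key in site.data:
                return site.data[key], site.name
        raise LookupError(f"no live replica holds {key!r}")
```

With two replicas, a read still succeeds after any single site failure; raising `replicas` buys more failure tolerance at the cost of extra writes.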

Linear hashing with separators—a dynamic hashing scheme achieving one-access

ACM Transactions on Database Systems, 1988

A new dynamic hashing scheme is presented. Its most outstanding feature is that any record can be retrieved in exactly one disk access. This is achieved by using a small amount of supplemental internal storage that stores enough information to uniquely determine the current location of any record. The amount of internal storage required is small: typically one byte for each page of the file. The necessary address computation, insertion, and expansion algorithms are presented and the performance is studied by means of simulation. The new method is the first practical method offering one-access retrieval for large dynamic files.
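
The one-access property rests on a small in-memory table of separators, one per page. The sketch below shows only the static form of the separator idea on which linear hashing with separators builds; the linear-hashing file expansion is omitted. Assumptions: each key has a SHA-256-derived probe sequence of (page, signature) pairs, signatures are 8 bits, and every separator starts above all possible signatures. A key lives on the first page of its probe sequence whose separator exceeds the key's signature for that page, so `get` consults only the separator table before reading exactly one page. `SeparatorFile` and `probe` are hypothetical names.

```python
import hashlib

SIG_BITS = 8              # assumed signature length in bits
NO_LIMIT = 1 << SIG_BITS  # initial separator: above every possible signature
MAX_PROBES = 64           # assumed bound on the probe sequence


def probe(key: str, i: int, n_pages: int):
    """Hypothetical i-th (page, signature) pair of a key's probe sequence."""
    d = hashlib.sha256(f"{key}#{i}".encode()).digest()
    return int.from_bytes(d[:4], "big") % n_pages, d[4] % NO_LIMIT


class SeparatorFile:
    def __init__(self, n_pages: int, capacity: int):
        self.capacity = capacity
        self.pages = [{} for _ in range(n_pages)]  # "disk": key -> (signature, value)
        self.sep = [NO_LIMIT] * n_pages            # small in-memory separator table

    def _home(self, key: str):
        # Uses only in-memory separators: the first probe whose signature is below
        # that page's separator names the one page that can hold the key.
        for i in range(MAX_PROBES):
            page, sig = probe(key, i, len(self.pages))
            if sig < self.sep[page]:
                return page, sig
        raise OverflowError("probe sequence exhausted")

    def get(self, key: str):
        page, _ = self._home(key)
        entry = self.pages[page].get(key)  # the single page read
        return None if entry is None else entry[1]

    def put(self, key: str, value) -> None:
        pending = [(key, value)]
        while pending:
            k, v = pending.pop()
            page, sig = self._home(k)
            records = self.pages[page]
            records[k] = (sig, v)
            if len(records) > self.capacity:
                # Overflow: lower the separator to the largest signature present and
                # evict every record at or above it; evictees continue along their
                # probe sequences.
                self.sep[page] = max(s for s, _ in records.values())
                evicted = [ek for ek, (s, _) in records.items() if s >= self.sep[page]]
                for ek in evicted:
                    pending.append((ek, records.pop(ek)[1]))
```

Separators only ever decrease, so a probe that a key once failed stays failed; this is why the first qualifying probe is always the key's current home, and why each overflow is guaranteed to make progress.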

Fast search in main memory databases

ACM SIGMOD Record, 1992

The objective of this paper is to develop and analyze high performance hash based search methods for main memory databases. We define optimal search in main memory databases as the search that requires at most one key comparison to locate a record. Existing hashing techniques become impractical when they are adapted to yield optimal search in main memory databases because of their large directory size. Multi-directory hashing techniques can provide significantly improved directory utilization over single-directory hashing techniques. A multi-directory hashing scheme, called fast search multi-directory hashing, and its generalization, called controlled search multi-directory hashing, are presented. Both methods achieve linearly increasing expected directory size with the number of records. Their performance is compared to existing alternatives.

A Novel Architecture for Mobile Distributed Trie Hashing System

2008

Scalable and Distributed Data Structures (SDDS) are a class of data structures completely dedicated to distributed environments. They allow the management of large amounts of data while maintaining steady and optimal performance. Several families of SDDS have been proposed: LH*, RP*, DRT*, CTH*. None of these SDDS addresses the mobile environment. In this paper we present a novel architecture that uses a scalable and distributed data structure to manage insert/find/range query operations for mobile clients. We describe the design and the implementation of a mobile CTH* prototype. Our experimental results confirm the validity of the design choices and show promising access performance. The capabilities of the mobile CTH* platform offer new perspectives for high performance and ubiquitous data intensive applications.

The Grid File: An Adaptable, Symmetric Multikey File Structure

ACM Transactions on Database Systems, 1984

Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficiencies, in particular for multikey access to highly dynamic files. We study the dynamic aspects of file structures that treat all keys symmetrically, that is, file structures which avoid the distinction between primary and secondary keys. We start from a bitmap approach and treat the problem of file design as one of data compression of a large sparse matrix. This leads to the notions of a grid partition of the search space and of a grid directory, which are the keys to a dynamic file structure called the grid file. This file system adapts gracefully to its contents under insertions and deletions, and thus achieves an upper bound of two disk accesses for single record retrieval; it also handles range queries and partially specified queries efficiently. We discuss in detail the design decisions that led to the grid file, present simulation results of its behavior, and compare it to other multikey access file structures.
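
Grid-file addressing can be sketched for two attributes: in-memory linear scales map each coordinate to an interval, the pair of intervals selects a grid-directory cell, and the cell names a data bucket that several cells may share. This static sketch (hypothetical `GridFile` class, no bucket splitting or merging, directory and buckets held in memory) only illustrates why an exact-match lookup needs at most two disk accesses and how a range query visits just the overlapping cells.

```python
import bisect
from itertools import product


class GridFile:
    """Static sketch of grid-file addressing for 2-D points (no splitting/merging)."""

    def __init__(self, x_scale, y_scale, directory):
        # Linear scales: sorted interior boundaries per attribute (kept in memory).
        self.scales = (list(x_scale), list(y_scale))
        # Grid directory: directory[i][j] = bucket id of cell (i, j); cells may share buckets.
        self.directory = directory
        self.buckets = {}

    def _cell(self, point):
        # A value equal to a boundary belongs to the upper interval.
        return tuple(bisect.bisect_right(s, v) for s, v in zip(self.scales, point))

    def insert(self, point, record):
        i, j = self._cell(point)
        self.buckets.setdefault(self.directory[i][j], {})[point] = record

    def get(self, point):
        i, j = self._cell(point)                            # in-memory scale search
        bucket_id = self.directory[i][j]                    # access 1: grid directory
        return self.buckets.get(bucket_id, {}).get(point)   # access 2: data bucket

    def range_query(self, lo, hi):
        # Visit only directory cells overlapping [lo, hi] on both axes, each bucket once.
        (i0, j0), (i1, j1) = self._cell(lo), self._cell(hi)
        seen, out = set(), []
        for i, j in product(range(i0, i1 + 1), range(j0, j1 + 1)):
            b = self.directory[i][j]
            if b in seen:
                continue
            seen.add(b)
            for p, rec in self.buckets.get(b, {}).items():
                if all(l <= v <= h for l, v, h in zip(lo, p, hi)):
                    out.append((p, rec))
        return sorted(out)
```

For example, with x boundaries `[10, 20]`, y boundary `[50]`, and directory `[["A", "B"], ["A", "C"], ["D", "D"]]`, the points `(5, 10)` and `(12, 40)` share bucket `A` even though they fall in different cells; sharing buckets between cells is what keeps the directory from forcing one page per cell.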

Distributed Linear Hashing and Parallel Projection in Main Memory Databases

Very Large Data Bases, 1990

This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations. The basic distributed linear hashing technique provides a high speed hash based dynamic file system on a NUMA architecture multi-processor system. Distributed linear hashing has been extended to include the ability to perform high speed...
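
Distributed linear hashing builds on Litwin's linear hashing, whose address computation and split rule are sketched below for a single node. Assumptions: integer keys hashed by `key mod (n0 * 2**level)`, a split triggered when the load factor exceeds 0.8, and the hypothetical class name `LinearHashFile`. Spreading buckets over processors, as the paper does on a NUMA multiprocessor, is not shown.

```python
class LinearHashFile:
    """Single-node linear hashing: buckets split one at a time, in a fixed order."""

    def __init__(self, initial_buckets: int = 4, capacity: int = 4, max_load: float = 0.8):
        self.n0 = initial_buckets
        self.capacity = capacity  # nominal records per bucket (used only for the load factor)
        self.max_load = max_load  # assumed split trigger
        self.level = 0
        self.split = 0            # next bucket to be split
        self.count = 0
        self.buckets = [{} for _ in range(initial_buckets)]

    def address(self, key: int) -> int:
        a = key % (self.n0 << self.level)            # h_level
        if a < self.split:                           # bucket already split this round
            a = key % (self.n0 << (self.level + 1))  # h_{level+1}
        return a

    def get(self, key: int):
        return self.buckets[self.address(key)].get(key)

    def put(self, key: int, value) -> None:
        bucket = self.buckets[self.address(key)]
        if key not in bucket:
            self.count += 1
        bucket[key] = value
        if self.count > self.max_load * self.capacity * len(self.buckets):
            self._split()

    def _split(self) -> None:
        # Split the bucket at the split pointer (not necessarily the one that overflowed).
        old = self.buckets[self.split]
        self.buckets.append({})
        self.split += 1
        if self.split == self.n0 << self.level:  # round finished: file size doubled
            self.level += 1
            self.split = 0
        moved = list(old.items())
        old.clear()
        for k, v in moved:  # redistribute between the old bucket and its new partner
            self.buckets[self.address(k)][k] = v
```

The file grows by one bucket per split and needs no directory: `level` and `split` alone determine every address, which is what makes the scheme attractive to share across processors.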

A seven-dimensional analysis of hashing methods and its implications on query processing

Proceedings of the VLDB Endowment, 2015

Hashing is a solved problem. It allows us to get constant time access for lookups. Hashing is also simple. It is safe to use an arbitrary method as a black box and expect good performance, and optimizations to hashing can only improve it by a negligible delta. Why are all of the previous statements plain wrong? That is what this paper is about. In this paper we thoroughly study hashing for integer keys and carefully analyze the most common hashing methods in a five-dimensional requirements space: (1) data-distribution, (2) load factor, (3) dataset size, (4) read/write-ratio, and (5) un/successful-ratio. Each point in that design space may potentially suggest a different hashing scheme, and additionally also a different hash function. We show that picking the right or wrong combination of hashing scheme and hash function may lead to a significant difference in performance. To substantiate this claim, we carefully analyze two additional dimensions: (6) five representati...
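
To make the load-factor and successful/unsuccessful-lookup dimensions concrete, the sketch below measures average probe counts for one common combination, linear probing with multiplicative (Fibonacci) hashing of 64-bit integer keys. It is an illustrative assumption, not the paper's benchmark code; `LinearProbingTable`, `mult_hash`, and `average_probes` are hypothetical names. Under the standard analysis of linear probing, unsuccessful lookups grow far more expensive than successful ones as the load factor approaches 1.

```python
import random

MASK64 = (1 << 64) - 1


def mult_hash(key: int, bits: int) -> int:
    """Multiplicative (Fibonacci) hashing of a 64-bit key to a `bits`-bit slot index."""
    return ((key * 0x9E3779B97F4A7C15) & MASK64) >> (64 - bits)


class LinearProbingTable:
    def __init__(self, bits: int):
        self.bits = bits
        self.slots = [None] * (1 << bits)
        self.size = 0

    def insert(self, key: int) -> None:
        i = mult_hash(key, self.bits)
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)  # linear probing with wrap-around
        if self.slots[i] is None:
            if self.size == len(self.slots) - 1:
                raise OverflowError("keep one slot empty so probe loops terminate")
            self.size += 1
        self.slots[i] = key

    def probes(self, key: int) -> int:
        """Slots examined until `key` or an empty slot is found."""
        i, n = mult_hash(key, self.bits), 1
        while self.slots[i] is not None and self.slots[i] != key:
            i, n = (i + 1) % len(self.slots), n + 1
        return n


def average_probes(load: float, bits: int = 12, seed: int = 0):
    """Mean probes per successful and unsuccessful lookup at the given load factor."""
    rng = random.Random(seed)
    table = LinearProbingTable(bits)
    keys = rng.sample(range(1, 1 << 62), int(load * (1 << bits)))
    for k in keys:
        table.insert(k)
    absent = [rng.randrange(1 << 62, 1 << 63) for _ in range(1000)]  # disjoint from keys
    hit = sum(map(table.probes, keys)) / len(keys)
    miss = sum(map(table.probes, absent)) / len(absent)
    return hit, miss
```

Calling `average_probes(0.5)` and `average_probes(0.9)` shows how sharply the unsuccessful-lookup cost rises with load; repeating the experiment with skewed or dense keys, or another hash function, is the kind of comparison the paper's design space calls for.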