Distributed Storage System Research Papers

We consider the problem of private information retrieval (PIR) over a distributed storage system. The storage system consists of N non-colluding databases, each storing an MDS-coded version of M messages. In the PIR problem, the user wishes to retrieve one of the available messages without revealing the message identity to any individual database. We derive the information-theoretic capacity of this problem, defined as the maximum number of bits of the desired message that can be privately retrieved per bit of downloaded information. We show that the PIR capacity in this case is $C = \left(1 + \frac{K}{N} + \frac{K^2}{N^2} + \cdots + \frac{K^{M-1}}{N^{M-1}}\right)^{-1} = \left(1 + R_c + R_c^2 + \cdots + R_c^{M-1}\right)^{-1} = \frac{1-R_c}{1-R_c^M}$, where $R_c = K/N$ is the rate of the $(N, K)$ code used. The capacity is a function of the code rate and the number of messages only, regardless of the explicit structure of the storage code. The result implies a fundamental tradeoff between the optimal retrieval cost and the storage cost, and generalizes the achievability and converse results for classical PIR with replicated databases to the case of coded databases.
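
Since the capacity depends only on the code rate $R_c = K/N$ and the number of messages $M$, it is easy to evaluate numerically. The following minimal sketch (not from the paper) computes the expression exactly with rationals:

```python
from fractions import Fraction

def pir_capacity(n: int, k: int, m: int) -> Fraction:
    """PIR capacity for an (n, k) MDS-coded storage system with m messages:
    C = (1 + Rc + Rc^2 + ... + Rc^(m-1))^(-1) = (1 - Rc) / (1 - Rc^m),
    where Rc = k / n is the rate of the storage code."""
    rc = Fraction(k, n)
    return (1 - rc) / (1 - rc**m)

print(pir_capacity(4, 2, 3))  # 4/7: four desired bits per seven downloaded bits
print(pir_capacity(4, 1, 3))  # 16/21: k = 1 recovers the replicated-database capacity
```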

As peer-to-peer and widely distributed storage systems proliferate, the need to perform efficient erasure coding, instead of replication, is crucial to performance and efficiency. Low-Density Parity-Check (LDPC) codes have arisen as alternatives to standard erasure codes, such as Reed-Solomon codes, trading off vastly improved decoding performance for inefficiencies in the amount of data that must be acquired to perform decoding. The ...
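
The tradeoff this abstract describes, fetching exactly k blocks with an MDS code versus roughly k(1+ε) blocks with an LDPC code in exchange for much cheaper decoding, can be made concrete with a small sketch; the overhead figure used here is an assumed illustrative value, not one from the paper:

```python
import math

def blocks_to_fetch(k: int, epsilon: float) -> int:
    """Blocks that must be downloaded before decoding can succeed.
    An MDS code such as Reed-Solomon needs exactly k (epsilon = 0);
    an LDPC-style erasure code needs about k * (1 + epsilon), where
    epsilon is a code-dependent reception overhead."""
    return math.ceil(k * (1 + epsilon))

# Illustrative (assumed) numbers for a file split into k = 32 blocks:
print(blocks_to_fetch(32, 0.00))  # Reed-Solomon: 32 blocks, costly decoding
print(blocks_to_fetch(32, 0.10))  # LDPC, assumed 10% overhead: 36 blocks, linear-time decoding
```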

In the face of global change, which is characterized by growing water demands and increasingly variable water supplies, the equitable sharing of water and the drought-proofing of rural livelihoods will require an increasing physical capacity to store water. This is especially true for the semiarid and dry subhumid regions of sub-Saharan Africa and Asia. This paper addresses the following question: What criteria should policymakers apply in choosing between centralized storage capacity, in the form of conventional large reservoirs and large interbasin water transfer schemes, and decentralized and distributed storage systems in farmers' fields, microwatersheds, and villages (tanks, microdams, and aquifers)? This exploratory paper uses an interdisciplinary framework encompassing the natural and social sciences to develop four indicators considered critical for understanding the biochemical, physical, economic, and sociopolitical dimensions of the scale issues underlying the research question. These are the residence time of water in a reservoir, the water provision capacity, the cost effectiveness of providing reliable access to water per beneficiary, and the equity dimension: maximizing the number of beneficiaries and compensating the losers. The procedural governance challenges associated with each indicator are dealt with separately. It is concluded that water storage and the institutional capacity to effectively administer it are recursively linked. This implies that if the scale of new storage projects gradually increases, a society can progressively learn and adapt to the increasing institutional complexity.

The paper deals with the optimal sizing and allocation of dispersed generation, distributed storage systems and capacitor banks. The optimization aims at minimizing the sum of the costs sustained by the distributor for the power losses, for network upgrading, for carrying out the reactive power service, and the costs of storage and capacitor installation, over a planning period of several years. A hybrid procedure based on a genetic algorithm and a sequential quadratic programming-based algorithm was used. A numerical application on an 18-busbar MV balanced three-phase network was performed in order to show the feasibility of the proposed procedure. Distributed energy storage systems (DESSs) can be used to reduce the variability of some DG sources, to counter the voltage rise effect, or to improve the power quality in distribution networks. Moreover, an optimal control of DESSs allows the operators of electrical distribution systems to improve the reactive control and, as a consequence, to reduce the overall costs (3-6). Considering an...
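
The hybrid structure, a genetic algorithm exploring the discrete siting decisions while a gradient-based SQP solver sizes the installed units, can be sketched compactly. Everything below (the 18-bus surrogate cost, the bounds, the GA parameters) is an illustrative assumption rather than the paper's actual planning model; scipy's SLSQP stands in for the SQP step:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N_BUSES = 18  # candidate busbars, matching the paper's test network size

def planning_cost(sizes, placement):
    """Toy surrogate for the planning cost (losses + upgrades + installation).
    The paper's real model is far richer; this quadratic stand-in only
    illustrates the hybrid search structure."""
    demand = np.linspace(1.0, 2.0, N_BUSES)  # assumed bus loading profile
    return float(np.sum((demand - placement * sizes) ** 2) + 0.1 * np.sum(sizes))

def size_units(placement):
    """Inner continuous step: size the units at fixed locations via SLSQP."""
    res = minimize(planning_cost, np.full(N_BUSES, 0.5), args=(placement,),
                   method="SLSQP", bounds=[(0.0, 2.0)] * N_BUSES)
    return res.fun, res.x

def genetic_search(pop_size=20, generations=15):
    """Outer discrete step: evolve 0/1 placement vectors (which buses get units)."""
    pop = rng.integers(0, 2, size=(pop_size, N_BUSES))
    best_cost, best_placement = np.inf, None
    for _ in range(generations):
        scored = sorted(((size_units(ind)[0], ind) for ind in pop),
                        key=lambda t: t[0])
        if scored[0][0] < best_cost:
            best_cost, best_placement = scored[0]
        elite = [ind for _, ind in scored[: pop_size // 2]]
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.choice(len(elite), size=2, replace=False)
            cut = int(rng.integers(1, N_BUSES))
            child = np.concatenate([elite[a][:cut], elite[b][cut:]])  # crossover
            child[rng.random(N_BUSES) < 0.05] ^= 1                    # bit-flip mutation
            children.append(child)
        pop = np.array(elite + children)
    return best_cost, best_placement

cost, placement = genetic_search()
print(f"best cost {cost:.3f}, units at buses {np.flatnonzero(placement)}")
```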

In a distributed storage system, client caches managed on the basis of small-granularity objects can provide better memory utilization than page-based caches. However, object servers, unlike page servers, must perform additional disk reads. These installation reads are required to ...
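
The "installation read" the abstract refers to, reading a page from disk so that a modified object shipped by a client can be installed into it, can be illustrated with a toy server model; all names and structure here are hypothetical, not from the paper:

```python
class ObjectServer:
    """Toy model of the installation-read effect: installing a modified
    object into its home page requires reading that page from disk first
    if it is not already in the server's buffer."""
    def __init__(self):
        self.buffer = {}       # page_id -> dict of objects currently in memory
        self.disk_reads = 0

    def install_object(self, page_id, obj_id, value):
        if page_id not in self.buffer:
            self.disk_reads += 1            # the installation read
            self.buffer[page_id] = self._read_page_from_disk(page_id)
        self.buffer[page_id][obj_id] = value

    def _read_page_from_disk(self, page_id):
        return {}                            # stand-in for a real disk read

server = ObjectServer()
server.install_object(page_id=7, obj_id="a", value=1)   # cold page: 1 disk read
server.install_object(page_id=7, obj_id="b", value=2)   # page cached: no read
print(server.disk_reads)                                 # 1
```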

The data generated by scientific simulations and experimental facilities is beginning to revolutionize the infrastructure support needed by these applications. The on-demand aspect and flexibility of cloud computing environments make them an attractive platform for data-intensive scientific applications. However, cloud computing poses unique challenges for these applications. For example, cloud computing environments are heterogeneous, dynamic, and non-persistent, which can make reproducibility a challenge. The volume, velocity, variety, veracity, and value of data, combined with the characteristics of the cloud environment, make it important to track the state of execution data and an application's entire lifetime information in order to understand and ensure reproducibility. This paper proposes and implements a state management system (FRIEDA-State) for high-throughput and data-intensive scientific applications running in cloud environments. Our design addresses the challenges of state management in cloud environments and offers various configurations. Our implementation is built on top of FRIEDA (Flexible Robust Intelligent Elastic Data Management), a data management and execution framework for cloud environments. Our experimental results on two cloud test beds (FutureGrid and Amazon) show that the proposed solution has minimal overhead (1.2 ms per operation at a scale of 64 virtual machines) and is suitable for state management in cloud environments.
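
The abstract does not spell out FRIEDA-State's API, but the core idea, logging each task's state transitions together with run metadata so an execution can be audited and reproduced, might look roughly like the following sketch; the class name, fields, and JSON-lines backend are all assumptions for illustration:

```python
import json, time, uuid

class StateStore:
    """Minimal sketch of execution-state tracking for reproducibility in a
    cloud run: an append-only log of per-task state transitions, keyed by a
    run identifier so a whole execution can be replayed or audited."""
    def __init__(self, path="run_state.jsonl"):
        self.path = path
        self.run_id = str(uuid.uuid4())

    def record(self, task, state, **metadata):
        entry = {"run_id": self.run_id, "task": task, "state": state,
                 "timestamp": time.time(), **metadata}
        with open(self.path, "a") as f:       # append-only log of state changes
            f.write(json.dumps(entry) + "\n")

store = StateStore()
store.record("stage-input", "started", vm="vm-03", input_file="sample.dat")
store.record("stage-input", "finished", bytes_moved=1_048_576)
```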

The ability to accommodate all types of distributed storage options and renewable energy sources is one of the main characteristics of the smart grid. The smart grid integrates advanced sensing technologies, control methodologies, and communication technologies into current power distribution systems to deliver electricity to customers more effectively. Infrastructure for the implementation and utilization of renewable energy sources requires distributed storage systems with high power density and high energy density. Current research investigates the energy management and dynamic control of distributed storage systems to offer not only high power density and high energy density but also high efficiency and long system life. In this paper, an intelligent energy management system is proposed to meet the short-term requirements of a distributed storage system in the smart grid. The energy management of a distributed storage system is formulated as a nonlinear mixed-integer optimization problem. A hybrid algorithm that combines an evolutionary algorithm with linear programming was developed to solve the problem. Simulation results show the potential of the proposed algorithm for solving the problem.
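
A minimal sketch of the hybrid scheme described above, an evolutionary algorithm handling the integer (on/off) decisions while a linear program dispatches the continuous power variables, is given below. The toy prices, demand, and storage model are assumptions for illustration, not the paper's formulation:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
T = 6                                                # planning periods (assumed)
price = np.array([30, 25, 20, 40, 55, 50], float)    # assumed $/MWh prices
demand = np.array([2, 2, 3, 4, 5, 4], float)         # assumed MW demand
P_MAX = 3.0                                          # storage power rating (assumed)

def dispatch_cost(on):
    """Inner LP: given a 0/1 availability vector for the storage unit, buy
    grid energy g_t and discharge d_t so that g_t + d_t >= demand_t, with
    d_t limited to P_MAX when the unit is on. (Toy model: no energy balance.)"""
    c = np.concatenate([price, np.zeros(T)])          # pay only for grid energy
    A_ub = np.hstack([-np.eye(T), -np.eye(T)])        # -(g + d) <= -demand
    bounds = [(0, None)] * T + [(0, P_MAX * u) for u in on]
    res = linprog(c, A_ub=A_ub, b_ub=-demand, bounds=bounds, method="highs")
    return res.fun + 5.0 * on.sum()                   # assumed per-period standby cost

def evolve(pop_size=16, generations=20):
    """Outer evolutionary loop over the integer on/off variables."""
    pop = rng.integers(0, 2, size=(pop_size, T))
    for _ in range(generations):
        pop = pop[np.argsort([dispatch_cost(ind) for ind in pop])]
        parents = pop[: pop_size // 2]
        kids = parents.copy()
        kids[rng.random(kids.shape) < 0.1] ^= 1       # bit-flip mutation
        pop = np.vstack([parents, kids])
    pop = pop[np.argsort([dispatch_cost(ind) for ind in pop])]
    return pop[0], dispatch_cost(pop[0])

schedule, cost = evolve()
print("on/off schedule:", schedule, "cost:", round(cost, 2))
```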

... Durability in P2P storage systems was first studied by Blake and Rodrigues [10], who presented a lower bound on node lifetime as a function of node capacity and bandwidth. ... [7] independently suggested using Markov chains to predict the probability of data survival in DHTs. ...
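
The Markov-chain approach mentioned above, modeling the number of live replicas as a chain with failure and repair transitions and an absorbing data-loss state, can be sketched as follows; the discrete-time structure and the rates here are illustrative assumptions, not those of [7]:

```python
import numpy as np

def survival_probability(r, lam, mu, steps):
    """Discrete-time birth-death chain on the number of live replicas 0..r:
    each step, each of the i live replicas fails with probability lam, and
    repair restores one replica with probability mu. State 0 (all replicas
    lost) is absorbing and represents permanent data loss."""
    P = np.zeros((r + 1, r + 1))
    P[0, 0] = 1.0                                  # data lost: absorbing
    for i in range(1, r + 1):
        down = i * lam                             # some replica fails
        up = mu if i < r else 0.0                  # repair restores one
        P[i, i - 1] = down
        P[i, min(i + 1, r)] += up
        P[i, i] = 1.0 - down - up
    dist = np.zeros(r + 1)
    dist[r] = 1.0                                  # start fully replicated
    for _ in range(steps):
        dist = dist @ P
    return 1.0 - dist[0]                           # P(data still alive)

print(survival_probability(r=3, lam=0.01, mu=0.05, steps=10_000))
```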

Currently, vehicles are equipped with forward-facing cameras that assist forensic investigations of events by proactively capturing images from streets and roads. Given the content redundancy and storage imbalance in this in-network distributed storage system, maximizing its storage capacity is a challenge. In other words, maximizing the average lifetime of sensory data (i.e., images generated by cameras) in the network is a fundamental problem that needs to be solved. This paper presents VStore, a cooperative storage solution for mobile surveillance in vehicular sensor networks (VSNs). The mechanisms in VStore are designed for redundancy elimination, achieved by exchanging information between vehicles, and for storage balancing. Compared with previous work, we deal with new challenges in the mobile scenario. Field testing was carried out on a real-trace-driven simulator that utilizes about 500 taxis in Shanghai. The testing results show that VStore can significantly prolong the average lifetime of sensory data through cooperative storage.
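
The redundancy-elimination idea, vehicles exchanging compact content digests and dropping images a neighbor already stores, can be sketched as below. VStore's actual protocol and balancing policy are richer; every name here is hypothetical:

```python
import hashlib

def digest(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

class Vehicle:
    """Toy model of hash-exchange-based redundancy elimination between
    neighboring vehicles: only digests cross the wireless link, and local
    duplicates of images a neighbor already holds are dropped."""
    def __init__(self, name):
        self.name = name
        self.store = {}                        # digest -> image bytes

    def capture(self, image: bytes):
        self.store[digest(image)] = image

    def sync_with(self, other: "Vehicle"):
        # Exchange digests only (cheap), then drop local duplicates.
        for h in set(self.store) & set(other.store):
            del self.store[h]                  # keep the neighbor's copy

a, b = Vehicle("taxi-a"), Vehicle("taxi-b")
a.capture(b"frame-001"); a.capture(b"frame-002")
b.capture(b"frame-001")                        # same scene captured twice
a.sync_with(b)
print(len(a.store), len(b.store))              # 1 1 -> duplicate eliminated
```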

Elastic distributed storage systems have been increasingly studied in recent years because power consumption has become a major problem in data centers. Much progress has been made in improving the agility of resizing small- and large-scale distributed storage systems. However, most of these studies focus on metadata-based distributed storage systems. On the other hand, emerging consistent-hashing-based distributed storage systems are considered to allow better scalability and are highly attractive. We identify challenges in achieving elasticity in consistent-hashing-based distributed storage; these challenges cannot be easily solved by the techniques used in current studies. In this paper, we propose an elastic consistent-hashing-based distributed storage that solves two problems. First, in order to allow a distributed storage system to resize quickly, we modify the data placement algorithm using a primary-server design and achieve an equal-work data layout. Second, we propose a selective data reintegration technique to reduce the performance impact when resizing a cluster. Our experimental and trace analysis results confirm that our proposed elastic consistent hashing works effectively and allows significantly better elasticity.
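
For readers unfamiliar with why consistent hashing helps elasticity, the sketch below shows a plain ring with virtual nodes: removing a server remaps only a fraction of the keys rather than reshuffling everything. The paper's primary-server design, equal-work layout, and selective reintegration are additional machinery not reproduced here:

```python
import bisect, hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Plain consistent-hashing ring with virtual nodes. Each server owns
    several points on the ring; a key maps to the first point clockwise."""
    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self.ring = []                          # sorted (hash, server) points
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (h(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [(x, s) for x, s in self.ring if s != server]

    def lookup(self, key):
        i = bisect.bisect(self.ring, (h(key), "")) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["s1", "s2", "s3"])
before = {k: ring.lookup(k) for k in map(str, range(1000))}
ring.remove("s3")                               # power down one server
moved = sum(before[k] != ring.lookup(k) for k in before)
print(f"{moved} of 1000 keys remapped")         # roughly 1/3, not all
```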

Nowadays, distributed storage is adopted to alleviate delay-tolerant networking (DTN) congestion, but reliable transmission during the congestion period remains an issue. In this paper, we propose a multi-custodian distributed storage (MCDS) framework that includes a set of algorithms to determine when appropriate bundles should be migrated to (or retrieved from) suitable custodians, so that we can relieve DTN congestion and improve transmission reliability simultaneously. MCDS uses multiple custodians to temporarily store duplicates of a migrated bundle in order to release DTN congestion; it therefore has more opportunities to retrieve the migrated bundles once network congestion is mitigated. Two performance metrics are used to evaluate the simulation results: the goodput ratio (GR), which represents the QoS of data transmission, and the retrieved loss ratio (RLR), which reflects the performance of reliable transmission. We also evaluate MCDS against a single-custodian distributed storage (SCDS) mechanism. Simulation results show that MCDS achieves better GR and RLR in almost all simulation cases; across the various scenarios, the GR and RLR of MCDS are higher than those of SCDS by 10.6%-18.4% and 23.2%-36.8%, respectively.
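
A toy sketch of the multi-custodian idea follows: under congestion, duplicates of a bundle are pushed to several lightly loaded custodians, so that the bundle can still be retrieved if one copy is lost. MCDS's actual migration and retrieval timing algorithms are not reproduced here; all names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Custodian:
    name: str
    capacity: int
    stored: list = field(default_factory=list)

def migrate(bundle_id, custodians, copies=2):
    """Push `copies` duplicates of a congested bundle to the least-loaded
    custodians that still have room, and report where they went."""
    targets = sorted((c for c in custodians if len(c.stored) < c.capacity),
                     key=lambda c: len(c.stored))[:copies]
    for c in targets:
        c.stored.append(bundle_id)
    return [c.name for c in targets]

def retrieve(bundle_id, custodians):
    """After congestion eases, any surviving custodian copy suffices."""
    return any(bundle_id in c.stored for c in custodians)

cs = [Custodian("c1", 4), Custodian("c2", 4), Custodian("c3", 4)]
print(migrate("bundle-42", cs))          # e.g. ['c1', 'c2']
cs[0].stored.clear()                     # one custodian loses its copy
print(retrieve("bundle-42", cs))         # True: the duplicate survives
```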