Peter Alvaro | University of California, Berkeley
Papers by Peter Alvaro
Large-scale distributed systems must be built to anticipate and mitigate a variety of hardware and software failures. In order to build confidence that fault-tolerant systems are correctly implemented, Netflix (and similar enterprises) regularly run failure drills in which faults are deliberately injected in their production system. The combinatorial space of failure scenarios is too large to explore exhaustively. Existing failure testing approaches either explore the space of potential failures randomly or exploit the "hunches" of domain experts to guide the search. Random strategies waste resources testing "uninteresting" faults, while programmer-guided approaches are only as good as human intuition and only scale with human effort. In this paper, we describe how we adapted and implemented a research prototype called lineage-driven fault injection (LDFI) to automate failure testing at Netflix. Along the way, we describe the challenges that arose adapting the LDFI model to the complex and dynamic realities of the Netflix architecture. We show how we implemented the adapted algorithm as a service atop the existing tracing and fault injection infrastructure, and present early results.
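To make the LDFI search concrete, the following is a minimal, hypothetical sketch of the experiment loop described above, not Netflix's implementation; the `run_request` harness, the fault budget, and the lineage format (each "support" is a set of calls whose success was sufficient for the good outcome) are all assumptions for illustration.

```python
# A minimal, illustrative sketch of the LDFI search loop (not Netflix's code).
# `run_request` is a hypothetical stand-in for a real tracing / fault-injection harness.
from itertools import combinations

def candidate_fault_sets(supports, max_faults):
    """Yield small fault sets that 'cut' every known way the request succeeded,
    i.e. every support contains at least one injected fault."""
    units = sorted({call for support in supports for call in support})
    for k in range(1, max_faults + 1):
        for faults in combinations(units, k):
            if all(any(f in support for f in faults) for support in supports):
                yield set(faults)

def ldfi(run_request, max_faults=2):
    """Run fault-free once, collect lineage, then inject only the fault
    combinations that lineage says could plausibly break the outcome."""
    ok, supports = run_request(faults=set())       # supports: list of sets of calls
    assert ok, "baseline run must succeed"
    for faults in candidate_fault_sets(supports, max_faults):
        ok, new_supports = run_request(faults=faults)
        if not ok:
            return faults                          # a bug: no redundancy masked the faults
        supports.extend(new_supports)              # learn the alternative paths that were used
    return None                                    # no bug found within the fault budget
```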
Failure is always an option; in large-scale data management systems, it is practically a certainty. Fault-tolerant protocols and components are notoriously difficult to implement and debug. Worse still, choosing existing fault-tolerance mechanisms and integrating them correctly into complex systems remains an art form, and programmers have few tools to assist them. We propose a novel approach for discovering bugs in fault-tolerant data management systems: lineage-driven fault injection. A lineage-driven fault injector reasons backwards from correct system outcomes to determine whether failures in the execution could have prevented the outcome. We present MOLLY, a prototype of lineage-driven fault injection that exploits a novel combination of data lineage techniques from the database literature and state-of-the-art satisfiability testing. If fault-tolerance bugs exist for a particular configuration, MOLLY finds them rapidly, in many cases using an order of magnitude fewer executions than random fault injection. Otherwise, MOLLY certifies that the code is bug-free for that configuration.
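The combination of lineage and satisfiability can be illustrated with a toy encoding (this is not MOLLY itself): each derivation of the good outcome becomes a clause over "failure variables", and a satisfying assignment is a set of failures that cuts every derivation. MOLLY hands such formulas to a real SAT solver; the enumeration below is only for clarity.

```python
# Toy lineage-to-SAT encoding, for illustration only.
from itertools import product

def lineage_to_cnf(supports):
    """supports: list of sets of events (message deliveries, node-up facts).
    Returns (variables, clauses): each clause says 'fail at least one event
    in this derivation'."""
    variables = sorted({e for s in supports for e in s})
    clauses = [[variables.index(e) for e in s] for s in supports]
    return variables, clauses

def solve(variables, clauses, max_failures):
    for assignment in product([False, True], repeat=len(variables)):
        if sum(assignment) > max_failures:
            continue
        if all(any(assignment[v] for v in clause) for clause in clauses):
            return {variables[i] for i, bit in enumerate(assignment) if bit}
    return None   # no failure combination of this size can invalidate the outcome

# Example: the outcome was derived either by a direct message from A or a relay
# through B, so only failing both paths could invalidate it.
vars_, cnf = lineage_to_cnf([{"msg(A,C)"}, {"msg(A,B)", "msg(B,C)"}])
print(solve(vars_, cnf, max_failures=2))   # {'msg(A,C)', 'msg(B,C)'}
```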
Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present BLAZES, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.
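The flavor of the analysis can be sketched with a deliberately simplified example (the annotation vocabulary and topology below are assumptions, not the paper's exact label lattice): components in a dataflow are annotated as order-insensitive (confluent) or order-sensitive, and any edge that delivers nondeterministically ordered input to an order-sensitive component is flagged as a place where coordination should be synthesized.

```python
# Simplified sketch of a BLAZES-style coordination analysis (illustrative only).
def coordination_points(edges, annotations):
    """edges: list of (producer, consumer) pairs in the dataflow.
    annotations: component -> 'confluent' or 'order_sensitive'.
    Returns the edges where ordering/sealing coordination should be added."""
    return [(src, dst) for src, dst in edges
            if annotations.get(dst) == "order_sensitive"]

# Example: a streaming word-count topology whose final report is order-sensitive.
edges = [("spout", "splitter"), ("splitter", "counter"), ("counter", "report")]
annotations = {"splitter": "confluent", "counter": "confluent",
               "report": "order_sensitive"}
print(coordination_points(edges, annotations))   # [('counter', 'report')]
```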
Distributed consistency is a perennial research topic; in recent years it has become an urgent practical matter as well. The research literature has focused on enforcing various flavors of consistency at the I/O layer, such as linearizability of read/write registers. For practitioners, strong I/O consistency is often impractical at scale, while looser forms of I/O consistency are difficult to map to application-level concerns. Instead, it is common for developers to take matters of distributed consistency into their own hands, leading to application-specific solutions that are tricky to write, test and maintain. In this paper, we agitate for the technical community to shift its attention to approaches that lie between the extremes of I/O-level and application-level consistency. We ground our discussion in early work in the area, including our own experiences building programmer tools and languages that help developers guarantee distributed consistency at the application level. Much remains to be done, and we highlight some of the challenges that we feel deserve more attention.
Distributed programming has become a topic of widespread interest, and many programmers now wrestle with tradeoffs between data consistency, availability and latency. Distributed transactions are often rejected as an undesirable tradeoff today, but in the absence of transactions there are few concrete principles or tools to help programmers design and verify the correctness of their applications. We address this situation with the CALM principle, which connects the idea of distributed consistency to program tests for logical monotonicity. We then introduce Bloom, a distributed programming language that is amenable to high-level consistency analysis and encourages order-insensitive programming. We present a prototype implementation of Bloom as a domain-specific language in Ruby. We also propose a program analysis technique that identifies points of order in Bloom programs: code locations where programmers may need to inject coordination logic to ensure consistency. We illustrate these ideas with two case studies: a simple key-value store and a distributed shopping cart service.
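A minimal sketch of the CALM-style check, under assumed rule and operator names rather than Bloom's actual analysis: rules built only from monotone operators (selection, join, merge) need no coordination, while non-monotone operators (aggregation, negation, deletion) are "points of order" where coordination logic may be required.

```python
# Hypothetical, simplified monotonicity check in the spirit of the Bloom analysis.
MONOTONE_OPS = {"select", "project", "join", "merge"}
NON_MONOTONE_OPS = {"aggregate", "group_by", "negate", "delete"}

def points_of_order(program):
    """program: list of (rule_name, operator) pairs.
    Returns the rules where coordination may need to be injected."""
    return [name for name, op in program if op in NON_MONOTONE_OPS]

# Example loosely modeled on the shopping cart case study:
cart = [("log_action", "merge"),          # accumulating add/remove requests is monotone
        ("replicate_log", "merge"),       # gossiping the log to replicas is monotone
        ("checkout_total", "aggregate")]  # summing the cart at checkout is not
print(points_of_order(cart))              # ['checkout_total']
```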
Building and debugging distributed software remains extremely difficult. We conjecture that by adopting a data-centric approach to system design and by employing declarative programming languages, a broad range of distributed software can be recast naturally in a data-parallel programming model. Our hope is that this model can significantly raise the level of abstraction for programmers, improving code simplicity, speed of development, ease of software evolution, and program correctness. This paper presents our experience with an initial large-scale experiment in this direction. First, we used the Overlog language to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS and provides comparable performance. Second, we extended the system with complex distributed features not yet available in Hadoop, including high availability, scalability, and unique monitoring and debugging facilities. We present both quantitative and anecdotal results from our experience, providing some concrete evidence that both data-centric design and declarative languages can substantially simplify distributed systems programming.
The Paxos consensus protocol can be specified concisely, but is notoriously difficult to implement in practice. We recount our experience building Paxos in Overlog, a distributed declarative programming language. We found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection. We discuss the programming idioms that appear frequently in our implementation, and the applicability of declarative programming to related application domains.
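To illustrate why consensus primitives map so directly onto aggregation and selection, here is the quorum check of Paxos phase 1 written as a query over a table of promise facts; this is plain Python standing in for an Overlog aggregate, not the paper's code.

```python
# Quorum counting as selection plus aggregation, for illustration only.
def quorum_reached(promises, acceptors, ballot):
    """promises: list of (acceptor, ballot) facts.
    Selection keeps promises for this ballot; aggregation (a distinct count)
    decides whether a majority of acceptors have promised."""
    count = len({a for a, b in promises if b == ballot})   # COUNT(DISTINCT acceptor)
    return count > len(acceptors) // 2

promises = [("a1", 7), ("a2", 7), ("a3", 6)]
print(quorum_reached(promises, ["a1", "a2", "a3"], ballot=7))   # True
```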
Recent research has explored using Datalog-based languages to express a distributed system as a set of logical invariants. Two properties of distributed systems proved difficult to model in Datalog. First, the state of any such system evolves with its execution. Second, deductions in these systems may be arbitrarily delayed, dropped, or reordered by the unreliable network links they must traverse. Previous efforts addressed the former by extending Datalog to include updates, key constraints, persistence and events, and the latter by assuming ordered and reliable delivery while ignoring delay. These details have a semantics outside Datalog, which increases the complexity of the language or its interpretation, and forces programmers to think operationally. We argue that the missing component from these previous languages is a notion of time.
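A small sketch of the "notion of time" idea, as an illustration of the model rather than an implementation of the language: every fact carries a timestamp, deductive rules fire within a single timestep, persistence copies facts to the next timestep, and asynchronous rules assign a nondeterministically later timestep, modeling delay and reordering.

```python
# Illustrative timestamped-fact step; relation names are hypothetical.
import random

def step(facts_now, t):
    """facts_now: set of (relation, args) facts true at logical time t.
    Returns the facts at t+1 plus asynchronously scheduled deliveries."""
    next_facts = set()
    async_facts = []
    for rel, args in facts_now:
        next_facts.add((rel, args))                  # persistence: p@t+1 <- p@t
        if rel == "send":                            # async rule: delivery lands at
            delay = random.randint(1, 3)             # some later, unknown time
            async_facts.append((t + delay, ("deliver", args)))
    return next_facts, async_facts

now, pending = step({("send", ("a", "b", "hello"))}, t=0)
print(now, pending)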
Proceedings of the …, Jan 1, 2010
Building and debugging distributed software remains extremely difficult. We conjecture that by adopting a data-centric approach to system design and by employing declarative programming languages, a broad range of distributed software can be recast naturally in a data-parallel programming model. Our hope is that this model can significantly raise the level of abstraction for programmers, improving code simplicity, speed of development, ease of software evolution, and program correctness.
ACM SIGOPS …, Jan 1, 2010
The Paxos consensus protocol can be specified concisely, but is notoriously difficult to implement in practice. We recount our experience building Paxos in Overlog, a distributed declarative programming language. We found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection. We discuss the programming idioms that appear frequently in our implementation, and the applicability of declarative programming to related application domains.
We present BloomUnit, a testing framework for distributed programs written in the Bloom language. BloomUnit allows developers to write declarative test specifications that describe the input/output behavior of a software module. Test specifications are expressed as Bloom queries over (distributed) execution traces of the program under test. To allow execution traces to be produced automatically, BloomUnit synthesizes program inputs that satisfy user-provided constraints.
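The core idea, that a test is a declarative query over an execution trace, can be sketched as follows; the trace format and specification below are assumptions for illustration, not BloomUnit's API.

```python
# Minimal sketch of a declarative test specification over a distributed trace.
def spec_every_request_acked(trace):
    """trace: list of (time, node, relation, payload) events.
    Specification: every 'request' eventually has a matching 'ack'."""
    requests = {p for _, _, rel, p in trace if rel == "request"}
    acks     = {p for _, _, rel, p in trace if rel == "ack"}
    return requests <= acks

trace = [(1, "client", "request", "r1"),
         (3, "server", "ack", "r1")]
assert spec_every_request_acked(trace)
```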
Building on recent interest in distributed logic programming, we take a model-theoretic approach to analyzing confluence of asynchronous distributed programs. We begin with a model-theoretic semantics for Dedalus and introduce the ultimate model, which captures non-deterministic eventual outcomes of distributed programs. After showing the question of confluence undecidable for Dedalus, we identify restricted sub-languages that guarantee confluence while providing adequate expressivity.
The language Dedalus is a Datalog-like language in which distributed computations and networking protocols can be programmed, in the spirit of the Declarative Networking paradigm. Whereas formal, operational semantics for Dedalus-like languages have recently been developed, a purely declarative semantics has been lacking so far.
Declarative Semantics for Declarative Networking. Slides by Jan Van den Bussche (Hasselt University, Belgium), joint work with Tom Ameloot (Hasselt), Peter Alvaro, Joe Hellerstein, and Bill Marczak (Berkeley). The talk surveys Declarative Networking at UC Berkeley: Network Datalog (SIGMOD 2006, Loo, Hellerstein, et al.), which uses Datalog to program network protocols such as shortest-path routing and peer-to-peer overlays; and Dedalus (PODS 2010, Hellerstein et al.), which uses Datalog to program clusters for querying distributed databases and data-oriented cloud computing.
Existing parallel dataflow systems are strictly reactive in their optimizations. At best, such approaches approximate the optimal strategy, missing opportunities to optimize across multiple queries and reschedule queries to improve locality. We propose three techniques that improve query execution performance by utilizing high-level knowledge of the workload. The first technique predictively replicates data to improve aggregate read bandwidth and increase locality.
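A hypothetical sketch of that first technique: use predicted access frequencies from workload knowledge to choose replication factors, so that hot partitions receive more replicas (more aggregate read bandwidth and a better chance of local reads). The greedy cost model below is invented for illustration and is not the paper's algorithm.

```python
# Illustrative workload-driven replication planner (assumed cost model).
def replication_plan(predicted_reads, budget, min_replicas=1):
    """predicted_reads: partition -> expected reads; budget: extra copies to place."""
    plan = {p: min_replicas for p in predicted_reads}
    for _ in range(budget):
        # Give each extra copy to the partition with the highest per-replica load.
        hottest = max(plan, key=lambda p: predicted_reads[p] / plan[p])
        plan[hottest] += 1
    return plan

print(replication_plan({"p0": 1000, "p1": 100, "p2": 10}, budget=3))
# {'p0': 4, 'p1': 1, 'p2': 1}
```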
While architectures for distributed computing are changing rapidly, techniques for building distributed systems have remained stagnant. As distributed computation becomes the common case, traditional techniques for building such systems will become increasingly burdensome, because they force programmers to deal with the mundane details of constructing reliable distributed systems rather than concentrating on the desired computation.
In recent years there has been interest in achieving application-level consistency criteria without the latency and availability costs of strongly consistent storage infrastructure. A standard technique is to adopt a vocabulary of commutative operations; this avoids the risk of inconsistency due to message reordering. A more powerful approach was recently captured by the CALM theorem, which proves that logically monotonic programs are guaranteed to be eventually consistent.
Proceedings of the …, Jan 1, 2010
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop and can run unmodified user-defined MapReduce programs.
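A toy illustration of the pipelining and online aggregation idea (not Hadoop or HOP code): map output is streamed to the reducer as it is produced, and the reducer can emit periodic "early returns", approximate aggregates over the data seen so far, instead of waiting for all map tasks to finish.

```python
# Pipelined word count with early returns, sketched with Python generators.
from collections import Counter

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word, 1                      # streamed, not materialized to disk

def online_wordcount(lines, snapshot_every=2):
    counts, seen = Counter(), 0
    for word, n in mapper(lines):
        counts[word] += n
        seen += 1
        if seen % snapshot_every == 0:
            yield dict(counts)                 # early return: current estimate
    yield dict(counts)                         # final, exact answer

for snapshot in online_wordcount(["a b a", "b c"]):
    print(snapshot)
```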