D. Patterson | University of California, Berkeley
Papers by D. Patterson
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
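As a rough illustration of how the four parameters enter a cost estimate, the sketch below uses the parameter names commonly associated with LogP (latency L, overhead o, gap g, processor count P). The class name, method names, and the sample values are assumptions made for this example, not taken from the abstract, and the pipelined formula assumes one sender streaming small messages to one receiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """Hypothetical container for the four LogP parameters (units: cycles)."""
    L: float  # upper bound on network latency for a small message
    o: float  # processor overhead to send or to receive one message
    g: float  # gap: minimum interval between consecutive sends at a processor
    P: int    # number of processors

    def point_to_point(self) -> float:
        # Sender overhead, then network latency, then receiver overhead.
        return self.o + self.L + self.o

    def pipelined_sends(self, n: int) -> float:
        """Time until the last of n small messages is fully received.

        Assumes one sender, one receiver, and that successive sends are
        spaced by max(g, o), since a processor can neither beat the gap
        nor overlap its own send overheads.
        """
        if n < 1:
            raise ValueError("n must be at least 1")
        return (n - 1) * max(self.g, self.o) + self.point_to_point()


# Illustrative values only; they do not describe any particular machine.
machine = LogP(L=6, o=2, g=4, P=8)
print(machine.point_to_point())    # 2 + 6 + 2
print(machine.pipelined_sends(3))  # 2 * 4 + 10
```

An algorithm that adapts to the machine would compare such estimates for alternative communication schedules and pick the cheaper one for the given parameter values.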
Modern computer systems are instrumented to generate huge amounts of system log data. This data contains valuable information for managing the system, localizing failures, and recovery. However, the complexity of these systems greatly surpasses what can be understood by human operators and thus automated analysis systems are beginning to be used. Due to preprocessing required by the statistical algorithms, the extremely high volume of data cannot be processed using ad-hoc scripts. We present a flexible, modular and scalable architecture for statistical learning from large data streams that can easily process lots of data. We built a prototype that is evaluated using system log data from a commercial on-line service. Moreover, the results of the analysis were genuinely useful for the on-line service operators.

A: class attribute: error-code
bytes-served <= 195: 145 (135/9)
bytes-served > 195
|   R_content-len = yes: 32 (98)
|   R_content-len != yes
|   |   R_not-cached-reason = yes: 32 (45/19)
|   |   R_not-cached-reason != yes
|   |   |   duration <= 15.2
|   |   |   |   bytes-received <= 2680: -13 (39)
|   |   |   |   bytes-received > 2680
|   |   |   |   |   bytes-received <= 2805: 131 (30/7)
|   |   |   |   |   bytes-received > 2805: -13 (85/13)
|   |   |   duration > 15.2: 131 (69/6)

B: class attribute: R error-code
R_cache-served = yes: no (10469)
R_cache-served != yes
|   R_server-duration = yes: no (7686)
|   R_server-duration != yes: yes (18094/5)

B': class attribute: R error-code (attribute R_cache-served removed)
duration <= 2.25
|   client-write-duration <= 0.0: yes (200/4)
|   client-write-duration > 0.0
|   |   bytes-served <= 210
|   |   |   R_server-duration = yes: no (873)
|   |   |   R_server-duration != yes: yes (1969/10)
|   |   bytes-served > 210
|   |   |   visit-url = 6520...: yes (69)
|   |   |   visit-url != 6520...
|   |   |   |   visit-url = 2336...: yes (72/1)
|   |   |   |   visit-url != 2336...: no (18909/1934)
duration > 2.25
|   R_server-duration = yes: no (291)
|   R_server-duration != yes: yes (13866/6)

C: class attribute: O client-write-duration
duration <= 9.71: false (18018)
duration > 9.71: true (18231/71)

C': class attribute: O client-write-duration (attribute duration removed)
bytes-served <= 67958
|   R_error-code = yes
|   |   R_content-type = yes: true (253/6)
|   |   R_content-type != yes: false (17)
|   R_error-code != yes
|   |   gmt = 2003-06-24 00:01:07: true (54)
|   |   gmt != 2003-06-24 00:01:07
|   |   |   user-id = 96848766314153157: true (99/6)
|   |   |   user-id != 96848766314153157
|   |   |   |   gmt = 2003-06-24 02:23:28: true (45)
|   |   |   |   gmt != 2003-06-24 02:23:28
|   |   |   |   |   visit-url = 8227...: true (43)
|   |   |   |   |   visit-url != 8227...: false (18005)
bytes-served > 67958: true (17733/55)
Despite significant efforts in the field of Autonomic Computing, system operators will still play a critical role in administering Internet services for many years to come. However, very little is known about how system operators work, what tools they use and how we can make ...
Horizontally-scalable Internet services on clusters of commodity computers appear to be a great fit for automatic control: there is a target output (service-level agreement), observed output (actual latency), and gain controller (adjusting the number of servers). Yet few datacenters are automated this way in practice, due in part to well-founded skepticism about whether the simple models often used in the research literature can capture complex real-life workload/performance relationships and keep up with changing conditions that might invalidate the models. We argue that these shortcomings can be fixed by importing modeling, control, and analysis techniques from statistics and machine learning. In particular, we apply rich statistical models of the application's performance, simulation-based methods for finding an optimal control policy, and change-point methods to find abrupt changes in performance. Preliminary results running a Web 2.0 benchmark application driven by real workload traces on Amazon's EC2 cloud show that our method can effectively control the number of servers, even in the face of performance anomalies.
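To make the three pieces of the control loop concrete (target output, observed output, and an actuator that changes the server count), here is a deliberately naive proportional rule. It is not the statistical-model or simulation-based policy the abstract describes; it is the kind of simple model the abstract says is often inadequate. The function name, parameters, and default bounds are all hypothetical.

```python
import math


def next_server_count(current: int,
                      observed_ms: float,
                      target_ms: float,
                      min_servers: int = 1,
                      max_servers: int = 100) -> int:
    """Naive proportional scaling rule (illustrative assumption only).

    Scales the server count by the ratio of observed latency to the
    latency target, then clamps the result to [min_servers, max_servers].
    """
    if current < 1 or target_ms <= 0 or observed_ms < 0:
        raise ValueError("invalid controller input")
    desired = math.ceil(current * observed_ms / target_ms)
    return max(min_servers, min(max_servers, desired))


# Latency is 50% over the target, so the rule asks for 50% more servers.
print(next_server_count(current=10, observed_ms=150.0, target_ms=100.0))
```

A rule like this assumes latency scales linearly with load per server, which is exactly the kind of assumption that richer performance models and change-point detection are meant to replace.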
Proceedings of the 6th annual symposium on Computer architecture - ISCA '79, 1979
X-NODE is a single-chip VLSI processor to be realized in the mid 1980's and to be used as a building block for a tree-structured multiprocessor system (X-TREE). Three major trends influence the design of this processor: the continuing evolution of VLSI technology, the requirements for parallelism and communication in a multiprocessor system, and the need for better support of software
25 years of the international symposia on Computer architecture (selected papers) - ISCA '98, 1998
The Reduced Instruction Set Computer (RISC) Project investigates an alternative to the general trend toward computers with increasingly complex instruction sets: With a proper set of instructions and a corresponding architectural design, a machine with a high effective throughput can be achieved. The simplicity of the instruction set and addressing modes allows most instructions to execute in a single machine cycle, and the simplicity of each instruction guarantees a short cycle time. In addition, such a machine should have a much shorter design time.
ACM SIGARCH Computer Architecture News, 1982
... First row: Korbin Van Dyke, Osamu Tomisawa, James Peek, Prof. David Patterson, Prof. Carlo Séquin, Peter Kessler; second row: Robert Sherburne, Manolis Katevenis, Prof. John Ousterhout, Ralph Campbell, Richard Piepho, Daniel Fitzpatrick. ...
IEEE Journal of Solid-State Circuits, 1980
In the mid 1980's it will be possible to put a million devices (transistors or active MOS gate electrodes) onto a single silicon chip. General trends in the evolution of silicon integrated circuits are reviewed and design constraints for emerging VLSI circuits are analyzed. Desirable architectural features in modern computers are then discussed and consequences for an implementation with large-scale integrated circuits are investigated. The resulting recommended processor design includes features such as an on-chip memory hierarchy, multiple homogeneous caches for enhanced execution parallelism, support for complex data structures and high-level languages, a flexible instruction set, and communication hardware. It is concluded that a viable modular building block for the next generation of computing systems will be a self-contained computer on a single chip. A tentative allocation of the one million transistors to the various functional blocks is given, and the result is a memory-intensive design.
The European Physical Journal D, 1999
Over the past three years we have developed the technique of buffer-gas cooling and loading of atoms and molecules into magnetic traps. Buffer-gas cooling relies solely on elastic collisions (thermalization) of the species-to-be-trapped with a cryogenically cooled ...
American Journal of Evaluation, 2008
... Rebecca Campbell, Adrienne E. Adams and Debra Patterson. Methodological Challenges of Collecting Evaluation Data From Traumatized Clients/Consumers: A Comparison of Three Methods. American Journal of Evaluation, 2008, 29: 369. DOI: 10.1177/1098214008320736 ...
… , University of California at Berkeley, Technical Report No. …
The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation.