Joseph L Hellerstein - Academia.edu

Papers by Joseph L Hellerstein

YSCOPE: A Shell for Building Expert Systems for Solving Computer-Performance Problems

Int. CMG Conference, 1985

Expert Systems in Data Processing: Applications Using IBM's Knowledge Tool

The CHAMPS system: change management with planning and scheduling

2004 IEEE/IFIP Network Operations and Management Symposium (IEEE Cat. No.04CH37507)

Rules of thumb for selecting metrics for detecting performance problems

System and Method for Delivering an Integrated Server Administration Platform

Expert Systems in Data Processing Applications Using IBM Knowledge Tool

Computer Measurement Group Conference, 1989

Systematic Analysis of Challenge-Driven Improvements in Molecular Prognostic Models for Breast Cancer

Science Translational Medicine, 2013

An open challenge to model breast cancer prognosis revealed that collaboration and transparency enhanced the power of prognostic models.

YES/MVS and the automation of operations for large computer complexes

IBM Systems Journal, 1986

The Yorktown Expert System/MVS Manager (known as YES/MVS) is an experimental expert system that assists with the operation of a large computer complex. The first version of YES/MVS (called YES/MVS I) was used regularly in the computing center of IBM's Thomas J. Watson Research Center for most of a year. Based on the experience gained in developing and us-

Adapting Modeling and Simulation Credibility Standards to Computational Systems Biology

arXiv (Cornell University), Jan 14, 2023

Computational models are increasingly used in high-impact decision making in science, engineering, and medicine. The National Aeronautics and Space Administration (NASA) uses computational models to perform complex experiments that are otherwise prohibitively expensive or require a microgravity environment. Similarly, the Food and Drug Administration (FDA) and European Medicines Agency (EMA) have begun accepting models and simulations as a form of evidence for pharmaceutical and medical device approval. It is crucial that computational models meet a standard of credibility when they are used in high-stakes decision making. For this reason, institutions including NASA, the FDA, and the EMA have developed standards to promote and assess the credibility of computational models and simulations. However, due to the breadth of models these institutions assess, these credibility standards are mostly qualitative and avoid making specific recommendations. On the other hand, modeling and simulation in systems biology is a narrow domain, and several standards are already in place. As systems biology models increase in complexity and influence, the development of a credibility assessment system is crucial. Here we review existing standards in systems biology and credibility standards in other science, engineering, and medical fields, and propose the development of a credibility standard for systems biology models.

1 Current Standards in Systems Biology

Klipp et al. describe standards as agreed-upon formats used to enhance information exchange and mutual understanding [10]. In the field of systems biology, standards are a means to share information about experiments, models, data formats, nomenclature, and graphical representations of biochemical systems. Standardized means of information exchange improve model reuse, expandability, and integration, as well as allowing communication between tools. In a survey of 125 systems biologists, most thought of standards as essential to their field, primarily for the purpose of reproducing and checking simulation results, both essential aspects of credibility [10]. A multitude of standards exist in systems biology for processes from annotation to dissemination. Although there is

Building and Optimizing Declarative Networked Systems

The Role of Quantitative Models in Building Scalable Cloud Infrastructures

2010 Seventh International Conference on the Quantitative Evaluation of Systems, 2010

Obfuscatory obscanturism: Making workload traces of commercially-sensitive systems safe to release

2012 IEEE Network Operations and Management Symposium, 2012

Cloud providers such as Google are interested in fostering research on the daunting technical challenges they face in supporting planetary-scale distributed systems, but no academic organizations have similar-scale systems on which to experiment. Fortunately, good research can still be done using traces of real-life production workloads, but there are risks in releasing such data, including inadvertently disclosing confidential or proprietary information, as happened with the Netflix Prize data. This paper discusses these risks and our approach to them, which we call systematic obfuscation. It protects proprietary and personal data while leaving it possible to answer interesting research questions. We explain and motivate some of the risks and concerns and propose how they can best be mitigated, using as an example our recent publication of a month-long trace of a production system workload on an 11k-machine cluster.
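
The abstract gives only the high-level approach, but a minimal sketch can illustrate the two transformations systematic obfuscation implies: replacing identifiers with keyed one-way hashes, and rescaling resource values so relative comparisons survive while absolute capacities are hidden. The field names, key handling, and normalization below are assumptions for illustration, not Google's actual release pipeline.

```python
# Sketch: keyed hashing of identifiers plus rescaling of resource values.
# SECRET_KEY, the field names, and the normalization are assumptions.
import hashlib
import hmac

SECRET_KEY = b"site-specific secret kept by the trace publisher"

def obfuscate_id(raw_id: str) -> str:
    """Replace a user/job name with a stable one-way token; without the
    key, the original name cannot be recovered or dictionary-tested."""
    return hmac.new(SECRET_KEY, raw_id.encode(), hashlib.sha256).hexdigest()[:16]

def rescale(value: float, max_observed: float) -> float:
    """Express a measurement as a fraction of the largest observed value,
    hiding absolute machine capacities but preserving comparisons."""
    return value / max_observed if max_observed else 0.0

record = {"user": "alice", "job": "frontend-batch", "cpu_cores": 2.5}
safe = {
    "user": obfuscate_id(record["user"]),
    "job": obfuscate_id(record["job"]),
    "cpu": rescale(record["cpu_cores"], max_observed=32.0),
}
print(safe)
```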

Using MIMO feedback control to enforce policies for interrelated metrics with application to the Apache Web server

NOMS 2002. IEEE/IFIP Network Operations and Management Symposium 'Management Solutions for the New Communications World' (Cat. No.02CH37327)

Policy-based management provides a means for IT systems to operate according to business needs. Unfortunately, there is often an "impedance mismatch" between the policies administrators want and the controls they are given. Consider the Apache web server: administrators want to control CPU and memory utilizations, but this must be done indirectly by manipulating tuning parameters such as MaxClients and KeepAlive. There has been much interest in using feedback control to bridge this impedance mismatch. However, these efforts have focused on a single metric manipulated by a single control, and hence have not considered interactions between controls, which are common in computing systems. This paper shows how multiple-input, multiple-output (MIMO) control theory can be used to enforce policies for interrelated metrics. MIMO is used both to model the target system, Apache in our case, and to design feedback controllers. The MIMO model captures the interactions between KeepAlive and MaxClients and can be used to identify infeasible metric policies. In addition, MIMO control techniques can provide considerable benefit in handling trade-offs between speed of metric convergence and sensitivity to random fluctuations while enforcing the desired policies.
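
As a rough illustration of the MIMO idea, the sketch below pairs a first-order state-space model x(k+1) = A x(k) + B u(k), with x = (CPU, MEM) and u = (KeepAlive, MaxClients), with an integral controller that uses the inverse DC gain to decouple the two loops. The matrices, gains, and knob values are illustrative placeholders, not the paper's identified Apache model.

```python
# Sketch: MIMO integral control with static (DC-gain) decoupling.
# A and B are illustrative placeholders; in practice they come from
# system identification against the live server.
import numpy as np

A = np.array([[0.50, 0.10],     # state x = (CPU util, MEM util)
              [0.05, 0.60]])
B = np.array([[-0.010, 0.004],  # inputs u = (KeepAlive, MaxClients)
              [-0.002, 0.003]])
G = np.linalg.inv(np.eye(2) - A) @ B   # steady-state (DC) gain of the model

def integral_step(u, x, r, gain=0.2):
    """Adjust both knobs at once; inverting G decouples the interactions
    so each metric's error mostly drives its own loop."""
    return u + gain * np.linalg.solve(G, r - x)

x = np.array([0.2, 0.2])               # measured (CPU, MEM)
u = np.array([11.0, 600.0])            # illustrative knob settings
r = np.array([0.6, 0.5])               # policy targets for (CPU, MEM)
for _ in range(100):
    u = integral_step(u, x, r)
    x = A @ x + B @ u                  # plant response under the model
print(np.round(x, 3))                  # approaches r when the model holds
```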

Optimizing Quality of Service Using Fuzzy Control

Lecture Notes in Computer Science, 2002

The rapid growth of eCommerce increasingly means business revenues depend on providing good quality of service (QoS) for web site interactions. Traditionally, system administrators have been responsible for optimizing tuning parameters, a process that is time-consuming, skills-intensive, and therefore costly. This paper describes an approach to automating parameter tuning using a fuzzy controller that employs rules incorporating qualitative knowledge of the effect of tuning parameters. An example of such qualitative knowledge in the Apache web server is "MaxClients has a concave upward effect on response times." Our studies using a real Apache web server suggest that such a scheme can improve performance without human intervention. Further, we show that the controller can automatically adapt to changes in workloads.
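
The concave-upward knowledge translates into a hill-climbing rule base: keep moving MaxClients in a direction that improves response time, reverse and shrink the step otherwise. The sketch below implements those two rules crisply against a synthetic response-time curve; the paper's controller uses genuine fuzzy membership functions and measures a live Apache server, so the curve, step sizes, and noise level here are assumptions.

```python
# Sketch: qualitative hill-climbing rules for a knob with a concave
# effect on performance. The response-time curve is synthetic.
import random

def response_time(max_clients: float) -> float:
    """Stand-in for measuring the live server: concave-upward response
    time with its minimum near MaxClients = 600, plus noise."""
    return 2e-5 * (max_clients - 600) ** 2 + 0.2 + random.gauss(0, 0.005)

max_clients, change = 200.0, 50.0
prev_rt = response_time(max_clients)
for _ in range(40):
    max_clients += change
    rt = response_time(max_clients)
    # R1: if the last change improved response time, keep that direction.
    # R2: if it worsened response time, reverse and shrink the step.
    if rt >= prev_rt:
        change = -change / 2
        if abs(change) < 5:                      # keep probing despite noise
            change = 5.0 if change > 0 else -5.0
    prev_rt = rt
print(f"MaxClients settled near {max_clients:.0f}")
```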

A first-principles approach to constructing transfer functions for admission control in computing systems

Proceedings of the 41st IEEE Conference on Decision and Control, 2002.

This paper develops a first principles approach to constructing parameterized transfer function models for an abstraction of admission control, the M/M/1/K queueing system. We linearize this system using the first order model y(k + 1) = ay(k) + bu(k), where y is the output (e.g., number in system) and u is buffer size. The pole a is estimated as the lag 1 autocorrelation of y at steady state, and b is estimated using dy/du. With these analytic models for a and b, we study the effects of workload (i.e., arrival and service rates) and sample times. We show that a and b move in opposite directions at large utilizations, an effect that can have significant implications on closed loop poles. Further, the DC gain for response time and number in system drops to 0 as buffer size increases, and the DC gain of number in system converges to 0.5 as workload intensity becomes large. These insights may aid in designing robust and/or adaptive controllers for computing systems. Last, our models provide insight into why the integral control of a Lotus Notes email server has an oscillatory response to a change in reference value.
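
One consistent reading of the recipe: since the steady-state gain of y(k+1) = ay(k) + bu(k) is b/(1-a), the pole a can be estimated as the lag-1 autocorrelation of the sampled number-in-system, and b as (1-a) times a finite-difference estimate of dȳ/du. The sketch below applies this to a simulated M/M/1/K queue; the rates, sample interval, and run lengths are illustrative, and the finite-difference estimate of b is noisy at these run lengths.

```python
# Sketch: estimate a and b for y(k+1) = a*y(k) + b*u(k) from a simulated
# M/M/1/K queue. Rates and the sample interval T are assumptions.
import random

def simulate_mm1k(lam, mu, K, T, n_samples, seed=0):
    """Event-driven M/M/1/K; returns number-in-system sampled every T."""
    rng = random.Random(seed)
    t, n, samples, next_sample = 0.0, 0, [], T
    t_arr, t_dep = rng.expovariate(lam), float("inf")
    while len(samples) < n_samples:
        t_next = min(t_arr, t_dep)
        while next_sample <= t_next and len(samples) < n_samples:
            samples.append(n)                 # record state before the event
            next_sample += T
        t = t_next
        if t_arr <= t_dep:                    # arrival (blocked if n == K)
            if n < K:
                n += 1
                if n == 1:
                    t_dep = t + rng.expovariate(mu)
            t_arr = t + rng.expovariate(lam)
        else:                                 # departure
            n -= 1
            t_dep = t + rng.expovariate(mu) if n > 0 else float("inf")
    return samples

def lag1_autocorr(y):
    m = sum(y) / len(y)
    num = sum((y[i] - m) * (y[i + 1] - m) for i in range(len(y) - 1))
    return num / sum((v - m) ** 2 for v in y)

lam, mu, T, N = 0.8, 1.0, 5.0, 20000
y1 = simulate_mm1k(lam, mu, K=20, T=T, n_samples=N)
y2 = simulate_mm1k(lam, mu, K=22, T=T, n_samples=N, seed=1)
a = lag1_autocorr(y1)
dy_du = (sum(y2) / N - sum(y1) / N) / 2       # finite difference over K
b = (1 - a) * dy_du                           # from DC gain = b / (1 - a)
print(f"a = {a:.3f}, b = {b:.4f}")
```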

Self-Managing Systems: A Control Theory Foundation

12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems (ECBS'05)

The high cost of operating large computing installations has motivated a broad interest in reducing the need for human intervention by making systems self-managing. This paper explores the extent to which control theory can provide an architectural and analytic foundation for building self-managing systems, either from new components or by layering on top of existing components. Further, we propose a deployable testbed for autonomic computing (DTAC) that we believe will reduce the barriers to addressing key research problems in autonomic computing. The initial DTAC architecture is described along with several problems that it can be used to investigate.

Generic Online Optimization of Multiple Configuration Parameters with Application to a Database Server

Lecture Notes in Computer Science, 2003

Optimizing configuration parameters is time-consuming and skills-intensive. This paper proposes a generic approach to automating this task. By generic, we mean that the approach is relatively independent of the target system for which the optimization is done. Our approach uses online adjustment of configuration parameters to discover the system's performance characteristics. Doing so creates two challenges: (1) handling interdependencies between configuration parameters and (2) minimizing the deleterious effects on production workload while the optimization is underway. Our approach addresses (1) by including in the architecture a rule-based component that handles interdependencies between configuration parameters. For (2), we use a feedback mechanism for online optimization that searches the parameter space in a way that generally avoids poor performance at intermediate steps. Our studies of a DB2 Universal Database Server under an e-commerce workload indicate that our approach can be effective in practice.
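
A minimal sketch of such a loop, assuming hypothetical parameter names, a made-up interdependency rule, and a synthetic performance surface in place of live throughput measurements (the paper's actual DB2 knobs and rules differ): probe one parameter at a time with small steps, keep only improving moves, and pass every trial configuration through the rule component first.

```python
# Sketch: rule-filtered coordinate search over configuration parameters.
# Parameter names, the rule, and measure() are illustrative assumptions.

def enforce_rules(cfg: dict) -> dict:
    """Example interdependency rule: sort_heap must stay below a fixed
    fraction of buffer_pool."""
    cfg["sort_heap"] = min(cfg["sort_heap"], cfg["buffer_pool"] // 4)
    return cfg

def measure(cfg: dict) -> float:
    """Stand-in for measuring live throughput (higher is better);
    a synthetic concave surface with its optimum at (4000, 900)."""
    bp, sh = cfg["buffer_pool"], cfg["sort_heap"]
    return -((bp - 4000) ** 2) / 1e5 - ((sh - 900) ** 2) / 1e4

cfg = {"buffer_pool": 1000, "sort_heap": 200}
steps = {"buffer_pool": 200, "sort_heap": 50}
best = measure(cfg)
for _ in range(100):
    for p, step in steps.items():
        trial = enforce_rules({**cfg, p: cfg[p] + step})
        score = measure(trial)
        if score > best:                 # keep only improving moves, so
            cfg, best = trial, score     # intermediate states stay good
        else:
            steps[p] = -step // 2 or -1  # reverse and shrink the probe
print(cfg, round(best, 2))
```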

Managing the Performance Impact of Administrative Utilities

Lecture Notes in Computer Science, 2003

Administrative utilities (e.g., filesystem and database backups, garbage collection in the Java Virtual Machine) are an essential part of the operation of production systems. Since production work can be severely degraded by the execution of such utilities, it is desirable to have policies of the form "There should be no more than an x% degradation of production work due to utility execution." Two challenges arise in providing such policies: (1) providing an effective mechanism for throttling the resource consumption of utilities and (2) continuously translating from policy expressions of "degradation units" into the appropriate settings for the throttling mechanism. We address (1) by using self-imposed sleep, a technique that forces utilities to slow down their processing by a configurable amount. We address (2) by employing an online estimation scheme in combination with a feedback loop. This throttling system is autonomous and adaptive, and allows the system to self-manage its utilities to limit their performance impact with only high-level policy input from the administrator. We demonstrate the effectiveness of these approaches in a prototype system that incorporates these capabilities into IBM's DB2 Universal Database server.
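
A minimal sketch of self-imposed sleep under integral feedback, with a synthetic degradation estimator standing in for the paper's online estimation scheme; the work function, cycle length, and gain are assumptions for illustration.

```python
# Sketch: a utility sleeps a fraction of each work cycle; integral
# feedback adjusts that fraction so measured degradation of production
# work tracks the policy target.
import random
import time

def do_backup_slice(seconds: float):
    """Stand-in for one slice of the utility's real work (e.g., copying
    a few pages of a database backup)."""
    time.sleep(seconds)

def measure_degradation(sleep_fraction: float) -> float:
    """Synthetic stand-in for the online estimator: here an unthrottled
    utility degrades production work by 40%, scaling with its busy
    fraction, plus measurement noise."""
    return 0.40 * (1 - sleep_fraction) + random.gauss(0, 0.01)

TARGET = 0.10                            # policy: at most 10% degradation
sleep_fraction, gain, cycle = 0.5, 0.5, 0.01
for _ in range(100):                     # one iteration per work cycle
    do_backup_slice(cycle * (1 - sleep_fraction))
    time.sleep(cycle * sleep_fraction)   # the self-imposed sleep
    observed = measure_degradation(sleep_fraction)
    # Integral feedback: sleep more when degradation exceeds the policy,
    # less when there is headroom.
    sleep_fraction += gain * (observed - TARGET)
    sleep_fraction = min(0.95, max(0.0, sleep_fraction))
print(f"settled sleep fraction = {sleep_fraction:.2f}")
```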

Control of large scale computing systems

ACM SIGBED Review, 2006

The rapidly increasing scale of computing systems means that it is vitally important to address the scaling challenges in the control of computing systems. We introduce a framework for describing the control problems for large scale computing systems that expand along two dimensions: the scale of the target system and the scale of the policy. Using this framework, we present control architectures that span a range from centralized schemes to distributed solutions. We further identify several research challenges related to issues such as target system latencies and policy decomposition.
