Xizhou Feng - Academia.edu

Papers by Xizhou Feng

EpiFast

Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09, 2009

Large-scale realistic epidemic simulations have recently become an increasingly important application of high-performance computing. We propose a parallel algorithm, EpiFast, based on a novel interpretation of stochastic disease propagation in a contact network. We implement it using a master-slave computation model which allows scalability on distributed-memory systems.
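To make the abstract's notion of stochastic disease propagation over a contact network concrete, here is a minimal serial sketch of a discrete-time SIR-style simulation. This is an illustration only, not EpiFast's actual distributed master-slave algorithm; all function names and parameters are made up for this example.

```python
import random

def simulate_sir(edges, seeds, p_transmit, t_recover, steps, rng):
    """Discrete-time stochastic SIR over an undirected contact network.

    edges: iterable of (u, v) contacts, assumed active every time step.
    Returns the set of nodes ever infected.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    infected = {s: t_recover for s in seeds}   # node -> remaining infectious steps
    recovered = set()
    ever_infected = set(seeds)
    for _ in range(steps):
        new_infections = []
        for u, remaining in list(infected.items()):
            # Each infectious contact transmits independently with p_transmit.
            for v in neighbors.get(u, []):
                if v not in infected and v not in recovered:
                    if rng.random() < p_transmit:
                        new_infections.append(v)
            if remaining == 1:
                del infected[u]
                recovered.add(u)
            else:
                infected[u] = remaining - 1
        for v in new_infections:
            if v not in infected and v not in recovered:
                infected[v] = t_recover
                ever_infected.add(v)
    return ever_infected
```

With transmission probability 1.0 on a path graph the infection sweeps deterministically along the path, which makes the dynamics easy to check by hand.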

Indemics

ACM Transactions on Modeling and Computer Simulation, 2014

We describe the design and prototype implementation of INDEMICS (Interactive Epidemic Simulation), a modeling environment utilizing high-performance computing technologies for supporting complex epidemic simulations. INDEMICS can support policy analysts and epidemiologists interested in planning and control of pandemics. INDEMICS goes beyond traditional epidemic simulations by providing a simple and powerful way to represent and analyze policy-based as well as individual-based adaptive interventions. Users can also stop the simulation at any point, assess the state of the simulated system, and add additional interventions. INDEMICS is available to end users via a web-based interface.

EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2008

Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics, a scalable parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models. EpiSimdemics is an interaction-based simulation of a certain class of stochastic reaction-diffusion processes. Straightforward simulations of such processes do not scale well, limiting the use of individual-based models to very small populations. EpiSimdemics is specifically designed to scale to social networks with 100 million individuals. The scaling is obtained by exploiting the semantics of disease evolution and disease propagation in large networks. We evaluate an MPI-based parallel implementation of EpiSimdemics on a mid-sized HPC system, demonstrating that EpiSimdemics scales well. EpiSimdemics has been used in numerous sponsor-defined case studies targeted at policy planning and course-of-action analysis, demonstrating its usefulness in practical situations.

Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE

Lecture Notes in Computer Science, 2008

Heterogeneous multi-core processors invest the most significant portion of their transistor budget in customized "accelerator" cores, while using a small number of conventional low-end cores for supplying computation to accelerators. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is to date an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program using conventional processors and accelerators simultaneously. More specifically, the model reveals optimal degrees of multi-dimensional, task-level, and data-level concurrency to maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications on a multi-processor based on the IBM Cell Broadband Engine.
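The idea of using an analytic model to steer the choice of concurrency degrees can be illustrated with a deliberately simplified time model. This toy model (all coefficients and names are invented for illustration, and it is far simpler than the paper's Cell BE model) enumerates (task-level, data-level) degree pairs under a core budget and picks the predicted minimum:

```python
def predicted_time(work_task, work_data, overhead_per_task, d_task, d_data):
    """Toy execution-time model: the task-parallel portion scales with
    d_task, the data-parallel portion with d_task * d_data, and each
    task adds a fixed coordination overhead."""
    return (work_task / d_task
            + work_data / (d_task * d_data)
            + overhead_per_task * d_task)

def best_config(work_task, work_data, overhead_per_task, total_cores):
    """Search all (d_task, d_data) pairs that fit the core budget and
    return the pair with the lowest predicted time."""
    configs = [(dt, dd)
               for dt in range(1, total_cores + 1)
               for dd in range(1, total_cores + 1)
               if dt * dd <= total_cores]
    return min(configs,
               key=lambda c: predicted_time(work_task, work_data,
                                            overhead_per_task, *c))
```

Changing the per-task overhead shifts the optimum between mostly task-level and mixed task/data concurrency, which is the kind of trade-off such a model exposes.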

Modeling and evaluating energy-performance efficiency of parallel processing on multicore based power aware systems

2009 IEEE International Symposium on Parallel & Distributed Processing, 2009

... Gustafson et al. [16] to reflect the intention of solving larger problems on larger systems; memory-bound speedup was proposed by Sun and Ni [24] ... One such model is the power-aware speedup model proposed by Ge and Cameron [14], which is intended to provide a general ...
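The excerpt contrasts classical speedup models. For reference, the two standard baselines it builds on are Amdahl's law (fixed problem size) and Gustafson's scaled speedup (problem size grows with the machine); the power-aware speedup model of Ge and Cameron extends these with frequency-dependent terms not reproduced here:

```python
def amdahl_speedup(serial_fraction, n):
    """Amdahl's law: fixed problem size, serial_fraction cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

def gustafson_speedup(serial_fraction, n):
    """Gustafson's scaled speedup: the parallel part grows with n processors."""
    return serial_fraction + (1.0 - serial_fraction) * n
```

For example, with a 10% serial fraction on 10 processors, Amdahl's law bounds speedup near 5.3x while the scaled-speedup view reports 9.1x, which is exactly the "larger problems on larger systems" intention the excerpt mentions.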

Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010

Energy efficiency and parallel I/O performance have become two critical measures in high-performance computing (HPC). However, there is little empirical data characterizing the energy-performance behaviors of parallel I/O workloads. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns and report our findings on the factors that affect parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change energy-performance efficiency by up to 30 times. High spatial and temporal spacing can also lead to significant improvement in energy-performance efficiency (about 2X). We observe that CPU frequency has a more complex impact, depending on the I/O operations, their spatial and temporal patterns, and the memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O-intensive applications and provide a guideline for developing energy-efficient parallel I/O technology.
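One common way to fold energy and performance into a single figure, as this kind of study requires, is the energy-delay product. The sketch below uses purely hypothetical measurements (the numbers are invented, not the paper's data) to show how a buffer-size sweep would be ranked:

```python
# Hypothetical per-buffer-size measurements for one I/O pattern
# (illustrative numbers only, not results from the paper).
runs = {
    64 * 1024:        {"seconds": 41.0, "joules": 5200.0},
    1024 * 1024:      {"seconds":  9.5, "joules": 1450.0},
    16 * 1024 * 1024: {"seconds":  8.8, "joules": 1500.0},
}

def energy_delay_product(joules, seconds):
    """Combined energy-performance metric: lower is better, since it
    penalizes both extra energy and extra run time."""
    return joules * seconds

best = min(runs, key=lambda b: energy_delay_product(runs[b]["joules"],
                                                    runs[b]["seconds"]))
```

Note how the 16 MiB buffer wins on energy-delay product despite using slightly more energy than the 1 MiB buffer, because its shorter run time compensates.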

Modeling interaction between individuals, social networks and public policy to support public health epidemiology

Proceedings of the 2009 Winter Simulation Conference (WSC), 2009

Human behavior, social networks, and civil infrastructure are closely intertwined. Understanding their co-evolution is critical for designing public policies. Human behaviors and day-to-day activities of individuals create dense social interactions that provide a perfect fabric for fast disease propagation. Conversely, people's behavior in response to public policies and their perception of the crisis can dramatically alter normally stable social interactions. Effective planning and response strategies must take these complicated interactions into account. The basic problem can be modeled as a coupled co-evolving graph dynamical system and can also be viewed as a partially observable Markov decision process. As a way to overcome the computational hurdles, we describe a high-performance-computing-oriented simulation to study this class of problems. Our method provides a novel way to study the co-evolution of human behavior and disease dynamics in very large, realistic social networks with over 100 million nodes and 6 billion edges.

SERA-IO: Integrating Energy Consciousness into Parallel I/O Middleware

2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), 2012

Improving energy efficiency is a primary concern in high-performance computing system design. Because I/O accesses account for a large portion of the execution time of data-intensive applications, energy-aware parallel I/O subsystems are critical for addressing challenges related to HPC energy efficiency. In this paper, we present an energy-conscious parallel I/O middleware approach that combines runtime I/O access interception with the Dynamic Voltage and Frequency Scaling (DVFS) capability available on modern processors to intelligently schedule the system's power-performance mode for energy savings. We implement this approach in SERA-IO, an MPI-IO based middleware that enables energy consciousness for I/O-intensive applications. Experimental evaluations conducted on real systems using multiple parallel I/O benchmarks show that SERA-IO can reduce system energy by 9% to 28% without decreasing application performance. With the emergence of large-scale data-intensive applications and ever larger and more complex parallel computing systems, intelligent, energy-conscious software and runtime systems such as SERA-IO are critical for the success of future high-end computing.
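The general mechanism the abstract describes, scaling the CPU down around an I/O phase and restoring it afterwards, can be sketched against the standard Linux cpufreq sysfs interface. This is not SERA-IO's implementation (which intercepts MPI-IO calls inside middleware); the helper names are invented, writing the sysfs file requires root, and the dry_run flag exists purely so the policy can be exercised without touching hardware:

```python
import contextlib

def set_cpu_khz(freq_khz, cpu=0, dry_run=False):
    """Request a frequency cap via the Linux cpufreq sysfs interface.
    With dry_run=True, just report the action instead of performing it."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_max_freq"
    if dry_run:
        return (path, freq_khz)
    with open(path, "w") as f:          # requires root privileges
        f.write(str(freq_khz))

@contextlib.contextmanager
def low_frequency_io(low_khz, high_khz, dry_run=False):
    """Scale the CPU down around an I/O phase, then restore it."""
    actions = [set_cpu_khz(low_khz, dry_run=dry_run)]
    try:
        yield actions
    finally:
        actions.append(set_cpu_khz(high_khz, dry_run=dry_run))

with low_frequency_io(1_200_000, 2_400_000, dry_run=True) as actions:
    pass  # the application's I/O phase would run here
```

Because the restore sits in a finally block, the frequency comes back up even if the I/O phase raises, which matters for a middleware that must never leave the node throttled.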

PEACH

Proceedings of the 11th ACM Conference on Computing Frontiers - CF '14, 2014

Accelerator-based heterogeneous systems can provide high performance and energy efficiency, both of which are key design goals in high performance computing. To fully realize the potential of heterogeneous architectures, software must optimally exploit the hosts' and accelerators' processing and power-saving capabilities. Yet, previous studies mainly focus on using hosts and accelerators to boost application performance. Power-saving features for improving the energy efficiency of parallel programs, such as Dynamic Voltage and Frequency Scaling (DVFS), remain largely unexplored.

Biology---PBPI: A High Performance Implementation of Bayesian Phylogenetic Inference

Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06, 2006

This paper describes the implementation and performance of PBPI, a parallel implementation of a Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies, Bayesian phylogenetic inference can incorporate complex statistical models into the process of phylogenetic tree estimation. However, Bayesian analyses are extremely computationally expensive. PBPI uses algorithmic improvements and parallel processing to achieve significant performance improvements over comparable Bayesian phylogenetic inference programs. We evaluated the performance and accuracy of PBPI using a simulated dataset on System X, a terascale supercomputer at Virginia Tech. Our results show that PBPI identifies equivalent tree estimates 1424 times faster on 256 processors than a widely used, best-available (albeit sequential) Bayesian phylogenetic inference program. PBPI also achieves linear speedup with the number of processors for large problem sizes. Most importantly, the PBPI framework enables Bayesian phylogenetic analysis of large datasets that was previously impracticable.
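The MCMC core the abstract refers to is, at its heart, a Metropolis-style accept/reject loop. The sketch below shows a standard random-walk Metropolis sampler on a single scalar parameter; in Bayesian phylogenetics the state would instead be a tree topology plus branch lengths and substitution-model parameters, so this is a generic illustration, not PBPI's sampler:

```python
import math
import random

def metropolis_hastings(log_post, x0, steps, step_size, rng):
    """Random-walk Metropolis sampler over a 1-D parameter.
    log_post returns the log posterior density up to a constant."""
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(steps):
        cand = x + rng.gauss(0.0, step_size)    # symmetric proposal
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        samples.append(x)
    return samples

# Target: standard normal, whose log density is -x^2/2 up to a constant.
rng = random.Random(42)
draws = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
```

With a known target like the standard normal, the empirical mean and variance of the chain give a quick sanity check that the sampler is exploring the right distribution.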

CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters

2007 International Conference on Parallel Processing (ICPP 2007), 2007

Performance and power are critical design constraints in today's high-end computing systems. Reducing power consumption without impacting system performance is a challenge for the HPC community. We present a runtime system (CPU MISER) and an integrated performance model for performance-directed, power-aware cluster computing. CPU MISER supports system-wide, application-independent, fine-grain, dynamic voltage and frequency scaling (DVFS) based power management for a generic power-aware cluster. Experimental results show that CPU MISER can achieve as much as 20% energy savings for the NAS parallel benchmarks. In addition to energy savings, CPU MISER is able to constrain performance loss for most applications within user-specified limits. These constraints are achieved through accurate performance modeling and prediction, coupled with advanced control techniques.
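The "performance-directed" idea, choosing a frequency whose predicted slowdown stays under a user-specified limit, can be sketched with a widely used simplification of DVFS performance models: only the CPU-bound portion of execution time stretches as frequency drops, while memory and I/O stalls do not. This is an illustrative model, not CPU MISER's actual predictor:

```python
def predicted_slowdown(cpu_bound_fraction, f, f_max):
    """Predicted T(f) / T(f_max) under a simple DVFS model where only
    the CPU-bound fraction of time scales inversely with frequency."""
    return cpu_bound_fraction * (f_max / f) + (1.0 - cpu_bound_fraction)

def lowest_safe_frequency(freqs, cpu_bound_fraction, max_loss):
    """Pick the lowest available frequency whose predicted performance
    loss stays within the user-specified limit max_loss."""
    f_max = max(freqs)
    for f in sorted(freqs):
        if predicted_slowdown(cpu_bound_fraction, f, f_max) <= 1.0 + max_loss:
            return f
    return f_max
```

Under this model a memory-bound phase (small CPU-bound fraction) can be run well below peak frequency within a 5% loss budget, while a compute-bound phase must stay at the top frequency, which matches the intuition behind performance-constrained DVFS.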

Performance and Energy Modeling for Cooperative Hybrid Computing

2014 9th IEEE International Conference on Networking, Architecture, and Storage, 2014

eTune: A Power Analysis Framework for Data-Intensive Computing

2012 41st International Conference on Parallel Processing Workshops, 2012

Data-intensive workloads demand a large portion of data center resources and consume massive amounts of energy. Energy conservation for data-intensive computing requires enabling technology that provides detailed, systemic energy information and identifies the energy inefficiencies in the underlying system hardware and software. In this work, we address this need and present eTune, a fine-grained, scalable power analysis framework for data-intensive computing on large-scale distributed systems. eTune leverages fine-grained component-level power measurement and the hardware performance monitoring counters (PMCs) on modern computer components and statistically builds power-performance correlation models. Using the learned models, eTune implements a software-based power estimator that runs on compute nodes and reports power at multiple levels, including node, core, memory, and disk, with high accuracy. Case studies with MapReduce applications reveal detailed energy behaviors of typical execution phases and data movements and provide insights on energy optimization via algorithm design and resource allocation.
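The statistical core of such a framework is a regression from performance-counter rates to measured power. As a minimal sketch (eTune's real models are multi-counter and per-component; the single counter, coefficients, and data here are invented), a one-predictor least-squares fit looks like this:

```python
def fit_power_model(counter_rates, watts):
    """Closed-form least-squares fit of a one-counter linear power model
    P = slope * rate + idle, a stand-in for multi-counter PMC models."""
    n = len(counter_rates)
    mx = sum(counter_rates) / n
    my = sum(watts) / n
    sxx = sum((x - mx) ** 2 for x in counter_rates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(counter_rates, watts))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic calibration samples: 2e-8 W per (events/s) plus 40 W idle.
rates = [1e9, 2e9, 3e9, 4e9]
power = [2e-8 * r + 40.0 for r in rates]
slope, idle = fit_power_model(rates, power)
```

Once calibrated, the learned (slope, idle) pair turns live counter readings into a software power estimate, which is the role the abstract assigns to eTune's estimator.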

Designing Computational Clusters for Performance and Power

Advances in Computers, 2007

Power consumption in computational clusters has reached critical levels. High-end cluster performance improves exponentially while the power consumed and heat dissipated increase operational costs and failure rates. Yet, the demand for more powerful machines continues to grow. In this chapter, we motivate the need to reconsider the traditional performance-at-any-cost cluster design approach. We propose designs where power and performance are considered critical constraints. We describe power-aware and low-power techniques to reduce the power profiles of parallel applications and mitigate the impact on performance.

Power and Energy Profiling of Scientific Applications on Distributed Systems

19th IEEE International Parallel and Distributed Processing Symposium, 2005

Power consumption is a troublesome design constraint for emergent systems such as IBM's BlueGene/L. If current trends continue, future petaflop systems will require 100 megawatts of power to maintain high performance. To address this problem, the power and energy characteristics of high-performance systems must be characterized. To date, power-performance profiles for distributed systems have been limited to interactive commercial workloads. However, scientific workloads are typically non-interactive (batched) processes riddled with interprocess dependences and communication. We present a framework for direct, automatic profiling of power consumption for non-interactive, parallel scientific applications on high-performance distributed systems. Though our approach is general, we use our framework to study the power-performance efficiency of the NAS parallel benchmarks on a 32-node Beowulf cluster. We provide profiles by component (CPU, memory, disk, and NIC), by node (for each of 32 nodes), and by system scale (2, 4, 8, 16, and 32 nodes).
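Direct power profiling produces time series of sampled watts per component; the energy figure a profile reports is the integral of those samples. A minimal sketch of that reduction (the trapezoidal rule is a generic choice here, not necessarily what the framework uses):

```python
def energy_joules(timestamps, watts):
    """Integrate a sampled power trace (trapezoidal rule) to get energy.
    timestamps are in seconds, watts is the power sample at each timestamp."""
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for (t0, p0), (t1, p1) in zip(zip(timestamps, watts),
                                             zip(timestamps[1:], watts[1:])))
```

Summing such per-component energies (CPU, memory, disk, NIC) over all nodes gives the by-component and by-node breakdowns the abstract describes.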

3PGCIC 2011

Research paper thumbnail of Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters

Symposium on Parallel and Distributed Processing, 2010

Energy efficiency and parallel I/O performance have become two critical measures in high performa... more Energy efficiency and parallel I/O performance have become two critical measures in high performance computing (HPC). However, there is little empirical data that characterize the energy-performance behaviors of parallel I/O workload. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns and report our findings on the impacting factors of parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change the energy-performance efficiency by up to 30 times. High spatial and temporal spacing can also lead to significant improvement in energy-performance efficiency (about 2X). We observe CPU frequency has a more complex impact, depending on the IO operations, spatial and temporal, and memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O intensive applications and for providing a guideline to develop energy efficient parallel I/O technology.

Research paper thumbnail of EpiFast

Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09, 2009

Large scale realistic epidemic simulations have recently become an increasingly important applica... more Large scale realistic epidemic simulations have recently become an increasingly important application of high-performance computing. We propose a parallel algorithm, EpiFast, based on a novel interpretation of the stochastic disease propagation in a contact network. We implement it using a masterslave computation model which allows scalability on distributed memory systems.

Research paper thumbnail of I ndemics

ACM Transactions on Modeling and Computer Simulation, 2014

We describe the design and prototype implementation of INDEMICS (Interactive Epidemic Simulation)... more We describe the design and prototype implementation of INDEMICS (Interactive Epidemic Simulation) -a modeling environment utilizing high-performance computing technologies for supporting complex epidemic simulations. INDEMICS can support policy analysts and epidemiologists interested in planning and control of pandemics. INDEMICS goes beyond traditional epidemic simulations by providing a simple and powerful way to represent and analyze policy-based as well as individualbased adaptive interventions. Users can also stop the simulation at any point, assess the state of the simulated system, and add additional interventions. INDEMICS is available to end-users via a webbased interface.

Research paper thumbnail of EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2008

Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top p... more Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics -a scalable parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models. EpiSimdemics is an interaction-based simulation of a certain class of stochastic reaction-diffusion processes. Straightforward simulations of such process do not scale well, limiting the use of individual-based models to very small populations. EpiSimdemics is specifically designed to scale to social networks with 100 million individuals. The scaling is obtained by exploiting the semantics of disease evolution and disease propagation in large networks. We evaluate an MPI-based parallel implementation of EpiSimdemics on a mid-sized HPC system, demonstrating that EpiSimdemics scales well. EpiSimdemics has been used in numerous sponsor defined case studies targeted at policy planning and course of action analysis, demonstrating the usefulness of EpiSimdemics in practical situations.

Research paper thumbnail of Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE

Lecture Notes in Computer Science, 2008

Heterogeneous multi-core processors invest the most significant portion of their transistor budge... more Heterogeneous multi-core processors invest the most significant portion of their transistor budget in customized "accelerator" cores, while using a small number of conventional low-end cores for supplying computation to accelerators. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is to date an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program using conventional processors and accelerators simultaneously. More specifically, the model reveals optimal degrees of multi-dimensional, task-level and data-level concurrency, to maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications on a multi-processor based on the IBM Cell Broadband Engine.

Research paper thumbnail of Modeling and evaluating energy-performance efficiency of parallel processing on multicore based power aware systems

2009 IEEE International Symposium on Parallel & Distributed Processing, 2009

... Gustafson et al [16] to reflect the intention of solving larger problems on larger systems; m... more ... Gustafson et al [16] to reflect the intention of solving larger problems on larger systems; memory-bound speedup was proposed by Sun and Ni [24 ... One of such models is the power aware speedup model proposed by Ge and Cameron [14], which is intended to provide a general ...

Research paper thumbnail of Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010

Energy efficiency and parallel I/O performance have become two critical measures in high performa... more Energy efficiency and parallel I/O performance have become two critical measures in high performance computing (HPC). However, there is little empirical data that characterize the energy-performance behaviors of parallel I/O workload. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns and report our findings on the impacting factors of parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change the energyperformance efficiency by up to 30 times. High spatial and temporal spacing can also lead to significant improvement in energy-performance efficiency (about 2X). We observe CPU frequency has a more complex impact, depending on the IO operations, spatial and temporal, and memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O intensive applications and for providing a guideline to develop energy efficient parallel I/O technology.

Research paper thumbnail of Modeling interaction between individuals, social networks and public policy to support public health epidemiology

Proceedings of the 2009 Winter Simulation Conference (WSC), 2009

Human behavior, social networks, and civil infrastructure are closely intertwined. Understanding ... more Human behavior, social networks, and civil infrastructure are closely intertwined. Understanding their co-evolution is critical for designing public policies. Human behaviors and day-to-day activities of individuals create dense social interactions that provide a perfect fabric for fast disease propagation. Conversely, people's behavior in response to public policies and their perception of the crisis can dramatically alter normally stable social interactions. Effective planning and response strategies must take these complicated interactions into account. The basic problem can be modeled as a coupled co-evolving graph dynamical system and can also be viewed as partially observable Markov decision process. As a way to overcome the computational hurdles, we describe an High Performance Computing oriented computer simulation to study this class of problems. Our method provides a novel way to study the co-evolution of human behavior and disease dynamics in very large, realistic social networks with over 100 Million nodes and 6 Billion edges. 2020 978-1-4244-5771-7/09/$26.00 ©2009 IEEE

Research paper thumbnail of SERA-IO: Integrating Energy Consciousness into Parallel I/O Middleware

2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), 2012

Improving energy efficiency is a primary concern in high performance computing system design. Bec... more Improving energy efficiency is a primary concern in high performance computing system design. Because I/O accesses account for a large portion of the execution time for data intensive applications, energy-aware parallel I/O subsystems are critical for addressing challenges related to HPC energy efficiency. In this paper, we present an energyconscious parallel I/O middleware approach that combines runtime I/O access interception and Dynamic Voltage and Frequency Scaling capability available on modern processors to intelligently schedule the system's power-performance mode for energy savings. We implement this approach into SERA-IO, an MPI-IO based middleware to enable energy consciousness for I/O intensive applications. Experimental evaluations conducted on real systems using multiple parallel I/O benchmarks show that SERA-IO can reduce system energy by 9% to 28% without decreasing application performance. With the emerging of large-scale data intensive applications and ever larger and more complex parallel computing systems, intelligent, energy conscious software and runtime systems such as SERA-IO are critical for the success of future high-end computing.

Research paper thumbnail of PEACH

Proceedings of the 11th ACM Conference on Computing Frontiers - CF '14, 2014

Accelerator-based heterogeneous systems can provide high performance and energy efficiency, both ... more Accelerator-based heterogeneous systems can provide high performance and energy efficiency, both of which are key design goals in high performance computing. To fully realize the potential of heterogeneous architectures, software must optimally exploit the hosts' and accelerators' processing and power-saving capabilities. Yet, previous studies mainly focus on using hosts and accelerators to boost application performance. Power-saving features to improve the energy efficiency of parallel programs, such as Dynamic Voltage and Frequency Scaling (DVFS), remain largely unexplored.

Research paper thumbnail of Biology---PBPI

Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06, 2006

This paper describes the implementation and performance of PBPI, a parallel implementation of the Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies, Bayesian phylogenetic inference can incorporate complex statistical models into the process of phylogenetic tree estimation. However, Bayesian analyses are extremely computationally expensive. PBPI uses algorithmic improvements and parallel processing to achieve significant performance improvement over comparable Bayesian phylogenetic inference programs. We evaluated the performance and accuracy of PBPI using a simulated dataset on System X, a terascale supercomputer at Virginia Tech. Our results show that PBPI identifies equivalent tree estimates 1424 times faster on 256 processors than a widely used, best-available (albeit sequential) Bayesian phylogenetic inference program. PBPI also achieves linear speedup with the number of processors for large problem sizes. Most importantly, the PBPI framework enables Bayesian phylogenetic analysis of large datasets that was previously impracticable.
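The abstract describes the method only at a high level. As a hedged illustration, not PBPI's actual code (which samples over phylogenetic trees with sequence-likelihood models), the Metropolis-Hastings acceptance rule at the heart of Bayesian MCMC sampling can be sketched on a toy one-dimensional target:

```python
import math
import random

def metropolis_hastings(log_posterior, init, proposal, n_steps, seed=0):
    """Generic Metropolis-Hastings sampler: accept a proposed state with
    probability min(1, exp(logpost(candidate) - logpost(current)))."""
    rng = random.Random(seed)
    cur = init
    cur_lp = log_posterior(cur)
    samples = []
    for _ in range(n_steps):
        cand = proposal(cur, rng)
        cand_lp = log_posterior(cand)
        diff = cand_lp - cur_lp
        if diff >= 0 or rng.random() < math.exp(diff):
            cur, cur_lp = cand, cand_lp
        samples.append(cur)
    return samples

# Toy target: a standard-normal log-density standing in for a tree posterior.
log_norm = lambda x: -0.5 * x * x
walk = lambda x, rng: x + rng.uniform(-1.0, 1.0)
chain = metropolis_hastings(log_norm, 5.0, walk, 20000)
mean = sum(chain[5000:]) / len(chain[5000:])  # post-burn-in sample mean
```

In PBPI the proposal mutates tree topologies and branch lengths and the log-posterior is a phylogenetic likelihood; the acceptance logic is the same shape.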

Research paper thumbnail of CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters

2007 International Conference on Parallel Processing (ICPP 2007), 2007

Performance and power are critical design constraints in today's high-end computing systems. Reducing power consumption without impacting system performance is a challenge for the HPC community. We present a runtime system (CPU MISER) and an integrated performance model for performance-directed, power-aware cluster computing. CPU MISER supports system-wide, application-independent, fine-grain, dynamic voltage and frequency scaling (DVFS) based power management for a generic power-aware cluster. Experimental results show that CPU MISER can achieve as much as 20% energy savings for the NAS parallel benchmarks. In addition to energy savings, CPU MISER is able to constrain performance loss for most applications within user-specified limits. These constraints are achieved through accurate performance modeling and prediction, coupled with advanced control techniques.
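CPU MISER's actual controller and performance model are not reproduced in the abstract. The sketch below illustrates the underlying idea with a simplified runtime model: memory-bound phases tolerate lower frequencies because only the CPU-bound fraction of runtime stretches when the clock slows. The frequency list, the `cpu_fraction` split, and the loss bound are hypothetical values, not CPU MISER's model:

```python
def pick_frequency(freqs, cpu_fraction, max_loss):
    """Choose the lowest frequency whose predicted slowdown stays within a
    user-specified performance-loss bound.

    Illustrative runtime model (not CPU MISER's exact one):
      T(f) / T(fmax) = cpu_fraction * (fmax / f) + (1 - cpu_fraction)
    """
    fmax = max(freqs)
    for f in sorted(freqs):
        slowdown = cpu_fraction * (fmax / f) + (1 - cpu_fraction)
        if slowdown <= 1.0 + max_loss:
            return f
    return fmax

freqs = [1.2, 1.6, 2.0, 2.4]  # GHz; hypothetical P-states
# A memory-bound phase (20% CPU-bound) tolerating 5% slowdown can drop to 2.0 GHz:
mem_bound_choice = pick_frequency(freqs, cpu_fraction=0.2, max_loss=0.05)
# A CPU-bound phase (90% CPU-bound) must stay at the top frequency:
cpu_bound_choice = pick_frequency(freqs, cpu_fraction=0.9, max_loss=0.05)
```

A real runtime would estimate `cpu_fraction` per interval from hardware counters and re-apply the decision continuously, which is where the prediction accuracy the abstract mentions matters.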

Research paper thumbnail of Performance and Energy Modeling for Cooperative Hybrid Computing

2014 9th IEEE International Conference on Networking, Architecture, and Storage, 2014

Research paper thumbnail of eTune: A Power Analysis Framework for Data-Intensive Computing

2012 41st International Conference on Parallel Processing Workshops, 2012

Data-intensive workloads demand a large portion of data center resources and consume massive amounts of energy. Energy conservation for data-intensive computing requires enabling technology that provides detailed, systemic energy information and identifies the energy inefficiencies in the underlying system hardware and software. In this work, we address this need and present eTune, a fine-grained, scalable power analysis framework for data-intensive computing on large-scale distributed systems. eTune leverages fine-grained component-level power measurement and the hardware performance monitoring counters (PMCs) on modern computer components to statistically build power-performance correlation models. Using the learned models, eTune implements a software-based power estimator that runs on compute nodes and reports power at multiple levels, including node, core, memory, and disk, with high accuracy. Case studies with MapReduce applications reveal detailed energy behaviors of typical execution phases and data movements and provide insights on energy optimization via algorithm design and resource allocation.
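The kind of power-performance correlation model eTune learns can be illustrated with a single-counter least-squares fit. The counter rates and wattages below are made up for the example, and eTune's real models correlate multiple PMCs per component, but the fitting step has this shape:

```python
def fit_power_model(pmc_samples, power_samples):
    """Least-squares fit of power = a * pmc_rate + b from training pairs,
    a single-counter stand-in for eTune's per-component models."""
    n = len(pmc_samples)
    sx = sum(pmc_samples)
    sy = sum(power_samples)
    sxx = sum(x * x for x in pmc_samples)
    sxy = sum(x * y for x, y in zip(pmc_samples, power_samples))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept (idle power)
    return a, b

# Hypothetical training data: instruction-retirement rate vs. measured CPU watts.
pmcs = [1.0, 2.0, 3.0, 4.0]
watts = [40.0, 55.0, 70.0, 85.0]
a, b = fit_power_model(pmcs, watts)
estimate = a * 2.5 + b  # software-only power estimate at an unseen rate
```

Once fitted offline against direct measurement, such a model lets a node report per-component power from counters alone, which is how the abstract's "software-based power estimator" avoids needing meters on every machine.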

Research paper thumbnail of Designing Computational Clusters for Performance and Power

Advances in Computers, 2007

Power consumption in computational clusters has reached critical levels. High-end cluster performance improves exponentially, while the power consumed and heat dissipated increase operational costs and failure rates. Yet the demand for more powerful machines continues to grow. In this chapter, we motivate the need to reconsider the traditional performance-at-any-cost cluster design approach. We propose designs where power and performance are considered critical constraints. We describe power-aware and low-power techniques to reduce the power profiles of parallel applications and mitigate the impact on performance.

Research paper thumbnail of Power and Energy Profiling of Scientific Applications on Distributed Systems

19th IEEE International Parallel and Distributed Processing Symposium, 2005

Power consumption is a troublesome design constraint for emergent systems such as IBM's BlueGene/L. If current trends continue, future petaflop systems will require 100 megawatts of power to maintain high performance. To address this problem, the power and energy characteristics of high-performance systems must be characterized. To date, power-performance profiles for distributed systems have been limited to interactive commercial workloads. However, scientific workloads are typically non-interactive (batched) processes riddled with inter-process dependences and communication. We present a framework for direct, automatic profiling of power consumption for non-interactive, parallel scientific applications on high-performance distributed systems. Though our approach is general, we use our framework to study the power-performance efficiency of the NAS parallel benchmarks on a 32-node Beowulf cluster. We provide profiles by component (CPU, memory, disk, and NIC), by node (for each of the 32 nodes), and by system scale (2, 4, 8, 16, and 32 nodes).

Research paper thumbnail of PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference

ACM/IEEE SC 2006 Conference (SC'06), 2006

This paper describes the implementation and performance of PBPI, a parallel implementation of the Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies, Bayesian phylogenetic inference can incorporate complex statistical models into the process of phylogenetic tree estimation. However, Bayesian analyses are extremely computationally expensive. PBPI uses algorithmic improvements and parallel processing to achieve significant performance improvement over comparable Bayesian phylogenetic inference programs. We evaluated the performance and accuracy of PBPI using a simulated dataset on System X, a terascale supercomputer at Virginia Tech. Our results show that PBPI identifies equivalent tree estimates 1424 times faster on 256 processors than a widely used, best-available (albeit sequential) Bayesian phylogenetic inference program. PBPI also achieves linear speedup with the number of processors for large problem sizes. Most importantly, the PBPI framework enables Bayesian phylogenetic analysis of large datasets that was previously impracticable.

Research paper thumbnail of SERA-IO: Integrating Energy Consciousness into Parallel I/O Middleware

Cluster Computing and the Grid, 2012

Improving energy efficiency is a primary concern in high performance computing system design. Because I/O accesses account for a large portion of the execution time for data-intensive applications, energy-aware parallel I/O subsystems are critical for addressing challenges related to HPC energy efficiency. In this paper, we present an energy-conscious parallel I/O middleware approach that combines runtime I/O access interception with the Dynamic Voltage and Frequency Scaling (DVFS) capability available on modern processors to intelligently schedule the system's power-performance mode for energy savings. We implement this approach in SERA-IO, an MPI-IO based middleware that enables energy consciousness for I/O intensive applications. Experimental evaluations conducted on real systems using multiple parallel I/O benchmarks show that SERA-IO can reduce system energy by 9% to 28% without decreasing application performance. With the emergence of large-scale data-intensive applications and ever larger and more complex parallel computing systems, intelligent, energy-conscious software and runtime systems such as SERA-IO are critical for the success of future high-end computing.
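The interception-plus-DVFS idea can be sketched, without real MPI-IO or cpufreq plumbing, as a wrapper that drops the processor to a low power-performance mode for the duration of a blocking I/O call and restores it afterward. Both `set_cpu_frequency` and `mpi_file_write` below are placeholders, not SERA-IO's API:

```python
import functools

freq_log = []  # records mode switches; a real system would touch cpufreq sysfs

def set_cpu_frequency(level):
    """Placeholder for a DVFS transition (e.g. selecting a low P-state)."""
    freq_log.append(level)

def energy_aware_io(io_func):
    """Wrap an I/O call so the CPU runs at its lowest frequency while the
    call blocks, then returns to full speed -- the interception idea
    behind SERA-IO, sketched in miniature."""
    @functools.wraps(io_func)
    def wrapper(*args, **kwargs):
        set_cpu_frequency("min")      # I/O leaves the CPU mostly idle
        try:
            return io_func(*args, **kwargs)
        finally:
            set_cpu_frequency("max")  # computation resumes at full speed
    return wrapper

@energy_aware_io
def mpi_file_write(buf):
    return len(buf)  # stand-in for a blocking MPI-IO write
```

Because the CPU is largely stalled during the wrapped call, the frequency drop saves energy without lengthening the I/O itself, which matches the abstract's claim of savings with no performance loss.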

Research paper thumbnail of EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks

Supercomputing Conference, 2008

Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics, a scalable parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models. EpiSimdemics is an interaction-based simulation of a certain class of stochastic reaction-diffusion processes. Straightforward simulations of such processes do not scale well, limiting the use of individual-based models to very small populations. EpiSimdemics is specifically designed to scale to social networks with 100 million individuals. The scaling is obtained by exploiting the semantics of disease evolution and disease propagation in large networks. We evaluate an MPI-based parallel implementation of EpiSimdemics on a mid-sized HPC system, demonstrating that EpiSimdemics scales well. EpiSimdemics has been used in numerous sponsor-defined case studies targeted at policy planning and course-of-action analysis, demonstrating its usefulness in practical situations.
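EpiSimdemics itself is a parallel, interaction-based simulator over realistic populations; as a toy stand-in, the stochastic disease propagation it models can be sketched as a discrete-time SIR process on a small contact network (the network, transmission probability, and one-day infectious period below are illustrative, not EpiSimdemics' disease model):

```python
import random

def simulate_sir(contacts, seeds, p_transmit, n_days, seed=1):
    """Discrete-time stochastic SIR on a contact network: each day, every
    infectious person independently infects each never-infected contact
    with probability p_transmit, then recovers.
    (A toy stand-in for EpiSimdemics' interaction-based model.)"""
    rng = random.Random(seed)
    infectious = set(seeds)
    recovered = set()
    ever_infected = set(seeds)
    for _ in range(n_days):
        new_cases = set()
        for person in infectious:
            for contact in contacts.get(person, ()):
                if contact not in ever_infected and rng.random() < p_transmit:
                    new_cases.add(contact)
        recovered |= infectious       # one-day infectious period
        ever_infected |= new_cases
        infectious = new_cases
    return ever_infected, recovered

# Tiny example network as adjacency lists:
net = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
infected, _ = simulate_sir(net, seeds=[0], p_transmit=1.0, n_days=10)
```

The straightforward loop above touches every infectious-susceptible pair each day, which is exactly the scaling bottleneck the abstract says EpiSimdemics avoids by exploiting the semantics of disease propagation.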

Research paper thumbnail of 3PGCIC 2011

Research paper thumbnail of Characterizing energy efficiency of I/O intensive parallel applications on power-aware clusters

Symposium on Parallel and Distributed Processing, 2010

Energy efficiency and parallel I/O performance have become two critical measures in high performance computing (HPC). However, there is little empirical data characterizing the energy-performance behavior of parallel I/O workloads. In this paper, we present a methodology to profile the performance, energy, and energy efficiency of parallel I/O access patterns and report our findings on the factors that affect parallel I/O energy efficiency. Our study shows that choosing the right buffer size can change energy-performance efficiency by up to 30 times. High spatial and temporal spacing can also lead to significant improvement in energy-performance efficiency (about 2X). We observe that CPU frequency has a more complex impact, depending on the I/O operations, spatial and temporal patterns, and memory buffer size. The presented methodology and findings are useful for evaluating the energy efficiency of I/O intensive applications and provide a guideline for developing energy-efficient parallel I/O technology.