Terry Ligocki - Academia.edu
Papers by Terry Ligocki
We present new prototype tools for optimizing building solar energy impacts in urban regions, to enable better real-time control and policy decisions for energy supply and demand response. The concept is demonstrated with a prototype that estimates the amount of direct sunlight available to building surfaces in complex urban landscapes, taking into consideration local weather predictions (via cloud cover simulation). We also calculate partial shadows from visual obstructions, due to their effect on the availability of solar energy and building energy usage. The prototype has the potential to make better day-ahead predictions that can help balance energy supply and demand during peak load hours. This can lead to better strategies for control of heating, air conditioning and alternatives (such as local energy storage in batteries or co-generation) to offset peak energy demand. In addition, it can be used as a statistical optimization tool for informing local policy decisions related to solar energy incentives and demand response programs. We apply the approach to a prototype calculation on models of a hypothetical city and a section of downtown San Francisco. We briefly discuss optimization opportunities in response to the variability and uncertainty in solar energy for individual buildings in an urban landscape.
Increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language serves as an already-used target for automatically migrating the large number of existing algorithms to a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
Adaptive Mesh Refinement (AMR) represents a significant advance for scientific simulation codes, greatly reducing memory and compute requirements by dynamically varying simulation resolution over space and time. As simulation codes transition to AMR, existing analysis algorithms must also make this transition. One such algorithm, connected component detection, is of vital importance in many simulation and analysis contexts, with some simulation codes even relying on parallel, in situ connected component detection for correctness. Yet, current detection algorithms designed for uniform meshes are not applicable to hierarchical, non-uniform AMR, and to the best of our knowledge, AMR connected component detection has not been explored in the literature. Therefore, in this paper, we formally define the general problem of connected component detection for AMR, and present a general solution. Beyond solving the general detection problem, achieving viable in situ detection performance is even more challenging. The core issue is the conflict between the communication-intensive nature of connected component detection (in general, and especially for AMR data) and the requirement that in situ processes incur minimal performance impact on the co-located simulation. We address this challenge by presenting the first connected component detection methodology for structured AMR that is applicable in a parallel, in situ context. Our key strategy is the incorporation of a multi-phase AMR-aware communication pattern that synchronizes connectivity information across the AMR hierarchy. In addition, we distill our methodology to a generic framework within the Chombo AMR infrastructure, making connected component detection services available for many existing applications.
We demonstrate our method's efficacy by showing its ability to detect ice calving events in real time within the real-world BISICLES ice sheet modeling code. Results show up to a 6.8x speedup of our algorithm over the existing specialized BISICLES algorithm. We also show scalability results for our method up to 4,096 cores using a parallel Chombo-based benchmark.
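For orientation, the following is a minimal sketch of connected component detection on a single uniform 2D mesh using union-find. This is the kind of uniform-mesh baseline the abstract says does not carry over to hierarchical AMR; it is not the paper's AMR-aware, parallel method. The function name `label_components`, the 4-neighbor connectivity, and the list-of-lists mask are assumptions made for illustration.

```python
def label_components(mask):
    """Label 4-connected components of truthy cells in a 2D uniform grid.

    Returns (labels, count); labels[r][c] is -1 for cells outside any component.
    Illustrative serial baseline only, not an AMR or in situ algorithm.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    parent = list(range(rows * cols))  # one union-find node per cell

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Merge each active cell with its active neighbors below and to the right.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            if r + 1 < rows and mask[r + 1][c]:
                union(r * cols + c, (r + 1) * cols + c)
            if c + 1 < cols and mask[r][c + 1]:
                union(r * cols + c, r * cols + c + 1)

    # Map each root to a compact component id.
    labels = [[-1] * cols for _ in range(rows)]
    ids = {}
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                root = find(r * cols + c)
                labels[r][c] = ids.setdefault(root, len(ids))
    return labels, len(ids)


if __name__ == "__main__":
    grid = [[1, 1, 0],
            [0, 0, 0],
            [0, 1, 1]]
    labels, count = label_components(grid)
    print(count, labels)
```

On AMR data the same cell-merging idea would additionally need connectivity across refinement levels and across processes, which is the part the abstract identifies as the hard problem.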
Journal of physics, 2005
Proceedings of SPIE, May 30, 2003
Lecture Notes in Computer Science, 2015
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
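As a reminder of what the Roofline model expresses, here is a minimal sketch of the basic bound: attainable performance is the smaller of the compute ceiling and the memory bandwidth multiplied by arithmetic intensity. The function names and the machine numbers in the usage section are hypothetical and are not taken from the Roofline Toolkit or from any measured system.

```python
def attainable_gflops(peak_gflops, peak_bw_gbs, intensity):
    """Basic Roofline bound in GFLOP/s.

    peak_gflops: compute ceiling (GFLOP/s)
    peak_bw_gbs: memory bandwidth ceiling (GB/s)
    intensity:   arithmetic intensity of the kernel (FLOP per byte moved)
    """
    if intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    return min(peak_gflops, peak_bw_gbs * intensity)


def ridge_point(peak_gflops, peak_bw_gbs):
    """Arithmetic intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / peak_bw_gbs


if __name__ == "__main__":
    # Hypothetical machine: 200 GFLOP/s peak, 25 GB/s sustained bandwidth.
    peak, bw = 200.0, 25.0
    print("ridge point:", ridge_point(peak, bw))
    for ai in (0.5, 2.0, 16.0):
        print(ai, attainable_gflops(peak, bw, ai))
```

A kernel whose intensity lies left of the ridge point is limited by memory bandwidth; one to the right is limited by the compute ceiling. Measured ceilings, such as those a characterization engine would produce, can be substituted for the specification values.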
Eurographics, 2012
Abstract: Adaptive mesh refinement (AMR) is a numerical simulation technique used in computational fluid dynamics (CFD). It permits the efficient simulation of phenomena characterized by substantially varying scales in complexity of local behavior of certain variables. By using a set of nested grids at different resolutions, AMR combines the simplicity of structured rectilinear grids with the possibility to adapt to local changes in complexity and spatial resolution. Hierarchical representations of ...
2014 AGU Fall Meeting, Dec 19, 2014
Journal of physics, Jul 1, 2007
In this paper, we discuss some of the issues in obtaining high performance for block-structured adaptive mesh refinement software for partial differential equations. We show examples in which AMR scales to thousands of processors. We also discuss a number of metrics for performance and scalability that can provide a basis for understanding the advantages and disadvantages of this approach.
Journal of physics, Jul 1, 2008
We present an algorithm for calculating moments in arbitrary dimension to an arbitrary order of accuracy over regions defined by the intersection of an interface with a control volume. Such moments arise in finite volume discretizations of PDEs over complex domains. The algorithm, which is adaptive and embarrassingly parallel, relies on implicit function representations of surfaces, the divergence theorem, Taylor expansions, and constrained least squares. These ingredients combine in a recursion that terminates in 1D root finding and integration of monomials along line segments. We illustrate the algorithm using interfaces derived from image data, digital elevation maps, analytic expressions, as well as the operations of constructive solid geometry applied to all of the above.
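To illustrate why the divergence theorem lets such a recursion drop one dimension at a time, the identity below rewrites a monomial moment over a region as a boundary integral. It is a standard consequence of the divergence theorem, offered as a sketch; the notation (region $V$, multi-index $p$, unit vector $e_d$, outward normal $n$) is assumed here, and the paper's own formulation may differ.

```latex
% Monomial x^p = x_1^{p_1} \cdots x_D^{p_D}; pick any coordinate direction d.
% The vector field F has divergence exactly x^p:
\[
  F(x) = \frac{x^{\,p + e_d}}{p_d + 1}\, e_d ,
  \qquad
  \nabla \cdot F = \frac{\partial}{\partial x_d}
                   \left( \frac{x^{\,p + e_d}}{p_d + 1} \right) = x^{p} .
\]
% Applying the divergence theorem turns the volume moment into a surface moment:
\[
  \int_{V} x^{p} \, dV
  = \frac{1}{p_d + 1} \oint_{\partial V} x^{\,p + e_d}\, n_d \, dA .
\]
```

Each application trades a $D$-dimensional integral for integrals over $(D-1)$-dimensional boundary pieces, which is consistent with a recursion that ends in one-dimensional integration of monomials along line segments.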
Adaptive mesh refinement (AMR) is a numerical simulation technique used in computational fluid dynamics (CFD). This technique permits efficient simulation of phenomena characterized by substantially varying scales in complexity. By using a set of nested grids of different resolutions, AMR combines the simplicity of structured rectilinear grids with the possibility to adapt to local changes in complexity within the domain of a numerical simulation that otherwise requires the use of unstructured grids. Without proper interpolation at the boundaries of the nested grids of different levels of a hierarchy, discontinuities can arise. These discontinuities can lead, for example, to cracks in an extracted isosurface. Treating locations of data values given at the cell centers of AMR grids as vertices of a dual grid allows us to use the original data values of the cell-centered AMR data in a marching-cubes (MC) isosurface extraction scheme that expects vertex-centered data. The use of dual grids also induces gaps between grids of different hierarchy levels. We use an index-based tessellation approach to fill these gaps with "stitch cells." By extending the standard MC approach to a finite set of stitch cells, we can define an isosurface extraction scheme that avoids cracks at level boundaries.
Springer eBooks, 2003
We describe a system supporting the interactive exploration of three-dimensional scientific data sets in a virtual reality (VR) environment. This system aids a scientist in understanding a data set by interactively placing and manipulating visualization primitives, e.g., isosurfaces or streamlines, and thereby finding features in the data and understanding its overall structure. We discuss how the requirement of interactivity influences the architecture of the visualization system, and how to adapt standard visualization techniques to work under real-time interaction constraints. Though we have implemented our visualization system to work with multiple types of data set structures (Cartesian, tetrahedral, curvilinear-hexahedral, and adaptive mesh refinement (AMR)), we will focus on AMR grids and show how their inherent multiresolution structure is useful for interactive visualization.
Journal of Computational Physics, 2006
We present an algorithm for solving Poisson's equation and the heat equation on irregular domains in three dimensions. Our work uses the Cartesian grid embedded boundary algorithm for 2D problems of
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), May 9, 2007
Adaptive Mesh Refinement (AMR) is a highly effective method for simulations that span a large range of spatiotemporal scales, such as astrophysical simulations that must accommodate ranges from interstellar to sub-planetary. Most mainstream visualization tools still lack support for AMR as a first-class data type, and AMR code teams use custom-built applications for AMR visualization. The Department of Energy's (DOE's) Scientific Discovery through Advanced Computing (SciDAC) Visualization and Analytics Center for Enabling Technologies (VACET) is currently working on extending VisIt, which is an open source visualization tool that accommodates AMR as a first-class data type. These efforts will bridge the gap between general-purpose visualization applications and highly specialized AMR visual analysis applications. Here, we give an overview of the state of the art in AMR visualization research and tools and describe how VisIt currently handles AMR data.
In this paper, we discuss some of the issues in obtaining high performance for block-structured adaptive mesh refinement software for partial differential equations. We show examples in which AMR scales to thousands of processors. We also discuss a number of metrics for performance and scalability that can provide a basis for understanding the advantages and disadvantages of this approach.
APS, Aug 1, 2002
We have applied the Adaptive Mesh Refinement technique (AMR) to provide an accurate and efficient method for calculating magnetic reconnection, including the outer and inner regions. The 2D resistive MHD equations are solved in a rectangular domain. Two plasma columns are allowed to merge and the reconnection rate is calculated [1]. The finite difference equations are obtained using a second
In this paper, we discuss some of the issues in obtaining high performance for block-structured adaptive mesh refinement software for Poisson's equation. We show examples in which AMR scales to thousands of processors. We also discuss a number of metrics for performance and scalability that can provide a basis for understanding the advantages and disadvantages of this approach.