Dan Stanzione - Academia.edu

Papers by Dan Stanzione

CyVerse: Cyberinfrastructure for open science

PLOS Computational Biology, Feb 7, 2024

Can the United States Maintain Its Leadership in High-Performance Computing? - A report from the ASCAC Subcommittee on American Competitiveness and Innovation to the ASCR Office

TACC in 2020: COVID-19, and the Next Generation of Cyberinfrastructure

The Path to Exascale

Texas Advanced Computing Center (TACC)

Stampede 2

The Stampede 1 supercomputer was a tremendous success as an XSEDE resource, providing more than eight million successful computational simulations and data analysis jobs to more than ten thousand users. In addition, Stampede 1 introduced new technology that began to move users towards many-core processors. As Stampede 1 reaches the end of its production life, it is being replaced in phases by a new supercomputer, Stampede 2, that will not only take up much of the original system's workload, but continue the bridge to technologies on the path to exascale computing. This paper provides a brief summary of the experiences of Stampede 1, and details the design and architecture of Stampede 2. Early results are presented from a subset of Intel Knights Landing nodes that are bridging between the two systems.

The iPlant Collaborative - bioinformatics infrastructure, tools, and resources

Plant and Animal Genome XX Conference (January 14-18, 2012), Jan 16, 2012

Offline parallel debugging

Debugging is difficult; debugging parallel programs at large scale is particularly so. Interactive debugging tools continue to improve in ways that mitigate the difficulties, and the best such systems will continue to be mission critical. Such tools have their limitations, however. They are often unable to operate across many thousands of cores. Even when they do function correctly, mining and …

Jetstream: A Novel Cloud System for Science

CRC Press eBooks, May 8, 2019

InfiniBand routing and switching

InfiniBand has emerged as a new high-bandwidth, low-latency standard for high performance computing, but as a technology it is still focused on Layer 2 switching. Standards have not yet been defined for InfiniBand Layer 3 routing, which is required for additional scalability, distance reach, security, fault tolerance, and isolation. The meeting will consist of product leads from InfiniBand vendors discussing the unique …

Energy-saving effects by using 380VDC power supply system interconnected with a solar power generation system in Texas

Power consumption of ICT facilities and data centers has grown, leading to a need to improve the energy efficiency of these facilities. A DC power distribution system employing 380VDC as the supply voltage is one promising approach to this problem for countries around the world developing and deploying commercial services. We demonstrated a 380VDC power distribution system interconnected with a solar power generation system in Texas, USA. The purpose of this demonstration was to show that a 380VDC power supply system saves more energy than an AC power supply system, and to show how much carbon dioxide emissions can be reduced by integrating a solar power generation system. The demonstration achieved an approximately 17% energy reduction compared with an AC power supply system of the same reliability level. An evaluation using Data center Performance Per Energy (DPPE) as a performance index of data center efficiency was also carried out. The results showed that Power Usage Effectiveness (PUE), one of the sub-metrics of DPPE, improved with the 380VDC power supply system compared with the AC power supply system.
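The PUE sub-metric discussed in the abstract has a simple definition: total facility energy divided by IT equipment energy, so values closer to 1.0 indicate less overhead. A minimal sketch of the calculation follows; the numbers are hypothetical illustrations, not figures from the paper:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical comparison: same IT load, lower total draw on the DC side.
ac_pue = pue(1500.0, 1000.0)  # 1500/1000 = 1.5
dc_pue = pue(1380.0, 1000.0)  # 1380/1000 = 1.38, i.e. less distribution overhead
```

A lower PUE at identical IT load is exactly the kind of improvement the 380VDC demonstration reports, since conversion losses count toward facility energy but not IT energy.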

Development environment for configurable computing

Proceedings of SPIE, Oct 8, 1998

Building Wrangler: A transformational data intensive resource for the open science community

Optimizing the PCIT algorithm on Stampede's Xeon and Xeon Phi processors for faster discovery of biological networks

The PCIT method is an important technique for detecting interactions between networks. The PCIT algorithm has been used in the biological context to infer complex regulatory mechanisms and interactions in genetic networks, in genome-wide association studies, and in other similar problems. In this work, the PCIT algorithm is re-implemented with exemplary parallel, vector, I/O, memory, and instruction optimizations for today's multi- and many-core architectures. The evolution and performance of the new code target the processor architectures of the Stampede supercomputer, but will also benefit other architectures. The Stampede system consists of an Intel Xeon E5 processor base system with an innovative component comprised of Intel Xeon Phi coprocessors. Optimized results and an analysis are presented for both the Xeon and the Xeon Phi.
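The paper's optimized code is not reproduced here, but the first-order partial correlation that PCIT-style network inference builds on can be sketched. The helper name and the toy expression matrix below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))

# Toy expression matrix: rows = genes, columns = samples (synthetic data).
rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 50))
r = np.corrcoef(expr)  # 4x4 matrix of pairwise Pearson correlations

# Partial correlation of genes 0 and 1, controlling for gene 2.
p01_given_2 = partial_correlation(r[0, 1], r[0, 2], r[1, 2])
```

PCIT evaluates this quantity for every gene trio and applies an information-theoretic tolerance to decide which edges survive, which is why the full algorithm is cubic in the number of genes and benefits so much from the vectorization and many-core parallelism the paper describes.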

Efficient virtual machine caching in dynamic virtual clusters

At many university, government, and corporate facilities, it is increasingly common for multiple compute clusters to exist in a relatively small geographic area. These clusters represent a significant investment, but effectively leveraging this investment across clusters is a challenge. Dynamic Virtual Clustering (DVC) has been shown to be an effective way to increase utilization, decrease job turnaround time, and increase workload throughput in a multi-cluster environment on a small geographic scale. DVC is a system for flexibly and seamlessly deploying virtual machines (VMs) across a single- or multi-cluster environment. DVC tightly integrates VM technology with the cluster's resource management and scheduling software to allow jobs to run on any cluster in any software environment while effectively sandboxing users and applications from the host system.
DVC uses VMs in a cluster environment by staging images to compute nodes and booting the VMs on those nodes. However, the amount of time required to stage and boot VMs may be prohibitively large, especially when the jobs designated to run inside the VMs are short-lived. In this paper we examine the overhead of staging and booting, consider issues associated with caching VM images, and present an implementation of image caching as a way to reduce this overhead. The basic implementation and analysis of caching presented here will later be used to create intelligent scheduling algorithms and heuristics that use cache information to reduce overhead due to VM use. Section II examines DVC, virtualization, and resource management in cluster environments with respect to virtual machines. Section III examines the initial implementation of VM creation, details image caching as a way to reduce the overhead of staging and booting VM images, and enumerates situations where caching can cause unexpected and incorrect behavior.
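The paper's own caching implementation is not shown here, but the core idea of node-local image caching with staleness checking can be sketched. The function names, directory layout, and use of a content digest are assumptions for illustration, not the DVC code:

```python
import hashlib
import shutil
from pathlib import Path

def _digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def stage_image(repo_image: Path, cache_dir: Path) -> Path:
    """Return a node-local copy of a VM image, reusing the cache when valid.

    A cached copy is reused only if its digest matches the repository copy,
    guarding against stale or corrupted images (one of the incorrect-behavior
    cases caching must handle).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / repo_image.name
    if cached.exists() and _digest(cached) == _digest(repo_image):
        return cached                 # cache hit: skip the expensive transfer
    shutil.copy2(repo_image, cached)  # cache miss: stage from the repository
    return cached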

Dynamic Virtual Clustering with Xen and Moab

Springer eBooks, 2006

Abstract. As larger and larger commodity clusters for high performance computing proliferate at research institutions around the world, challenges in maintaining effective use of these systems also continue to increase. Among the many challenges are maintaining the appropriate …

Future Directions for Parallel and Distributed Computing: SPX 2019 Workshop Report

Architecture for an Offline Parallel Debugger

This paper provides an overview of the GDBase framework for offline parallel debuggers. The framework was designed to become the basis of debugging tools which scale successfully on systems with tens to hundreds of thousands of cores. With several systems coming online at more than 50,000 cores in the past year, debuggers which can run at these scales are …

Wrangler's user environment: A software framework for management of data-intensive computing system

The growth in the capacity and capability of NAND Flash based storage systems has changed the face of data-oriented computational systems. These systems have become both more capable and flexible in how they are used. With these changes come both increased potential and user complexity. While many systems attempt to hide this complexity through the addition of more layers of storage caches, the design of the Wrangler system went a different route, choosing instead to build a simple yet flexible web-based interface that lets users easily configure this complex data computing system based on their service and software needs. This allows users to work in the environments best suited to their workflows while optimally utilizing the system's high-performance, high-capacity storage. The interface also allows users to schedule long-term periods of reserved capacity, "data campaigns", for projects. Finally, the system has been designed to support data storage and sharing to enable these key aspects of data research. We discuss these capabilities with respect to three existing workflows on the system to highlight the diversity and flexibility this environment provides to data researchers.

Implementation and analysis of numerical components for reconfigurable computing

In the past, reconfigurable computing has not been an option for accelerating scientific algorithms (which require complex floating-point operations) and other similar applications due to limited FPGA density. However, the rapid increase of FPGA densities over the past several years has altered this situation. The central goal of the Reconfigurable Computing Application Development Environment (RCADE) is to capitalize on these …

The iPlant Collaborative: Cyberinfrastructure to Feed the World

IEEE Computer, Nov 1, 2011
