Ian Foster | Argonne National Laboratory
Papers by Ian Foster
We describe the implementation and evaluate the performance of a Replica Location Service that is part of the Globus Toolkit Versions 3.0 and 4.0. A Replica Location Service (RLS) provides a mechanism for registering the existence of replicas and discovering them. Features of our implementation include the use of soft state update protocols to populate a distributed index and optional Bloom filter compression to reduce the size of these updates. We present the design of the RLS system and different deployment options for the distributed index.
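To make the Bloom filter compression concrete, here is a minimal Python sketch, not the Globus RLS implementation: a local catalog summarizes its registered logical file names into a fixed-size bit array, so a soft-state update ships a compact filter instead of the full name list. The class, sizes, and example names are illustrative.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array summarizing set membership with false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# A local catalog registers logical file names, then ships only the
# 1 KB bit array to the distributed index as its soft-state update.
update = BloomFilter()
update.add("lfn://physics/run42/events.dat")
print("lfn://physics/run42/events.dat" in update)  # True
print("lfn://physics/run43/other.dat" in update)   # False (usually)
```

A filter can report false positives but never false negatives, which is acceptable for an index that merely directs queries to a local catalog for confirmation.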
Climate models are outputting ever larger amounts of data on increasingly sophisticated numerical grids. Yet the tools climate scientists have used to analyze climate output, an essential component of climate modeling, are single threaded and assume rectangular structured grids in their analysis algorithms. We are bringing both task- and data-parallelism to the analysis of climate model output. We have created a new data-parallel library, the Parallel Gridded Analysis Library (ParGAL), which can read in data using parallel I/O, store the data on a complete representation of the structured or unstructured mesh, and perform sophisticated analysis on the data in parallel. ParGAL has been used to create a parallel version of a script-based analysis and visualization package. Finally, we have also taken current workflows and employed task-based parallelism to decrease the total execution time.
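As a rough illustration of the data-parallel half of this approach, and not the ParGAL API itself, the sketch below computes a global mean of a decomposed field with mpi4py: each rank reduces the slab it owns (standing in for a parallel read of its part of the mesh), then the partial results are combined with a single collective. Run under mpiexec with any number of ranks.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Stand-in for a parallel read: each rank owns one slab of field values.
local_cells = np.random.default_rng(seed=rank).random(1_000_000)

# Each rank reduces its own slab; partial (sum, count) pairs are then
# combined across all ranks in one allreduce.
local = np.array([local_cells.sum(), float(local_cells.size)])
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)

if rank == 0:
    print(f"global mean over {int(total[1])} cells: {total[0] / total[1]:.6f}")
```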
An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.
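The striping and partial-file-access ideas can be sketched without the protocol itself. The hypothetical Python below reads disjoint byte ranges of a local file from several threads and reassembles them in offset order, which is the access pattern a striped GridFTP transfer applies across network endpoints; the function names are illustrative.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def read_stripe(path, offset, length):
    # Partial file access: read `length` bytes starting at `offset`.
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, f.read(length)

def striped_read(path, num_stripes=4):
    size = os.path.getsize(path)
    stripe = (size + num_stripes - 1) // num_stripes
    ranges = [(i * stripe, min(stripe, size - i * stripe))
              for i in range(num_stripes) if i * stripe < size]
    # Each worker fetches one disjoint range in parallel.
    with ThreadPoolExecutor(max_workers=num_stripes) as pool:
        parts = pool.map(lambda r: read_stripe(path, *r), ranges)
    # Reassemble the stripes in offset order.
    return b"".join(data for _, data in sorted(parts))
```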
The long-term vision of the Fusion Collaboratory described in this paper is to transform fusion research and accelerate scientific understanding and innovation so as to revolutionize the design of a fusion energy source. The Collaboratory will create and deploy collaborative software tools that will enable more efficient utilization of existing experimental facilities and more effective integration of experiment, theory, and modeling. The computer science research necessary to create the Collaboratory is centered on three activities: security, remote and distributed computing, and scientific visualization. It is anticipated that the presently envisioned Fusion Collaboratory software tools will require three years to complete. Introduction. In 1999, the United States Department of Energy (USDOE) Office of Fusion Energy Sciences (OFES) established the Plasma Science Advanced Computing Initiative (PSACI) to revolutionize fusion research by greatly enhancing simulation and modeling capabilities made accessible by terascale computing [1]. These advanced computational
This paper proposes extensions of sequential programming languages for parallel programming that have the following features:
CCGrid 2005: IEEE International Symposium on Cluster Computing and the Grid, 2005
The increasingly common practice of using multiple distributed storage systems as a distributed data store within which large datasets may be replicated has led to the problem of how to access replicated data efficiently. Multiple-source parallel transfers can reduce data transfer time by fetching data from several replicas in parallel. However, we then face the problem of deciding how to distribute the data load among different storage resources. We propose a Tuned Conservative scheduling technique that uses predicted mean and variance network information to make data distribution decisions. This stochastic scheduling technique uses a tuning factor to adjust the amount of data assigned to a link in accordance with the variability of the network performance. We apply our technique to the GridFTP implementation in the Globus Toolkit and demonstrate that it can produce data transfer times that are significantly faster and less variable than those of other techniques.
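The heart of the technique is a simple discounting rule, sketched here with assumed names and numbers: discount each link's predicted mean throughput by a tuning factor times its predicted standard deviation, then split the file across replicas in proportion to the discounted rates.

```python
def allocate(total_bytes, links, tuning_factor=1.0):
    """links: list of (name, mean_mbps, stddev_mbps) predictions."""
    # Conservative effective rate: mean minus a multiple of the stddev.
    effective = {name: max(mean - tuning_factor * stddev, 0.0)
                 for name, mean, stddev in links}
    capacity = sum(effective.values())
    if capacity == 0.0:
        raise ValueError("no usable links under this tuning factor")
    # Assign data in proportion to each link's discounted rate.
    return {name: int(total_bytes * rate / capacity)
            for name, rate in effective.items()}

predictions = [("replica-A", 90.0, 30.0),   # fast but highly variable
               ("replica-B", 60.0, 5.0)]    # slower but steady
print(allocate(10**9, predictions, tuning_factor=1.0))
```

Larger tuning factors shift load toward steadier links, trading peak throughput for predictability.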
As the size of scientific data sets and the resources required for analysis increase, data locality becomes crucial to the efficient use of large scale distributed systems for scientific and data-intensive applications. In order to support interactive analysis of large quantities of data in many scientific disciplines, we propose a data diffusion approach, in which the resources required for data analysis are acquired dynamically, in response to demand. Acquired resources (compute and storage) can be "cached" for some time, thus allowing more rapid responses to subsequent requests. We define an abstract model for data-centric task farms as a common parallel pattern that drives the independent computational tasks, taking into consideration the data locality in order to optimize the performance of the analysis of large datasets. This approach can provide the benefits of dedicated hardware without the associated high costs. We will validate our abstract model through discrete-event simulations; we expect simulations to show the model is both efficient and scalable given a wide range of simulation parameters. To explore the practical realization of our abstract model, we have developed a Fast and Light-weight tasK executiON framework (Falkon). Falkon provides for dynamic acquisition and release of resources, data management capabilities, and the dispatch of analysis tasks via a data-aware scheduler. We have integrated Falkon into the Swift parallel programming system in order to leverage a large number of applications from various domains (astronomy, astrophysics, medicine, chemistry, economics, etc.) which cover a variety of different datasets, workloads, and analysis codes. We believe our data-centric task farm model generalizes to many domains and applications and could offer application end-to-end performance improvements, higher resource utilization, improved efficiency, and better application scalability.
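A minimal sketch of the data-aware dispatch at the core of this design, using hypothetical class and method names rather than Falkon's actual interfaces: prefer a worker whose cache already holds a task's input; otherwise place the task on the emptiest worker and record that the data has now diffused onto it.

```python
class DataAwareScheduler:
    def __init__(self, workers):
        # worker name -> set of input files cached on that worker
        self.caches = {w: set() for w in workers}

    def dispatch(self, task_input):
        # Data locality first: reuse a worker that already holds the input.
        for worker, cache in self.caches.items():
            if task_input in cache:
                return worker, "cache hit"
        # Otherwise pick the worker caching the least data; the input is
        # fetched there, "diffusing" it onto that worker.
        worker = min(self.caches, key=lambda w: len(self.caches[w]))
        self.caches[worker].add(task_input)
        return worker, "cache miss"

sched = DataAwareScheduler(["w1", "w2"])
for name in ["a.dat", "b.dat", "a.dat"]:
    print(name, "->", sched.dispatch(name))  # second a.dat hits a warm cache
```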
Several parallel algorithms for Fock matrix construction are described. The algorithms calculate only the unique integrals, distribute the Fock and density matrices over the processors of a massively parallel computer, use blocking techniques to construct the distributed data structures, and use clustering techniques on each processor to maximize data reuse. Algorithms based on both square and row blocked distributions of the Fock and density matrices are described and evaluated. Variants of the algorithms are discussed ...
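The "unique integrals" point follows from the eightfold permutational symmetry of the two-electron integrals (ij|kl). The sketch below is illustrative and omits the actual Fock and density matrix arithmetic: it enumerates only canonical index quadruplets (i >= j, k >= l, ij >= kl) and shows the roughly eightfold reduction in work.

```python
def unique_quadruplets(n):
    # Canonical ordering: i >= j, k >= l, and composite index ij >= kl.
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j
            for k in range(i + 1):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l
                    if ij >= kl:
                        yield i, j, k, l

n = 10
unique = sum(1 for _ in unique_quadruplets(n))
print(f"{unique} unique of {n**4} total quadruplets")  # roughly n^4 / 8
```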
The coordinated use of geographically distributed computers, or metacomputing, can in principle provide more accessible and cost-effective supercomputing than conventional high-performance systems. However, we lack evidence that metacomputing systems can be made easily usable, or that there exist large numbers of applications able to exploit metacomputing resources. In this paper, we present work that addresses both these concerns.
Multicast communication primitives have broad utility as building blocks for distributed applications. The challenge is to create and maintain the distributed structures that support these primitives while accounting for volatile end-nodes and variable network characteristics. Most solutions proposed to date rely on complex algorithms or global information, thus limiting the scale of deployments and acceptance outside the academic realm.
Data-parallel languages such as High Performance Fortran (HPF) present a simple execution model in which a single thread of control performs high-level operations on distributed arrays. These languages can greatly ease the development of parallel programs. Yet there are large classes of applications for which a mixture of task and data parallelism is most appropriate. Such applications can be structured as collections of data-parallel tasks that communicate by using explicit message passing. Because the Message Passing Interface (MPI) defines standardized, familiar mechanisms for this communication model, we propose that HPF tasks communicate by making calls to a coordination library that provides an HPF binding for MPI. The semantics of a communication interface for sequential languages can be ambiguous when the interface is invoked from a parallel language; we show how these ambiguities can be resolved by describing one possible HPF binding for MPI. We then present the design of a...
Many computations can be structured as sets of communicating data-parallel tasks. Individual tasks may be coded in HPF, pC++, etc.; periodically, tasks exchange distributed arrays via channel operations, virtual file operations, message passing, etc. The implementation of these operations is complicated by the fact that the processes engaging in the communication may execute on different numbers of processors and may have different distributions for communicated data structures. In addition, they may be connected by different sorts of networks. In this paper, we describe a communicating data-parallel tasks (CDT) library that we are developing for constructing applications of this sort. We outline the techniques used to implement this library, and we describe a range of data transfer strategies and several algorithms based on these strategies. We also present performance results for several algorithms. The CDT library is being used as a compiler target for an HPF compiler augmented w...
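The core bookkeeping such a library must perform can be sketched directly: given a 1-D array block-distributed over P sender processes and Q receiver processes, compute which index ranges each sender ships to each receiver. The function names below are illustrative, not the CDT library interface, and a real implementation must also handle multidimensional and non-block distributions.

```python
def block_bounds(n, p, rank):
    # Half-open index range [lo, hi) owned by `rank` in a block
    # distribution of n elements over p processes.
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    return lo, lo + base + (1 if rank < extra else 0)

def transfer_schedule(n, senders, receivers):
    # Intersect every sender's range with every receiver's range.
    schedule = []
    for s in range(senders):
        s_lo, s_hi = block_bounds(n, senders, s)
        for r in range(receivers):
            r_lo, r_hi = block_bounds(n, receivers, r)
            lo, hi = max(s_lo, r_lo), min(s_hi, r_hi)
            if lo < hi:
                schedule.append((s, r, (lo, hi)))
    return schedule

# 100 elements moving from a 4-process task to a 3-process task:
for sender, receiver, (lo, hi) in transfer_schedule(100, 4, 3):
    print(f"sender {sender} -> receiver {receiver}: elements [{lo}, {hi})")
```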
An important mode of Grid operation is one in which a community or (as we call it here) a virtual organization (VO) negotiates an allocation from a resource provider and then disperses that allocation across its members according to VO policy. Implementing this model requires that a VO be able to deploy and operate its own resource management services within the Grid. We argue that a mechanism that allows for the creation, and subsequent monitoring and control, of managed computations provides a simple yet flexible solution to this requirement. We present an architectural framework that addresses the security, policy specification, and policy enforcement concerns that arise in this context. We also describe an implementation based on Globus Toolkit and Condor components, and present performance results.
This paper describes a project to evaluate the feasibility of combining Grid and Numerical Propulsion System Simulation (NPSS) technologies, with a view to leveraging the numerous advantages of commodity technologies in a high-performance Grid environment. A team from the NASA Glenn Research Center and Argonne National Laboratory has been studying three problems: a desktop-controlled parameter study using Excel (Microsoft Corporation); a multicomponent application using ADPAC, NPSS, and a controller program; and an aviation safety application running about 100 jobs in near real time. The team has successfully demonstrated (1) a Common Object Request Broker Architecture (CORBA)-to-Globus resource manager gateway that allows CORBA remote procedure calls to be used to control the submission and execution of programs on workstations and massively parallel computers, (2) a gateway from the CORBA Trader service to the Grid information service, and (3) a preliminary integration of CORB...
The past decade has seen the widespread release of open data concerning city services, conditions, and activities by government bodies and public institutions of all sizes. Hundreds of open data portals now host thousands of datasets of many different types. These new data sources represent enormous potential for improved understanding of urban dynamics and processes, and, ultimately, for more livable, efficient, and prosperous communities. However, those who seek to realize this potential quickly discover that finding and applying the data relevant to any particular question can be extraordinarily difficult, due to decentralized storage, heterogeneous formats, and poor documentation. In this context, we introduce Plenario, a platform designed to automate the time-consuming tasks associated with the discovery, exploration, and application of open city data, and, in so doing, reduce barriers to data use for researchers, policymakers, service providers, journalists, and members of the general public. Key innovations include a geospatial data warehouse that allows data from many sources to be registered into a common spatial and temporal frame; simple and intuitive interfaces that permit rapid discovery and exploration of data subsets pertaining to a particular area and time, regardless of type and source; easy export of such data subsets for further analysis; a user-configurable data ingest framework for automated importing and periodic updating of new datasets into the data warehouse; cloud hosting for elastic scaling and rapid creation of new Plenario instances; and an open source implementation to enable community contributions. We describe here the architecture and implementation of the Plenario platform, discuss lessons learned from its use by several communities, and outline plans for future work.
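The kind of query the common spatial and temporal frame enables can be sketched as follows. The host, endpoint path, and parameter names here are hypothetical stand-ins, not Plenario's documented API; the point is that one request scoped by a bounding polygon and a time window applies uniformly to datasets from any source.

```python
import json
import urllib.parse

# Hypothetical query: all records of one dataset inside a bounding polygon
# during January 2016, regardless of the dataset's original format.
params = {
    "dataset_name": "crimes_2001_to_present",   # illustrative dataset id
    "obs_date__ge": "2016-01-01",
    "obs_date__le": "2016-02-01",
    "location_geom__within": json.dumps({
        "type": "Polygon",
        "coordinates": [[[-87.66, 41.87], [-87.66, 41.89],
                         [-87.62, 41.89], [-87.62, 41.87],
                         [-87.66, 41.87]]],
    }),
}
url = ("https://plenario-host.example/v1/api/detail/?"   # placeholder host
       + urllib.parse.urlencode(params))
print(url)
# A client would fetch this URL (e.g., with urllib.request) and receive
# records already registered into the common spatial and temporal frame.
```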
Authors: Ian Foster, Stephen Taylor. Publisher: Prentice-Hall, Inc., 1994. Ian Foster and Stephen Taylor, "A compiler approach to scalable concurrent-program design," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 16, no. 3, pp. 577-604, May 1994.
Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Author: Ian Foster. Publication date: April 1995. Language: English. Approx. 380 pp., 16 x 24 cm, paperback. Out of print.