Gabriel Mateescu - Academia.edu
Papers by Gabriel Mateescu
2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010
The seventh workshop will be held in conjunction with IPDPS 2010 in Atlanta. It will provide a forum for researchers and engineers to present their results in grid and distributed computing. Special areas of interest will be grid middleware, grid applications, grid ...
arXiv (Cornell University), May 16, 2003
A Grid testbed has been established using resources at 12 sites across Canada involving researchers from particle physics as well as other fields of science. We describe our use of the testbed with the BaBar Monte Carlo production and the ATLAS data challenge software. In each case the remote sites have no application-specific software stored locally and instead access the software and data via AFS and/or GridFTP from servers located in Victoria. In the case of BaBar, an Objectivity database server was used for data storage. We present the results of a series of initial tests of the Grid testbed using both BaBar and ATLAS applications. The initial results demonstrate the feasibility of using generic Grid resources for HEP applications.
Fluids Engineering, 1998
We present a parallel preconditioned-GMRES algorithm for solving second-order elliptic PDEs defined on rectangular domains with Dirichlet and Neumann boundary conditions, and discretized with piecewise Hermite bicubics. The parallel performance of the algorithm is assessed by way of numerical experiments.
The GridX1 computational Grid: from a set of service-specific protocols to a
A large number of Grids have been developed worldwide. Despite being mostly based on the same underlying middleware, the Globus Toolkit, they are generally not interoperable for a variety of reasons. We present a method of federating those disparate grids which are based on the Globus Toolkit, together with a concrete example of interfacing the LHC Computing Grid (LCG) with HEPGrid. HEPGrid consists of shared resources at several Canadian research institutes, which are exposed via Globus gatekeepers, and makes use of Condor-G for resource advertisement, matchmaking and job submission. An LCG Computing Element (CE) based at the TRIUMF Laboratory hosts a HEPGrid User Interface (UI) that is contained within a custom JobManager. This JobManager appears in the LCG information system as a normal CE publishing an aggregation of the HEPGrid resources. The interface interprets the incoming job in terms of HEPGrid UI usage, submits it onto HEPGrid, and implements the JobManager 'poll...
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011
Journal of Physics: Conference Series, 2008
The present paper highlights the approach used to design and implement a web-services-based BaBar Monte Carlo (MC) production grid using Globus Toolkit version 4. The grid integrates the resources of two clusters at the University of Victoria, using the ClassAd mechanism provided by the Condor-G metascheduler. Each cluster uses the Portable Batch System (PBS) as its local resource
An instance of the vector sorting problem is a sequence of k-dimensional vectors of length n. A solution to the problem is a permutation of the vectors such that in each dimension the length of the longest decreasing subsequence is O(sqrt(n)). A random permutation solves the problem. Here we derandomize the obvious probabilistic algorithm and obtain a deterministic O(kn^3.5) time algorithm that solves the vector sorting problem. We also apply the algorithm to a book embedding problem.
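The quantity this abstract bounds can be made concrete with a small sketch. The code below is not the paper's derandomized algorithm; it only illustrates the randomized baseline the abstract mentions, and it assumes the input is n vectors of dimension k. All function names (`lds_length`, `max_lds_over_dimensions`, `random_arrangement`) are hypothetical. It shuffles the vectors and measures, per dimension, the length of the longest strictly decreasing subsequence.

```python
import random
from bisect import bisect_left


def lds_length(values):
    """Length of the longest strictly decreasing subsequence of `values`.

    Negating the values turns the problem into longest strictly increasing
    subsequence, solved here with the standard O(n log n) tails method.
    """
    tails = []  # tails[i]: smallest tail of an increasing run of length i + 1
    for v in values:
        key = -v
        pos = bisect_left(tails, key)
        if pos == len(tails):
            tails.append(key)
        else:
            tails[pos] = key
    return len(tails)


def max_lds_over_dimensions(vectors):
    """Largest per-dimension LDS length for a sequence of k-dimensional vectors."""
    if not vectors:
        return 0
    k = len(vectors[0])
    return max(lds_length([vec[d] for vec in vectors]) for d in range(k))


def random_arrangement(vectors, seed=None):
    """Return a shuffled copy of `vectors` (the randomized baseline)."""
    rng = random.Random(seed)
    shuffled = list(vectors)
    rng.shuffle(shuffled)
    return shuffled
```

Calling `max_lds_over_dimensions(random_arrangement(vectors))` gives the value that a solution must keep within O(sqrt(n)); a single random shuffle is not guaranteed to achieve this, so a practical check would compare the result against a chosen multiple of sqrt(n) and reshuffle if needed.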
Lecture Notes in Computer Science, 2002
... 1 National Research Council of Canada Institute for Information Technology 1200 Montreal Road, Ottawa ON K1A 0R6, Canada julio.valdes@nrc.ca 2 National Research Council of Canada Information Management Services Branch 100 Sussex Drive, Ottawa ON K1A 0R6 ...
Proceedings of the Second International Workshop, Nov 13, 2011
We consider the problem of efficiently computing matrix transposes on the POWER7 architecture. We develop a matrix transpose algorithm that uses cache blocking, cache prefetching and data alignment. We model the POWER7 data cache and memory concurrency and use the model to predict the memory throughput of the proposed matrix transpose algorithm. The performance of our matrix transpose algorithm is up to five times higher than that of the dgetmo routine of the Engineering and Scientific Subroutine Library and is 2.5 times higher than that of the code generated by compiler-inserted prefetching. Numerical experiments indicate a good agreement between the predicted and the measured memory throughput.
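As a rough illustration of the cache-blocking idea named in this abstract, the sketch below transposes a row-major matrix tile by tile. It is not the paper's algorithm: it omits prefetching and data alignment, is not tuned for POWER7, and is written in Python for readability rather than speed. The function name `blocked_transpose` and the default tile size of 32 are assumptions chosen for the example.

```python
def blocked_transpose(a, n_rows, n_cols, block=32):
    """Transpose a row-major flat list `a` of shape (n_rows, n_cols).

    Returns a new row-major flat list of shape (n_cols, n_rows). The matrix
    is processed in block x block tiles so that the reads and writes of one
    tile touch a small, reusable region of memory.
    """
    out = [0] * (n_rows * n_cols)
    for i0 in range(0, n_rows, block):
        i_end = min(i0 + block, n_rows)  # clip the tile at the matrix edge
        for j0 in range(0, n_cols, block):
            j_end = min(j0 + block, n_cols)
            for i in range(i0, i_end):
                for j in range(j0, j_end):
                    out[j * n_rows + i] = a[i * n_cols + j]
    return out
```

In a compiled language the tile size would typically be chosen so that a source tile and a destination tile fit in cache together; here it only controls the traversal order.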
Statistics: A Series of Textbooks and Monographs, 2004
Electronic transactions on numerical analysis ETNA
Incomplete factorization preconditioners such as ILU, ILUT and MILU are well-known robust general-purpose techniques for solving linear systems on serial computers. However, they are difficult to parallelize efficiently. Various techniques have been used to parallelize these preconditioners, such as multicolor orderings and subdomain preconditioning. These techniques may degrade the performance and robustness of ILU preconditioning. The purpose of this paper is to perform numerical experiments that compare these techniques in order to identify the most effective ways to use ILU preconditioning for practical problems on serial and parallel computers.
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems - PMBS '11, 2011
We consider the problem of efficiently computing matrix transposes on the POWER7 architecture. We develop a matrix transpose algorithm that uses cache blocking, cache prefetching and data alignment. We model the POWER7 data cache and memory concurrency and use the model to predict the memory throughput of the proposed matrix transpose algorithm. The performance of our matrix transpose algorithm is up to five times higher than that of the dgetmo routine of the Engineering and Scientific Subroutine Library and is 2.5 times higher than that of the code generated by compiler-inserted prefetching. Numerical experiments indicate a good agreement between the predicted and the measured memory throughput.
21st International Symposium on High Performance Computing Systems and Applications (HPCS'07), 2007
GridX1 is a computational Grid designed and built to link resources at a number of research institutions across Canada. Building upon the experience of designing, deploying and operating the first generation of GridX1, we have designed a second-generation, web-services-based, computational Grid. The second generation of GridX1 leverages the Web Services Resource Framework, implemented by the Globus Toolkit version 4. The value added by GridX1 includes metascheduling, file staging, resource registry and resource monitoring.
Web Information Systems and Technologies, 2005
Grid portals typically store user grid credentials in a credential repository. Credential repositories allow users to access Grid portals from any machine having a Web browser, but their usage requires several authentication steps. Current portals require users to explicitly go through these steps, thereby hindering their usability. In this paper we present intuitive and easy-to-use tools to manage certificates. We also describe the integration of Grid Security Infrastructure authentication into a Java-based SSH terminal tool. Based on these tools, we build an innovative portal authentication mechanism that enables transparent delegation of credentials between clients, the grid portal and the credential repository.