Steve Karmesin - Academia.edu
Papers by Steve Karmesin
In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today’s multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
For external distribution as requested. Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. The Los Alamos National Laboratory strongly supports academic freedom and a researcher's right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness. Form 836 (10/96) DISCLAIMER This report was prepared as an account of work sponsored by an agency of the
Lecture Notes in Computer Science, 2000
POOMA is an object-oriented C++ class library for doing large-scale scientific computations. At its highest level it provides the user with data-parallel objects for simulating PDEs and containers of particles for kinetic simulations. These objects translate data-parallel statements into local computation, communication, and synchronization for execution on a variety of serial or parallel architectures. This allows development on workstations
Computing in Object-Oriented Parallel Environments, 1998
AIP Conference Proceedings
A two-dimensional time-dependent two-fluid hydrodynamic model has been used to study numerically the effect of interstellar neutrals on the size and structure of the heliosphere. The interstellar neutrals, coupled to the plasma by charge-exchange collisions, lead to a dramatic decrease in the size of the heliosphere (30% for the parameters studied). We find that a buildup of neutral hydrogen in front of the leading edge of the heliosphere, seen in earlier models, occurs only when the flow in the interstellar medium is supersonic. When the flow is subsonic, no such hydrogen "wall" is seen in the simulations, suggesting that the distribution of scattered solar H Ly α light may be quite different for this case. We have also calculated the propagation of an interplanetary shock to the heliopause as a possible trigger for the 1992 Voyager 2-3 kHz radio emission event. We find that the interstellar plasma density, observed emission cutoff frequency, and heliopause location can all be made consistent once the effect of the reduction in the size of the heliosphere by the interaction with the neutrals is included.
Proceedings of 1994 IEEE 21st International Conference on Plasma Sciences (ICOPS), 1994
...ion Layer • Local Layer. As described earlier, the classes higher in the FrameWork represent abstractions directly relevant to application domains, whereas classes lower in the FrameWork represent the abstractions of parallelism and efficient computational kernels. The Global and Local Layers work together to define Global Data Types (GDTs) that perform matrix, field, and particle operations. The interactions between the Global and Local classes are mediated by objects from the Parallel Abstraction Layer (PAL), which is responsible for capturing the key abstractions of parallelism, such as interprocessor communication, domain decomposition, and load balancing. The Component Layer, which is built upon the Global Layer, contains a rich set of objects directly relevant to scientific simulation (such as interpolators, FFTs, and Krylov solvers). Objects in the Component Layer are generic and reusable across problem domains, whereas objects in the Application Layer represent a configurat...
The interaction of a magnetized solar wind with magnetized interstellar flow is modelled in two dimensions. The VLISM magnetic field and flow velocity are assumed to be parallel to each other and perpendicular to the Sun's magnetic axis, in contrast to a recent study by Washimi. The more realistic orientation of the axis and VLISM flow requires a less realistic heliospheric field model in two dimensions. The Parker spiral is replaced by a poloidal field which is tangent to the termination shock, and whose magnitude is consistent with estimates of the heliospheric field at termination shock distances. Of particular interest is the effect of the heliospheric field on the heliospheric flow beyond the termination shock, and on the structure and location of the bow shock.
35th Aerospace Sciences Meeting and Exhibit, 1997
Computers in Physics, 1996
General strategies are developed to optimize particle-in-cell codes written in Fortran for RISC processors, which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines. Results show that performance improvements of 1.4 to 3.4 times can be achieved.
Journal of Plasma Physics, 1999
Geophysical Research Letters, 1995
A two-dimensional hydrodynamic Los Alamos numerical model has been used to study the motion of the termination shock in response to an 11 year variation in the solar wind ram pressure. We find that for a total variation in the ram pressure by a factor of 2, a termination shock at 89 AU moves inward and outward about 3.8% of its distance with a typical velocity of 12 km/sec. This movement may be understood in terms of the various time scales associated with the response of the termination shock and heliopause to variations in the solar wind ram pressure.
Go online to http://acts. …, 2002
IEEE International Conference on Plasma Science
We show results from a code designed to simulate devices such as high-power microwave sources with complex three-dimensional geometries. The calculations involved require the largest computers available, so the algorithms and code are designed with parallelism in mind. The code runs on the 256-processor Cray T3D parallel supercomputer at JPL, but is designed to be easily portable to other parallel platforms as well. We describe a technique for electromagnetic particle-in-cell (EMPIC) on nonorthogonal meshes in 3 dimensions. The divergence equations and second-order convergence for the fields are preserved in this explicit algorithm by locating the quantities on staggered meshes and transforming between fields dotted with face normals and fields dotted with edge vectors. The domain is decomposed into “patches”, each of which contains a grid that is logically Cartesian but potentially nonorthogonal. The patches interact only through their boundaries, and may be distributed arbitrarily...
Symposium on Parallel and Distributed Tools, 1995
... Applications using C++ Sameer Shende, Allen D. Malony, Janice Cuny, ... In this paper, we focus on the profiling and tracing of C++ applications that have been written using a rich parallel programming framework for high-performance scientific computing. ...