The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers

Abstract

This paper describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, shows that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would factor it using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense matrix computations, including a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output data layouts from the layout the algorithm uses internally. It also uses known techniques such as blocking to improve its communication-to-computation ratio and its data-cache behavior.
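For orientation, the sketch below shows the sequential computation the library distributes: an unblocked Cholesky factorization A = L·Lᵀ of a symmetric positive-definite band matrix, followed by the two triangular solves. It assumes LAPACK-style lower band storage (the layout DPBTRF accepts with UPLO = 'L', written here 0-based as `ab[i - j][j] = A[i][j]`). This is a minimal illustration only, not the paper's distributed algorithm: blocking, the dynamic systolic-like schedule, and the translation between external and internal data layouts are not modeled, and the names `band_cholesky` and `band_solve` are hypothetical.

```python
import math
from typing import List


def band_cholesky(ab: List[List[float]], kd: int) -> None:
    """In-place Cholesky A = L L^T of an SPD band matrix with kd subdiagonals.

    Assumed layout (lower band storage, 0-based): ab[i - j][j] holds A[i][j]
    for j <= i <= min(n - 1, j + kd). On return ab holds L in the same layout.
    """
    n = len(ab[0])
    for j in range(n):
        ajj = ab[0][j]
        if ajj <= 0.0:
            raise ValueError(f"matrix is not positive definite (column {j})")
        ajj = math.sqrt(ajj)
        ab[0][j] = ajj
        kn = min(kd, n - 1 - j)  # nonzeros below the diagonal in column j
        for r in range(1, kn + 1):
            ab[r][j] /= ajj  # ab[r][j] now holds L[j + r][j]
        # Rank-1 update of the trailing kn-by-kn lower triangle:
        # A[j+r][j+c] -= L[j+r][j] * L[j+c][j]; fill-in stays inside the band.
        for c in range(1, kn + 1):
            for r in range(c, kn + 1):
                ab[r - c][j + c] -= ab[r][j] * ab[c][j]


def band_solve(ab: List[List[float]], kd: int, b: List[float]) -> List[float]:
    """Solve A x = b using the band factor L produced by band_cholesky."""
    n = len(b)
    x = list(b)
    # Forward substitution, column-oriented: L y = b.
    for j in range(n):
        x[j] /= ab[0][j]
        for r in range(1, min(kd, n - 1 - j) + 1):
            x[j + r] -= ab[r][j] * x[j]
    # Back substitution, row-oriented: L^T x = y.
    for j in range(n - 1, -1, -1):
        for r in range(1, min(kd, n - 1 - j) + 1):
            x[j] -= ab[r][j] * x[j + r]
        x[j] /= ab[0][j]
    return x
```

For a tridiagonal matrix (kd = 1), `ab` is just two rows: the diagonal and the subdiagonal (padded with a trailing zero). Because each column's update touches only a kd-by-kd triangle, the factorization costs roughly n·kd² flops rather than the n³/3 of dense Cholesky.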

Author information

Authors and Affiliations

  1. IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA
    Anshul Gupta, Fred G. Gustavson & Sivan Toledo
  2. Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA
    Mahesh Joshi

Editor information

Jerzy Waśniewski, Jack Dongarra, Kaj Madsen, Dorte Olesen

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, A., Gustavson, F.G., Joshi, M., Toledo, S. (1996). The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_35
