Towards an accurate performance modeling of parallel sparse factorization

Abstract

We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider a supernodal right-looking parallel factorization on a two-dimensional grid of processors that uses static pivoting. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, and interconnect speed. The model is validated using the implementation in the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study the performance of other types of sparse factorizations, such as Cholesky or QR, and on different parallel machines.
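The abstract describes the model only at a high level. As a rough illustration of some of the ingredients it names (processor speed, interconnect cost, a two-dimensional processor grid, right-looking updates without dynamic pivoting), the Python sketch below simulates a *dense* block right-looking LU on a `pr` x `pc` grid, assuming a 2D block-cyclic block layout. It is not the authors' model: it ignores sparsity, supernode structure, and memory-hierarchy effects, treats each step as bulk-synchronous, and models a broadcast as naive point-to-point sends. The names `Machine`, `owner`, `simulate_lu` and any parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine parameters (illustrative, not measured values)."""
    flop_rate: float      # sustained flop/s per processor
    latency: float        # alpha: seconds per message
    inv_bandwidth: float  # beta: seconds per byte


def owner(i: int, j: int, pr: int, pc: int) -> int:
    """Rank owning block (i, j) under an assumed 2D block-cyclic layout on a pr x pc grid."""
    return (i % pr) * pc + (j % pc)


def simulate_lu(nblocks: int, b: int, pr: int, pc: int, m: Machine) -> float:
    """Estimate the time of a dense right-looking block LU without pivoting.

    Simplifications (assumptions, not the paper's model): every block is a
    dense b x b block, each step costs as much as its busiest processor
    (bulk-synchronous), and a broadcast is the owner sending one message to
    each peer in its process row or column. Memory effects are ignored.
    """
    msg = m.latency + m.inv_bandwidth * 8 * b * b  # one b x b block of doubles
    total = 0.0
    for k in range(nblocks):
        t = [0.0] * (pr * pc)  # time spent by each rank during step k
        # Factor the diagonal block: about (2/3) b^3 flops.
        t[owner(k, k, pr, pc)] += (2.0 / 3.0) * b**3 / m.flop_rate
        for i in range(k + 1, nblocks):
            # Triangular solves for L(i,k) and U(k,i): about b^3 flops each.
            t[owner(i, k, pr, pc)] += b**3 / m.flop_rate
            t[owner(k, i, pr, pc)] += b**3 / m.flop_rate
            # L(i,k) is sent across its process row, U(k,i) down its process column.
            t[owner(i, k, pr, pc)] += (pc - 1) * msg
            t[owner(k, i, pr, pc)] += (pr - 1) * msg
        # Schur complement update: one GEMM of 2 b^3 flops per trailing block.
        for i in range(k + 1, nblocks):
            for j in range(k + 1, nblocks):
                t[owner(i, j, pr, pc)] += 2.0 * b**3 / m.flop_rate
        total += max(t)
    return total
```

For example, comparing `simulate_lu(8, 64, 2, 2, m)` with `simulate_lu(8, 64, 1, 1, m)` for some hypothetical `m` shows how such a model trades the reduced per-processor GEMM work against added message costs. A realistic model of a sparse supernodal code would instead drive the simulation from the actual block structure and measured kernel and network rates.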

Author information

Authors and Affiliations

  1. INRIA Futurs, Parc Club Orsay Universite, 91893, Orsay, France
    Laura Grigori
  2. Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, 94720, USA
    Xiaoye S. Li

Corresponding author

Correspondence to Xiaoye S. Li.

About this article

Cite this article

Grigori, L., Li, X.S. Towards an accurate performance modeling of parallel sparse factorization. AAECC 18, 241–261 (2007). https://doi.org/10.1007/s00200-007-0036-y

Keywords