Towards an accurate performance modeling of parallel sparse factorization

Abstract

We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider a supernodal, right-looking parallel factorization that uses static pivoting on a two-dimensional grid of processors. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, and interconnect speed. The model is validated using the implementation in the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study the performance of other types of sparse factorizations, such as Cholesky or QR, and on different parallel machines.
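The abstract does not give the model's equations. As a rough illustration of how processor speed, memory-system effects, and interconnect speed can enter a step-by-step simulation of a right-looking factorization on a two-dimensional processor grid, the Python sketch below replays a blocked right-looking LU and charges each process for the work it owns. It is not the authors' model. Every element is an assumption for illustration: the hypothetical `Machine` fields, the block-size efficiency curve `b / (b + b_half)` standing in for cache effects, the latency-bandwidth message cost `alpha + beta * words`, the dense block structure (a sparse supernodal factorization would skip zero blocks), the omission of pivoting, and the barrier at the end of every step. Blocks are assigned to processes in a 2D block-cyclic fashion, block (i, j) going to process (i mod Pr, j mod Pc).

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical per-process machine parameters (illustrative, not measured)."""
    peak_flops: float  # peak floating-point rate, flop/s (processor speed)
    b_half: float      # block size at which kernels reach half of peak (memory system, assumed)
    alpha: float       # per-message latency, s (interconnect)
    beta: float        # transfer time per 8-byte word, s (interconnect)


def kernel_time(flops: float, b: int, m: Machine) -> float:
    """Time for a dense kernel on b-by-b blocks.

    The efficiency b / (b + b_half) is an assumed stand-in for cache and
    memory effects: small blocks run well below peak, large ones near it.
    """
    return flops / (m.peak_flops * b / (b + m.b_half))


def simulate_right_looking(nb: int, b: int, pr: int, pc: int, m: Machine) -> float:
    """Estimate the run time of a blocked right-looking LU on a pr x pc grid.

    Assumptions: all nb x nb blocks of size b are nonzero, no pivoting,
    diagonal-block broadcasts are ignored, and every step ends with a
    barrier. Block (i, j) lives on process (i % pr, j % pc).
    """
    procs = [(r, c) for r in range(pr) for c in range(pc)]
    msg = m.alpha + m.beta * b * b  # cost to receive one b-by-b block
    total = 0.0
    for k in range(nb):
        busy = {p: 0.0 for p in procs}  # compute + communication time per process

        # 1. Factor the diagonal block: about (2/3) b^3 flops.
        busy[(k % pr, k % pc)] += kernel_time(2.0 / 3.0 * b**3, b, m)

        # 2. Triangular solves for panel blocks L(i,k) and U(k,i): b^3 flops each.
        for i in range(k + 1, nb):
            busy[(i % pr, k % pc)] += kernel_time(b**3, b, m)
            busy[(k % pr, i % pc)] += kernel_time(b**3, b, m)

        # 3. Panel broadcasts: L(i,k) goes across process row i % pr and
        #    U(k,i) goes down process column i % pc; each receiver pays msg.
        for i in range(k + 1, nb):
            for c in range(pc):
                if c != k % pc:
                    busy[(i % pr, c)] += msg
            for r in range(pr):
                if r != k % pr:
                    busy[(r, i % pc)] += msg

        # 4. Schur-complement update A(i,j) -= L(i,k) U(k,j): 2 b^3 flops each.
        for i in range(k + 1, nb):
            for j in range(k + 1, nb):
                busy[(i % pr, j % pc)] += kernel_time(2.0 * b**3, b, m)

        # Synchronous step: the slowest process sets the pace.
        total += max(busy.values())
    return total


if __name__ == "__main__":
    # Illustrative parameters only, not measurements of any real machine.
    machine = Machine(peak_flops=1.5e9, b_half=32.0, alpha=2e-5, beta=8 / 3.5e8)
    for pr, pc in [(1, 1), (2, 2), (4, 4)]:
        t = simulate_right_looking(nb=32, b=64, pr=pr, pc=pc, m=machine)
        print(f"{pr} x {pc} grid: {t:.3f} s")
```

Because each simulated step waits for its slowest process, the sketch exposes load imbalance and communication cost directly. A more faithful model would follow the actual supernodal block structure of the matrix and account for any overlap between consecutive steps.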

Author information

Authors and Affiliations

INRIA Futurs, Parc Club Orsay Université, 91893, Orsay, France
Laura Grigori
Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, 94720, USA
Xiaoye S. Li

Corresponding author

Correspondence to Xiaoye S. Li.

About this article

Cite this article

Grigori, L., Li, X.S.: Towards an accurate performance modeling of parallel sparse factorization. AAECC 18, 241–261 (2007). https://doi.org/10.1007/s00200-007-0036-y

Received: 26 May 2006
Revised: 08 November 2006
Published: 24 February 2007
Issue date: May 2007
DOI: https://doi.org/10.1007/s00200-007-0036-y
