Towards an accurate performance modeling of parallel sparse factorization
Abstract
We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider supernodal right-looking parallel factorization with static pivoting on a two-dimensional grid of processors. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, and interconnect speed. The model is validated using the implementation in the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study the performance of other types of sparse factorizations, such as Cholesky or QR, and on different parallel machines.
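To make the three ingredients named in the abstract concrete, here is a minimal sketch of an additive cost estimate for one factorization step on a two-dimensional process grid. It is not the model from the paper: the `Machine` fields, the `block_owner` block-cyclic mapping, and the `step_time` formula are all illustrative assumptions, and the numbers in the usage lines are placeholders rather than measured POWER3 values.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical machine parameters (assumptions, not measured values).
    flop_rate: float      # sustained floating-point operations per second
    mem_bandwidth: float  # bytes per second between memory and processor
    net_latency: float    # seconds per message on the interconnect
    net_bandwidth: float  # bytes per second on the interconnect


def block_owner(i, j, pr, pc):
    """Assumed 2D block-cyclic mapping of block (i, j) onto a pr x pc grid."""
    return (i % pr, j % pc)


def step_time(m, flops, mem_bytes, messages, net_bytes):
    """Additive estimate: compute time + memory traffic + communication."""
    compute = flops / m.flop_rate
    memory = mem_bytes / m.mem_bandwidth
    comm = messages * m.net_latency + net_bytes / m.net_bandwidth
    return compute + memory + comm


# Usage with placeholder numbers.
machine = Machine(flop_rate=1e9, mem_bandwidth=1e9,
                  net_latency=1e-5, net_bandwidth=1e8)
owner = block_owner(5, 7, pr=2, pc=3)
estimate = step_time(machine, flops=2e9, mem_bytes=1e9,
                     messages=10, net_bytes=1e8)
print(owner, estimate)
```

Summing the three terms assumes no overlap between computation, memory access, and communication. A simulation-based model of the kind the abstract describes would need to track such overlap and per-process idle time.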
Author information
Authors and Affiliations
- Laura Grigori: INRIA Futurs, Parc Club Orsay Universite, 91893, Orsay, France
- Xiaoye S. Li: Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, 94720, USA
Corresponding author
Correspondence to Xiaoye S. Li.
About this article
Cite this article
Grigori, L., Li, X.S. Towards an accurate performance modeling of parallel sparse factorization. AAECC 18, 241–261 (2007). https://doi.org/10.1007/s00200-007-0036-y
- Received: 26 May 2006
- Revised: 08 November 2006
- Published: 24 February 2007
- Issue date: May 2007
- DOI: https://doi.org/10.1007/s00200-007-0036-y