Improved Symbolic and Numerical Factorization Algorithms for Unsymmetric Sparse Matrices

A novel parallel algorithm for Gaussian Elimination of sparse unsymmetric matrices

We describe a new algorithm for Gaussian Elimination suitable for general (unsymmetric and possibly singular) sparse matrices of any entry type, which has a natural parallel and distributed-memory formulation but degrades gracefully to sequential execution. We present a sample MPI implementation of a program computing the rank of a sparse integer matrix using the proposed algorithm. Some preliminary performance measurements are presented and discussed, and the performance of the algorithm is compared to corresponding state-of-the-art algorithms for floating-point and integer matrices.
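To make the rank computation concrete, the following is a minimal sequential sketch of exact Gaussian elimination on a sparse integer matrix. It is not the parallel algorithm the abstract describes, whose details are not given here. The row-as-dictionary storage, the function name `sparse_rank`, and the use of exact rationals are all assumptions made for illustration.

```python
from fractions import Fraction


def sparse_rank(rows):
    """Rank of a sparse matrix given as a list of {column: value} dicts.

    Hypothetical helper: plain sequential elimination over exact rationals,
    so no rounding can turn a true zero into a nonzero or vice versa.
    """
    # Copy the input into exact rationals, dropping explicit zeros.
    work = [{c: Fraction(v) for c, v in r.items() if v != 0} for r in rows]
    pivots = {}  # leading column -> reduced row that owns this pivot

    for row in work:
        # Reduce the row until its leading column is new or the row vanishes.
        while row:
            lead = min(row)
            if lead not in pivots:
                pivots[lead] = row
                break
            pivot_row = pivots[lead]
            factor = row[lead] / pivot_row[lead]
            # row <- row - factor * pivot_row, touching only stored entries.
            for c, v in pivot_row.items():
                new = row.get(c, 0) - factor * v
                if new == 0:
                    row.pop(c, None)
                else:
                    row[c] = new
    return len(pivots)
```

Each stored pivot row has a distinct leading column, so the number of pivot rows equals the rank. A row that reduces to the empty dictionary is linearly dependent on the rows before it and contributes nothing.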

Gaussian elimination as a computational paradigm

2003

An abstract view of symmetric Gaussian elimination is presented. Problems are viewed as an assembly of computational entities whose interdependence is modeled by a graph. An algorithmic transformation on the entities, which can be associated with vertex removal, is assumed to exist. The elimination tree of the symmetric Gaussian elimination represents the order in which these transformations are applied and captures any potential parallelism. The inherently sequential part of the computational effort depends on the height of the tree. The paradigm is illustrated by block-structured LP problems with nested decomposition and basis factorization approaches, and by problems of blocked symmetric and unsymmetric systems of linear equations, with, respectively, blocked Cholesky factorization and blocked Gaussian elimination. Contributions are: demonstration of the paradigm's expressive power through graph concepts (elimination sets, elimination chains, etc.); emphasis on patterns of similarity in the use ...
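The link between the elimination tree and the sequential part of the work can be illustrated with a short sketch. It uses the standard ancestor-compression construction of the elimination tree for a symmetric sparsity pattern, which is an assumption on our part and not necessarily the formulation used in the paper. The function names and the edge-list input format are hypothetical.

```python
def elimination_tree(n, edges):
    """Parent array of the elimination tree of an n x n symmetric pattern.

    `edges` lists the off-diagonal nonzero positions (i, j) in either order.
    parent[v] == -1 marks a root.
    """
    # For each column j, collect the rows i < j with a nonzero in (i, j).
    upper = [[] for _ in range(n)]
    for i, j in edges:
        if i != j:
            upper[max(i, j)].append(min(i, j))

    parent = [-1] * n
    ancestor = [-1] * n  # compressed links that shorten later walks
    for j in range(n):
        for i in upper[j]:
            # Walk from i towards its current root, redirecting links to j.
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    parent[i] = j  # i was a root: j becomes its parent
                i = nxt
    return parent


def tree_height(parent):
    """Number of vertices on the longest leaf-to-root path of the forest."""
    height = [1] * len(parent)
    # A parent always has a larger index than its children, so one
    # ascending pass sees every child before its parent.
    for v, p in enumerate(parent):
        if p != -1:
            height[p] = max(height[p], height[v] + 1)
    return max(height, default=0)
```

For a tridiagonal pattern on four vertices the tree is a chain of height 4, so no two eliminations are independent. For a bordered pattern in which vertices 0, 1 and 2 are coupled only to vertex 3, the height is 2 and the first three eliminations can proceed in parallel.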

A parallel hybrid sparse linear system solver

2009 Computational Electromagnetics International Workshop, 2009

Abstract: Consider the system Ax = b, where A is a large sparse nonsymmetric matrix. It is assumed that A has no sparsity structure that may be exploited in the solution process, its spectrum may lie on both sides of the imaginary axis and its symmetric part may be indefinite. For such systems direct methods may be both time consuming and storage demanding, while iterative methods may not converge. In this paper, a hybrid method, which attempts to avoid these drawbacks, is proposed. An LU factorization of A that depends on a strategy that drops small non-zero elements during the Gaussian elimination process is used as a preconditioner for conjugate gradient-like schemes: ORTHOMIN, GMRES and CGS. Robustness is achieved by altering the drop tolerance and recomputing the preconditioner in the event that the factorization or the iterative method fails. If after a prescribed number of trials the iterative method is still not convergent, then a switch is made to a direct solver. Numerical examples, using matrices from the Harwell-Boeing test matrices, show that this hybrid scheme is often less time consuming and storage demanding than direct solvers, and more robust than iterative methods whose preconditioners depend on classical positional dropping strategies.

I. THE HYBRID ALGORITHM

Consider the system of linear algebraic equations Ax = b, where A is a nonsingular, large, sparse and nonsymmetric matrix. We assume also that matrix A is generally sparse (i.e. it has neither any special property, such as symmetry and/or positive definiteness, nor any special pattern, such as bandedness, that can be exploited in the solution of the system). Solving such linear systems may be a rather difficult task. This is so because commonly used direct methods (sparse Gaussian elimination) are too time consuming, and iterative methods whose success depends on the matrix having a definite symmetric part or depends on the spectrum lying on one side of the imaginary axis are not robust enough. Direct methods have the advantage that they normally produce a sufficiently accurate solution, although a direct estimation of the accuracy actually achieved requires additional work. On the other hand, when iterative methods converge sufficiently fast, they require computing time that is several orders of magnitude smaller than that of any direct method. This brief comparison of the main properties of direct methods and iterative methods for the problem at hand shows that the methods of both groups have some advantages and some disadvantages. Therefore it seems worthwhile to design methods that combine the advantages of both groups, while minimizing their disadvantages.

A shared- and distributed-memory parallel general sparse direct solver

Applicable Algebra in Engineering, Communication and Computing, 2007

An important recent development in the area of solution of general sparse systems of linear equations has been the introduction of new algorithms that allow complete decoupling of symbolic and numerical phases of sparse Gaussian elimination with partial pivoting. This enables efficient solution of a series of sparse systems with the same nonzero pattern but different coefficient values, which is a fairly common situation in practical applications. This paper reports on a shared- and distributed-memory parallel general sparse solver based on these new symbolic and unsymmetric-pattern multifrontal algorithms.

Parallel Gaussian elimination on an MIMD computer

Parallel Computing, 1988

Abstract. This paper introduces a graph-theoretic approach to analyse the performances of several parallel Gaussian-like triangularization algorithms on an MIMD computer. We show that the SAXPY, GAXPY and DOT algorithms of Dongarra, Gustavson and Karp, as well as parallel versions of the LDM^T, LDL^T, Doolittle and Cholesky algorithms, can be classified into four task graph models. We derive new complexity results and compare the asymptotic performances of these parallel versions.

A new row ordering strategy for frontal solvers

Numerical Linear Algebra with Applications, 1999

The frontal method is a variant of Gaussian elimination that has been widely used since the mid 1970s. In the innermost loop of the computation the method exploits dense linear algebra kernels, which are straightforward to vectorize and parallelize. This makes the method attractive for modern computer architectures. However, unless the matrix can be ordered so that the front is never very large, frontal methods can require many more floating-point operations for factorization than other approaches. We use the idea of a row graph of an unsymmetric matrix combined with a variant of Sloan's profile reduction algorithm to reorder the rows. We also look at using the spectral method applied to the row graph. Numerical experiments are performed on a range of practical problems. Our new row ordering algorithm is shown to produce orderings that are a significant improvement on those obtained with existing algorithms. Numerical results also compare the performance of the frontal solver MA42 on the reordered matrix with other direct solvers for large sparse unsymmetric linear systems.

Fraction free Gaussian elimination for sparse matrices

1995

A variant of the fraction free form of Gaussian elimination is presented. This algorithm reduces the amount of arithmetic involved when the matrix has many zero entries. The advantage can be great for matrices with symbolic entries (integers, polynomials, expressions in trigonometric functions, etc.). These claims are supported with some analysis and experimental data.
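For orientation, here is a minimal sketch of the classical dense fraction-free (Bareiss) elimination that such variants build on. It is not the sparse variant proposed in the paper, whose details are not given in this abstract. The function name `bareiss_determinant` and the list-of-lists input are assumptions made for illustration.

```python
def bareiss_determinant(matrix):
    """Determinant of a square integer matrix by fraction-free elimination.

    Every intermediate value stays an integer: each update is divided by
    the previous pivot, and that division is always exact.
    """
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    if n == 0:
        return 1
    sign = 1
    prev = 1  # pivot of the previous step (1 before the first step)

    for k in range(n - 1):
        if a[k][k] == 0:
            # Find a row below with a nonzero entry in column k and swap.
            swap = next((r for r in range(k + 1, n) if a[r][k] != 0), None)
            if swap is None:
                return 0  # whole column is zero: matrix is singular
            a[k], a[swap] = a[swap], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Two-by-two minor divided exactly by the previous pivot.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]
```

Whenever `a[i][k]` is zero, the update reduces to multiplying `a[i][j]` by the current pivot and dividing by the previous one. That wasted arithmetic on zero entries is the kind of cost a sparse variant aims to avoid.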

Efficient sparse LU factorization with partial pivoting on distributed memory architectures

IEEE Transactions on Parallel and Distributed Systems, 1998

A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach adopts static symbolic factorization to avoid run-time control overhead, incorporates 2D L/U supernode partitioning and amalgamation strategies to improve caching performance, and exploits irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares the algorithms using 1D and 2D data mapping schemes, and presents experimental studies on Cray-T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging, and S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance ever achieved for this challenging problem and the previous record was 2.583 GFLOPS on shared memory machines.

PSPASES: Building a High Performance Scalable Parallel Direct Solver for Sparse Linear Systems

Parallel Numerical Computation with Applications, 1999

Many problems in engineering and scientific domains require solving large sparse systems of linear equations as a computationally intensive step towards the final solution. It has long been a challenge to develop efficient parallel formulations of sparse direct solvers due to the several different complex steps involved in the process. In this paper, we describe PSPASES, one of the first efficient, portable, and robust scalable parallel solvers for sparse symmetric positive definite linear systems that we have developed. We discuss the algorithmic and implementation issues involved in its development and present performance and scalability results on Cray T3E and SGI Origin 2000. PSPASES could solve the largest sparse system (1 million equations) ever solved by a direct method, with the highest performance (51 GFLOPS for Cholesky factorization) ever reported.