Sca/LAPACK Program Style (original) (raw)

Routine Naming

The LAPACK project intends to maintain backwards compatibility. New routines get new names. Fixes to old routines that preserve the interface can keep the names. If the only change to the interface is to decrease the amount of workspace needed, the old routine name can be kept.

We are no longer limited to 6 characters for routine names. Underscores are ok. At present, this causes problems in a variety of codes that assume 6 character names when they parse input springs (e.g. ILAENV, XERBLA, xOPLA, etc.). We are working to address these issues (see [maintenance]).

Internal Design and Name Usage

Recursion is permitted. But beware of excessive stack usage; it should not be used as a substitute for workspace allocation as discussed in[workspace]. Roughly speaking, as long as the stack only grows to O(log n) then there should be no problem. If the alternative is a non-recursive routine that uses the same space, and if the recursive routine is more readable, prefer the recursive version.

We will use the BLAS and extended BLAS names from BLAST Forum in new code. There is a new BLAS standard [blast], but a partial reference implementation exists only for the extended (and sparse) BLAS. We would like new LAPACK routines to use the new interface and provide a reference implementation that simply calls the appropriate old BLAS routine. Over time this will encourage developers and vendors to migrate to the new BLAS. We will provide an extended BLAS release as part of LAPACK, since it is needed for the new iterative refinement codes.

We prohibit the use of I/O within LAPACK computational routines, with a natural exception for parallel communication. Policies regarding any future out-of-core routines will be considered along with the routines. There should be no terminal or user I/O outside the default XERBLA ([err-handling]).

Error Handling and the Diagnostic Argument INFO

All documented routines have a diagnostic argument INFO that indicates the success or failure of the computation, as follows:

All driver and auxiliary routines check that input arguments such as N or LDA or option arguments of type character have permitted values. If an illegal value of the ith argument is detected, the routine sets INFO = -i, and then calls an error-handling routine XERBLA.

The standard version of XERBLA issues an error message and halts execution, so that no LAPACK routine would ever return to the calling program with INFO < 0. However, this might occur if a non-standard version of XERBLA is used.

Determining Machine Arithmetic Parameters

The xLAMCH interfaces for returning machine arithmetic parameters will be maintained and will be made thread-safe. New code must use xLAMCH.

Many Fortran language queries provide equivalent information. For reference, the table below summarizes the widely available Fortran 95 query functions that are equivalent to xLAMCH calls.

Table------------------`-----------------`------------------

xLAMCH parameter F95 intrinsic Description
E EPSILON Epsilon
B RADIX Base
N DIGITS Significand digits
M MINEXPONENT Min. exponent
U TINY Underflow threshold
L MAXEXPONENT Max. exponent
O HUGE Overflow threshold

The two queries "P" and "S" can be derived from F95 intrinsics as follows (example are given for double precision quantities):

  EPSILON(0.0d0) * RADIX(0.0d0)
  SFMIN = TINY(0.0d0)
  SMALL = 1.0d0/HUGE(0.0d0)
  if (SMALL >= SFMIN) SFMIN = SMALL * (1+EPSILON(0.0d0))

xLAMCH("R") has no corresponding query function in Fortran 95 but grep reveals no use of this query in LAPACK. Likewise, SLAMC1’s test for round-to-nearest and SLAMC2’s test for compliance to IEEE-754 have no corresponding queries in Fortran 95. Fortran 2003 provides these and additional useful predicates (e.g. IEEE_IS_NAN) when an implementation supports IEEE-754 arithmetic, but these compilers are not yet widely available.

C89 provides similar support in float.h. C99 includes greatly improved support for IEEE-754 arithmetic in float.h and math.h. This support is in widely available C compilers, but some system libraries do not fully support fused multiply-add and flag tests.

Determining the Block Size for Block Algorithms

LAPACK routines that implement block algorithms need to determine what block size to use. The intention behind the design of LAPACK is that the choice of block size should be hidden from users as much as possible, but at the same time easily accessible to installers of the package when tuning LAPACK for a particular machine.

LAPACK routines call an auxiliary enquiry function ILAENV, which returns the optimal block size to be used as well as other parameters. The version of ILAENV supplied with the package contains default values that led to good behavior over a reasonable number of our test machines, but to achieve optimal performance, it may be beneficial to tune ILAENV for your particular machine environment. Ideally a distinct implementation of ILAENV is needed for each machine environment (see alsoChapter 6 of the LUG). The optimal block size also may depend on

We ultimately plan to determine these values by an automatic tuning process akin to the one used in ATLAS and related packages.

If ILAENV returns a block size of 1, then the routine performs the unblocked algorithm, calling Level 2 BLAS, and makes no calls to Level 3 BLAS.

Some LAPACK routines require a work array whose size is proportional to the block size (seesubsection 5.1.7 of the LUG). The actual length of the work array is supplied as an argument LWORK. The description of the arguments WORK and LWORK typically goes as follows:

WORK
    (workspace) REAL array, dimension (LWORK)
    On exit, if INFO = 0, then WORK(1) returns the optimal
    LWORK.
LWORK
    (input) INTEGER
    The dimension of the array WORK. LWORK >= max(1,N).

    For optimal performance LWORK >= N*NB, where NB is the
    optimal block size returned by ILAENV.  The routine
    determines the block size to be used by the following
    steps: ...

For such a routine, if LWORK >= max(1,N),

The minimum value of LWORK that would be needed to use the optimal block size is returned in WORK(1) (see [workspace]).

Thus, the routine uses the largest block size allowed by the amount of workspace supplied, as long as this is likely to give better performance than the unblocked algorithm. WORK(1) is not always a simple formula in terms of N and NB.

The specification of LWORK gives the minimum value for the routine to return correct results. If the supplied value is less than the minimum — indicating that there is insufficient workspace to perform the unblocked algorithm — the value of LWORK is regarded as an illegal value, and is treated like any other illegal argument value (seesubsection 5.1.9 of the LUG).

If in doubt about how much workspace to supply, users must use the query mechanism described in [workspace]. That section is still in flux. There is no mechanism that will work for the entire library at this point, but that is being addressed. The query mechanism will be available for all routines that use workspace.