HP UX Performance Tools 258

DGEMMS/ZGEMMS		Strassen matrix-matrix multiply
	beta	The scalar β.
	c	Array containing the m-by-n matrix C. Not used as
		input if beta = 0.
	ldc	The leading dimension of array c as declared in the
		calling program unit, with ldc ≥ max(m,1).
Output	c	The updated C matrix replaces the input.
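
The role of the leading dimension ldc can be illustrated with a short sketch. It is not MLIB code: the helper name col_major_index and the sample sizes are hypothetical, and indices are 0-based here, whereas the Fortran calling program uses 1-based indices. It assumes the usual column-major storage of BLAS array arguments.

```python
def col_major_index(i, j, ld):
    """Offset of element (i, j), 0-based, in a column-major array
    whose declared leading dimension is ld (hypothetical helper)."""
    return i + j * ld

# A 2-by-3 matrix C stored in an array declared with ldc = 4.
# ldc >= max(m, 1) holds; the last two slots of each column are padding.
m, n, ldc = 2, 3, 4
c = [0.0] * (ldc * n)
for j in range(n):
    for i in range(m):
        c[col_major_index(i, j, ldc)] = 10.0 * (i + 1) + (j + 1)

# Consecutive columns start ldc elements apart, not m elements apart.
column_starts = [col_major_index(0, j, ldc) for j in range(n)]
```

Only the first m entries of each column belong to C; the remaining ldc - m entries of each column are not part of the matrix.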

Notes	Except for the extra character in the subprogram name, these subprograms conform to the specifications of the Level 3 BLAS subprograms DGEMM and ZGEMM.

Because they use Strassen’s method, DGEMMS and ZGEMMS are asymptotically faster than the standard matrix multiply methods employed in DGEMM and ZGEMM. In practice, these particular implementations are faster than their standard counterparts if min(m,n,k) > 700 for ZGEMMS, or min(m,n,k) > 1500 for DGEMMS. The speedup in the complex case is much more pronounced. This is due in large part to the complex bilinear reduction technique (implemented underneath Strassen’s method), which allows two complex matrices to be multiplied using only 3/4 of the multiplications required by the traditional method. Also, the relative cost of data motion is lower in the complex case. The gains in the real case are marginal until min(m,n,k) becomes very large.
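
The 3/4 figure can be illustrated with a small sketch. This guide does not state which reduction MLIB uses, so the scheme below is an assumption: a well-known way to form a complex matrix product from three real matrix products instead of four. All function names are hypothetical, and matrices are plain nested lists.

```python
def matmul(x, y):
    """Conventional real matrix product of two nested-list matrices."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[p] * y[p][j] for p in range(inner)) for j in range(cols)]
            for row in x]

def add(x, y):
    """Elementwise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def sub(x, y):
    """Elementwise difference of two matrices of equal shape."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def complex_matmul_3m(ar, ai, br, bi):
    """(ar + i*ai) * (br + i*bi) using three real products, not four.

    Returns the real and imaginary parts of the product.
    """
    t1 = matmul(ar, br)                    # Ar * Br
    t2 = matmul(ai, bi)                    # Ai * Bi
    t3 = matmul(add(ar, ai), add(br, bi))  # (Ar + Ai) * (Br + Bi)
    real = sub(t1, t2)                     # Ar*Br - Ai*Bi
    imag = sub(sub(t3, t1), t2)            # Ar*Bi + Ai*Br
    return real, imag

# Small check against Python's built-in complex arithmetic.
ar, ai = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
br, bi = [[2, 0], [1, 1]], [[1, 1], [0, 2]]
real, imag = complex_matmul_3m(ar, ai, br, bi)
a = [[complex(ar[i][j], ai[i][j]) for j in range(2)] for i in range(2)]
b = [[complex(br[i][j], bi[i][j]) for j in range(2)] for i in range(2)]
direct = matmul(a, b)
```

The saving comes at the price of extra matrix additions, which cost far less than a matrix product once the matrices are large.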

In the operator norm, Strassen’s method is slightly less stable than traditional matrix multiplication, and the computation of individual elements is unstable. The emerging consensus seems to be that Strassen’s method is sufficiently stable for most applications. Partly for stability reasons, however, only 64-bit Strassen subprograms are available at this time.
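
For orientation, one level of Strassen's recursion can be sketched as follows. This shows the textbook formulation, not the MLIB implementation, and the function name is hypothetical. The operands are written as 2-by-2 arrays of scalars; in the actual method each entry is a submatrix, and the seven products are computed recursively.

```python
def strassen_2x2(a, b):
    """One level of Strassen's method: 7 multiplications instead of 8.

    a and b are 2-by-2 nested lists. Entries are scalars here; in the
    recursive method they would be matrix blocks.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # Seven products, each formed from sums or differences of entries.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Entries of the result are recovered by adding and subtracting products.
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return [[c11, c12], [c21, c22]]

result = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Each result entry is obtained by cancellation among products of sums and differences, rather than as a plain sum of products. This is commonly cited as the reason that error bounds for Strassen's method hold in norm but not element by element.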

For a good overview and bibliography of this subject, see Higham.

If an error in the arguments is detected, the subprograms call error handler XERBLA, which writes an error message onto the standard error file and terminates execution. The standard version of XERBLA (refer to the end of this chapter) can be replaced with a user-supplied version to change the error procedure. Error conditions are:

	transa ≠ 'N' or 'n' or 'T' or 't' or 'C' or 'c'
	transb ≠ 'N' or 'n' or 'T' or 't' or 'C' or 'c'
	m < 0
	n < 0
	k < 0
	lda too small
	ldb too small
	ldc < max(m,1)
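
The error conditions above can be sketched as a pre-call check. Two points are assumptions, taken from the reference BLAS routine DGEMM and not stated on this page: the exact meaning of "too small" for lda and ldb, and the argument positions returned. The function name gemm_arg_error is hypothetical.

```python
def gemm_arg_error(transa, transb, m, n, k, lda, ldb, ldc):
    """Return the 1-based position of the first invalid argument, or 0.

    Positions follow the DGEMM argument order (assumed to apply here):
    transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc.
    """
    valid = ("N", "n", "T", "t", "C", "c")
    if transa not in valid:
        return 1
    if transb not in valid:
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # Assumption: A is stored m-by-k if not transposed, else k-by-m.
    rows_a = m if transa in ("N", "n") else k
    # Assumption: B is stored k-by-n if not transposed, else n-by-k.
    rows_b = k if transb in ("N", "n") else n
    if lda < max(1, rows_a):
        return 8
    if ldb < max(1, rows_b):
        return 10
    if ldc < max(1, m):
        return 13
    return 0

ok = gemm_arg_error("N", "N", 2, 3, 4, 2, 4, 2)
bad_ldc = gemm_arg_error("N", "N", 2, 3, 4, 2, 4, 1)
```

A check of this kind in the calling program reports a bad argument without invoking XERBLA, which terminates execution in its standard version.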

HP MLIB User’s Guide