National Instruments NI MATRIX Optimizing with Callout Blocks, Optimizing with Inverse Blocks, Optimizing with Division Blocks

Chapter 7 Code Optimization

constant block is optimized away, including the output variable. Also notice
that the existing Constant Propagation optimization can be used with the
constant block, but will only operate on scalar pieces of the constant block
output.

Optimizing with Callout Blocks

The MatrixInverse, MatrixRightDivide, and MatrixLeftDivide blocks are
implemented with callouts and therefore carry a special set of rules that
must be followed to generate optimal code. The most important rule,
applicable to all three blocks, is that the block output should be labeled as
a single matrix or vector (or that code should be generated using maximal
vectorization). This is because the algorithms associated with these blocks
are stand-alone entities with fixed interfaces; each of the MatrixInverse,
MatrixRightDivide, and MatrixLeftDivide algorithms writes its result to a
single array. When the output of a callout block in the generated code is
spread among several different variables, a copy-back must be emitted after
the callout to ensure the results are correctly stored in the desired output
variables. Following this rule can have a significant impact on the code
generated for these three blocks. The callout interface also places
constraints on the inputs if optimal code is desired, as the following
sections describe.
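
The effect of the output labeling rule can be pictured with a small C sketch.
This is an illustration only, not code emitted by AutoCode: callout_fill is a
hypothetical stand-in for a callout with a fixed single-array interface, and
the two wrapper functions and their names are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a callout with a fixed interface: it writes
   all n results into one contiguous array. Not an actual AutoCode callout. */
static void callout_fill(double *out, size_t n, double scale)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = scale * (double)(i + 1);
    }
}

/* Output labeled as a single vector: the callout writes directly into the
   output variable, so no copy-back is needed. */
void output_as_single_vector(double y[4])
{
    callout_fill(y, 4, 0.5);
}

/* Output spread among four separate variables: the callout must write into
   a temporary array, and a copy-back follows the call. */
void output_as_separate_scalars(double *y1, double *y2, double *y3, double *y4)
{
    double tmp[4];

    callout_fill(tmp, 4, 0.5);

    /* Copy-back from the callout's array to the individual outputs. */
    *y1 = tmp[0];
    *y2 = tmp[1];
    *y3 = tmp[2];
    *y4 = tmp[3];
}
```

Both functions produce the same values; the second simply carries the extra
temporary and the four assignments that the single-array labeling avoids.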

Optimizing with Inverse Blocks

The MatrixInverse block callout has a single argument, which is both the
input and the output of the inversion algorithm. Thus, the input is modified
by the algorithm. For this reason, a copy-in must always be done for this
block, and the input connectivity is not nearly as important as the output
connectivity. You may see tighter code with good input connectivity, because
the copy-in will be looped rather than unrolled, but the copy-in will still
be present.
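
The difference between a looped and an unrolled copy-in might look like the
following C sketch. It is an illustration under stated assumptions, not
AutoCode output: inverse2x2_inplace is a hypothetical in-place routine for a
2 x 2 row-major matrix standing in for the real callout, and both wrapper
functions are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place inverse of a 2x2 row-major matrix. The single
   argument is both input and output, so the caller's data is overwritten.
   Returns 0 on success, -1 if the matrix is singular. */
static int inverse2x2_inplace(double m[4])
{
    assert(m != NULL);

    double a = m[0], b = m[1], c = m[2], d = m[3];
    double det = a * d - b * c;

    if (det == 0.0) {
        return -1;
    }
    m[0] =  d / det;
    m[1] = -b / det;
    m[2] = -c / det;
    m[3] =  a / det;
    return 0;
}

/* Good input connectivity: the input is one array, so the copy-in is a
   loop. The input u is left unmodified. */
int inverse_from_array(const double u[4], double y[4])
{
    for (size_t i = 0; i < 4; ++i) {
        y[i] = u[i];
    }
    return inverse2x2_inplace(y);
}

/* Poor input connectivity: the input is four separate variables, so the
   copy-in is unrolled into individual assignments. */
int inverse_from_scalars(double u11, double u12, double u21, double u22,
                         double y[4])
{
    y[0] = u11;
    y[1] = u12;
    y[2] = u21;
    y[3] = u22;
    return inverse2x2_inplace(y);
}
```

In both cases the copy-in is present; only its form changes. Because the
copy-in targets the output array y, labeling the output as a single matrix
lets the in-place routine work directly on the block output.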

Optimizing with Division Blocks

The two division blocks, MatrixRightDivide and MatrixLeftDivide, solve the
equations XA = B and AX = B, respectively. When you consider which to use for
your application, notice that you will get more efficient code by using
MatrixRightDivide rather than MatrixLeftDivide. This is because AutoCode
generates output matrices in row-major order, whereas the LINPACK callouts
are written expecting column-major inputs. Solving AX = B under such
mismatched conditions requires an extra transpose-copy that is not required
to solve XA = B.
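
The reason is that a row-major buffer holding A, when read by column-major
code, is seen as the transpose of A. Since XA = B is equivalent to
(A^T)(X^T) = (B^T), a column-major solver handed the row-major buffers
already has the system it needs. The C sketch below illustrates this for a
2 x 2 matrix and a vector right-hand side. It is not the LINPACK callout:
solve2x2_colmajor is a hypothetical column-major solver written for the
example (using Cramer's rule), and the two wrapper names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column-major solver: solves M z = r for a 2x2 matrix M
   stored column by column, M(i,j) = m[i + 2*j]. Stand-in for a
   column-major library routine. Returns 0 on success, -1 if singular. */
static int solve2x2_colmajor(const double m[4], const double r[2], double z[2])
{
    assert(m != NULL && r != NULL && z != NULL);

    double det = m[0] * m[3] - m[2] * m[1];

    if (det == 0.0) {
        return -1;
    }
    z[0] = (r[0] * m[3] - m[2] * r[1]) / det;
    z[1] = (m[0] * r[1] - m[1] * r[0]) / det;
    return 0;
}

/* Right divide: solve x A = b, with A stored row-major. Read as
   column-major, the buffer is A^T, and A^T x^T = b^T is exactly the
   system to solve, so the buffer is passed through unchanged. */
int right_divide_2x2(const double a_rowmajor[4], const double b[2],
                     double x[2])
{
    return solve2x2_colmajor(a_rowmajor, b, x);
}

/* Left divide: solve A x = b, with A stored row-major. The solver needs A
   itself in column-major order, which costs a transpose-copy first. */
int left_divide_2x2(const double a_rowmajor[4], const double b[2],
                    double x[2])
{
    double a_colmajor[4];

    for (size_t i = 0; i < 2; ++i) {
        for (size_t j = 0; j < 2; ++j) {
            a_colmajor[i + 2 * j] = a_rowmajor[2 * i + j];
        }
    }
    return solve2x2_colmajor(a_colmajor, b, x);
}
```

With a vector right-hand side only A needs transposing in the left-divide
case. The sketch is meant to show where the extra copy comes from, not how
large it is in generated code.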