Performance Considerations for Streams and Nodes
You can design your streams to maximize performance by arranging the nodes in the most
efficient configuration, by enabling node caches when appropriate, and by paying attention to
other considerations as detailed in this section.
Aside from the considerations discussed here, additional and more substantial performance
improvements can typically be gained by making effective use of your database, particularly
through SQL optimization.

Order of Nodes

Even when you are not using SQL optimization, the order of nodes in a stream can affect
performance. The general goal is to minimize downstream processing; therefore, when you
have nodes that reduce the amount of data, place them near the beginning of the stream. IBM®
SPSS® Modeler Server can apply some reordering rules automatically during compilation to bring
forward certain nodes when it can be proven safe to do so. (This feature is enabled by default.
Check with your system administrator to make sure it is enabled in your installation.)
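
The effect of node order can be illustrated with a minimal sketch in plain Python. It does not use the Modeler API; the names expensive_derive, keep, and records are hypothetical stand-ins for a costly downstream node, a Select node, and the input data. The sketch counts how often the costly step runs when the data-reducing step comes last versus first.

```python
# Illustrative only: plain Python, not the SPSS Modeler API.
# All names (expensive_derive, keep, records) are hypothetical.

calls = {"derive": 0}

def expensive_derive(record):
    """Stand-in for a costly downstream node; counts how often it runs."""
    calls["derive"] += 1
    return {**record, "score": record["amount"] * 1.1}

def keep(record):
    """Stand-in for a Select node that discards most records."""
    return record["region"] == "north"

# One record in ten belongs to the "north" region.
records = [{"region": "north" if i % 10 == 0 else "south", "amount": i}
           for i in range(1000)]

# Derive first, then select: the costly step runs on every record.
calls["derive"] = 0
late = [r for r in map(expensive_derive, records) if keep(r)]
derive_calls_select_last = calls["derive"]

# Select first, then derive: the costly step runs only on kept records.
calls["derive"] = 0
early = [expensive_derive(r) for r in records if keep(r)]
derive_calls_select_first = calls["derive"]

print(derive_calls_select_last, derive_calls_select_first)  # 1000 100
```

Both orderings produce the same records here because the selection does not depend on the derived field. When it does, the nodes cannot simply be swapped.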
When using SQL optimization, you want to maximize its availability and efficiency. Since
optimization halts when the stream contains an operation that cannot be performed in the database,
it is best to group SQL-optimized operations together at the beginning of the stream. This strategy
keeps more of the processing in the database, so less data is carried into IBM® SPSS® Modeler.
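
The grouping strategy can be sketched with a simplified model. This is an assumption-laden illustration, not the actual optimizer: it treats a stream as a linear list of node types, uses a hypothetical DB_CAPABLE set of labels, and assumes that everything from the first non-database operation onward runs outside the database. The node name custom_model is invented for the example.

```python
# Hypothetical model of where SQL optimization stops; not the real optimizer.
# The labels below are made-up names for database-capable operations.
DB_CAPABLE = {"merge", "select", "aggregate", "sort", "sample",
              "append", "distinct", "filler", "derive", "set_to_flag"}

def split_stream(nodes):
    """Return (pushed_to_db, run_locally) for a linear list of node types.

    In this model, optimization halts at the first node that is not
    database-capable; that node and all later nodes run locally.
    """
    for i, node in enumerate(nodes):
        if node not in DB_CAPABLE:
            return nodes[:i], nodes[i:]
    return list(nodes), []

# "custom_model" is a hypothetical node that cannot run in the database.
scattered = ["select", "custom_model", "aggregate", "sort"]
grouped = ["select", "aggregate", "sort", "custom_model"]

print(split_stream(scattered))
# (['select'], ['custom_model', 'aggregate', 'sort'])
print(split_stream(grouped))
# (['select', 'aggregate', 'sort'], ['custom_model'])
```

In the grouped ordering, three operations stay in the database instead of one. The sketch assumes the reordering does not change the result of the stream, which has to be checked for each real case.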
The following operations can be done in most databases. Try to group them at the beginning of
the stream:
Merge by key (join)
Select
Aggregate
Sort
Sample
Append
Distinct operations in include mode, in which all fields are selected
Filler operations
Basic derive operations using standard arithmetic or string manipulation (depending on which operations are supported by the database)
Set-to-flag
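
To show why keeping such operations in the database reduces the data carried into the client, the following sketch uses Python's built-in sqlite3 module as a stand-in database. The sales table and its columns are hypothetical, and the SQL is written by hand rather than generated by Modeler. It compares fetching every row and processing it in the client with letting the database perform the select, aggregate, and sort.

```python
import sqlite3

# In-memory stand-in database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 == 0 else "south", i) for i in range(1000)],
)

# Option A: fetch every row, then select and aggregate in the client.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in all_rows:
    if amount >= 500:
        client_totals[region] = client_totals.get(region, 0) + amount

# Option B: select, aggregate, and sort in the database;
# only the summary rows are fetched.
db_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount >= 500 GROUP BY region ORDER BY region"
).fetchall()

print(len(all_rows), len(db_rows))  # 1000 2
conn.close()
```

Both options compute the same totals, but the second transfers two rows instead of one thousand.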
© Copyright IBM Corporation 1994, 2012.