Chapter 5
reduce network traffic and speed stream operations. Note that the Generate SQL check box
must be selected for SQL optimization to have any effect.
Optimize syntax execution. This method of stream rewriting increases the efficiency of
operations that incorporate more than one node containing IBM® SPSS® Statistics syntax.
Optimization is achieved by combining the syntax commands into a single operation, instead
of running each as a separate operation.
Optimize other execution. This method of stream rewriting increases the efficiency of
operations that cannot be delegated to the database. Optimization is achieved by reducing the
amount of data in the stream as early as possible. While maintaining data integrity, the stream
is rewritten to push operations closer to the data source, thus reducing data downstream for
costly operations, such as joins.
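As a rough illustration of reducing data before a costly operation, the following Python sketch compares filtering after a join with filtering before it. The sample records, the join helper, and the field names are hypothetical and are not part of SPSS Modeler. The sketch only shows that both orderings return the same records while the early filter passes fewer records into the join.

```python
# Hypothetical sample data; not taken from any SPSS Modeler stream.
customers = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
]
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 2, "amount": 20},
    {"customer_id": 2, "amount": 75},
    {"customer_id": 3, "amount": 10},
]


def join(left, right):
    """Inner join on left['id'] == right['customer_id'] using a hash index."""
    index = {}
    for right_row in right:
        index.setdefault(right_row["customer_id"], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row["id"], [])
    ]


def filter_late(customers, orders):
    """Join all customers, then keep only the 'north' region."""
    joined = join(customers, orders)
    kept = [row for row in joined if row["region"] == "north"]
    return kept, len(customers)


def filter_early(customers, orders):
    """Keep only the 'north' region first, then join the smaller set."""
    north = [c for c in customers if c["region"] == "north"]
    return join(north, orders), len(north)


late_rows, late_input = filter_late(customers, orders)
early_rows, early_input = filter_early(customers, orders)
print(late_rows == early_rows)   # both orderings give the same records
print(late_input, early_input)   # customer records passed into the join
```

The saving grows with the selectivity of the filter: the fewer records that survive it, the less work the join has to do.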
Enable parallel processing. When running on a computer with multiple processors, this option
allows the system to balance the load across those processors, which may result in faster
performance. Use of multiple nodes or use of the following individual nodes may benefit from
parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate
(using one or more key fields).
Generate SQL. Select this option to enable SQL generation, allowing stream operations to be pushed
back to the database by using SQL code to generate execution processes, which may improve
performance. To further improve performance, Optimize SQL generation can also be selected to
maximize the number of operations pushed back to the database. When operations for a node have
been pushed back to the database, the node will be highlighted in purple when the stream is run.
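The general idea of pushing an operation back to the database can be sketched as follows. This example uses Python's sqlite3 module with an in-memory database as a stand-in. The table, its columns, and the data are hypothetical, and the SQL shown is not the SQL that SPSS Modeler generates. It contrasts fetching every row and aggregating on the client with letting the database aggregate and return only the summary rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 50), ("south", 20), ("south", 75), ("north", 10)],
)

# Without pushback: fetch every row, then aggregate on the client.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in all_rows:
    client_totals[region] = client_totals.get(region, 0) + amount

# With pushback: the database aggregates and returns only summary rows.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
db_totals = dict(pushed_rows)

print(client_totals == db_totals)        # same totals either way
print(len(all_rows), len(pushed_rows))   # rows transferred in each case
conn.close()
```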
Database caching. For streams that generate SQL to be executed in the database, data can be
cached midstream to a temporary table in the database rather than to the file system. When
combined with SQL optimization, this may result in significant gains in performance. For
example, the output from a stream that merges multiple tables to create a data mining view
may be cached and reused as needed. With database caching enabled, simply right-click any
nonterminal node to cache data at that point, and the cache is automatically created directly in
the database the next time the stream is run. This allows SQL to be generated for downstream
nodes, further improving performance. Alternatively, this option can be disabled if needed,
such as when policies or permissions preclude data being written to the database. If database
caching or SQL optimization is not enabled, the cache will be written to the file system
instead. For more information, see the topic Caching Options for Nodes on p. 50.
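The following sketch illustrates the concept of caching a merged view to a temporary table so that later steps can query it directly in the database. It uses Python's sqlite3 module as a stand-in. The table names, the cached_view name, and the data are hypothetical, and the statements are not those issued by SPSS Modeler.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 50), (2, 20), (2, 75), (3, 10);
""")

# "Cache" the merged view midstream as a temporary table in the database.
conn.execute("""
    CREATE TEMP TABLE cached_view AS
    SELECT c.id, c.region, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""")

# Downstream steps query the cached table instead of repeating the merge.
north_total = conn.execute(
    "SELECT SUM(amount) FROM cached_view WHERE region = 'north'"
).fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM cached_view").fetchone()[0]

print(north_total, order_count)
conn.close()
```

Because the cached data stays in the database, the downstream queries in this sketch remain plain SQL against one table, which is the property the text describes as allowing SQL to be generated for downstream nodes.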
Use relaxed conversion. This option enables the conversion of data from either strings to
numbers, or numbers to strings, if stored in a suitable format. For example, if the data is
kept in the database as a string, but actually contains a meaningful number, the data can be
converted for use when the pushback occurs.
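As a minimal sketch of this kind of conversion, the following example stores numeric values as strings in a hypothetical table and converts them inside the query with a standard SQL CAST. It uses Python's sqlite3 module as a stand-in database. How SPSS Modeler performs the conversion, and how it treats values that are not in a suitable format, is not shown here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table in which numeric readings are stored as strings.
conn.execute("CREATE TABLE readings (sensor TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", "10.5"), ("a", "4.5"), ("b", "7")],
)

# String to number: cast inside the query so the sum runs in the database.
total = conn.execute(
    "SELECT SUM(CAST(value AS REAL)) FROM readings"
).fetchone()[0]

# Number to string: the opposite direction of the same kind of conversion.
label = conn.execute("SELECT CAST(42 AS TEXT)").fetchone()[0]

print(total, repr(label))
conn.close()
```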
Note: Due to minor differences in SQL implementation, streams run in a database may return
slightly different results from those returned when run in SPSS Modeler. For similar reasons, these
differences may also vary depending on the database vendor.
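One familiar example of such an implementation difference is integer division. The sketch below compares the same expression evaluated by SQLite, used here only as a stand-in database, and by Python on the client side. It does not reproduce SPSS Modeler's own arithmetic, and other databases may handle the expression differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite divides two integers as integers and truncates the result.
in_database = conn.execute("SELECT 7 / 2").fetchone()[0]

# Python's / operator always performs floating-point division.
on_client = 7 / 2

print(in_database, on_client)
conn.close()
```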
Save As Default. The options specified apply only to the current stream. Click this button to set
these options as the default for all streams.