IBM 15 - page 72

Chapter 5

reduce network traffic and speed stream operations. Note that the Generate SQL check box must be selected for SQL optimization to have any effect.



Optimize syntax execution. This method of stream rewriting increases the efficiency of operations that incorporate more than one node containing IBM® SPSS® Statistics syntax. Optimization is achieved by combining the syntax commands into a single operation, instead of running each as a separate operation.



Optimize other execution. This method of stream rewriting increases the efficiency of operations that cannot be delegated to the database. Optimization is achieved by reducing the amount of data in the stream as early as possible. While maintaining data integrity, the stream is rewritten to push operations closer to the data source, thus reducing data downstream for costly operations, such as joins.
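The effect of reducing data early can be illustrated outside the product. The following is a minimal sketch in plain Python, not a description of how stream rewriting is implemented internally. The tables, field names, and the hash_join helper are all hypothetical. It applies the same selection once after a join and once before it, and shows that the result is unchanged while far fewer rows enter the join.

```python
# Hypothetical data: 1000 customers, each with 5 orders.
customers = [{"id": i, "region": "north" if i % 4 == 0 else "south"}
             for i in range(1000)]
orders = [{"customer_id": i % 1000, "amount": i} for i in range(5000)]


def hash_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts on the given keys (illustrative helper)."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})
    return joined


def is_north(row):
    return row["region"] == "north"


# Stream as drawn: join everything, then select the northern region.
late = [r for r in hash_join(customers, orders, "id", "customer_id")
        if is_north(r)]

# Rewritten stream: select first, so the costly join sees fewer rows.
north_customers = [c for c in customers if is_north(c)]
early = hash_join(north_customers, orders, "id", "customer_id")


def canon(rows):
    """Order-independent view of the joined rows, for comparison."""
    return sorted((r["id"], r["amount"]) for r in rows)


same_result = canon(late) == canon(early)
rows_into_join_late = len(customers)
rows_into_join_early = len(north_customers)
print(same_result, rows_into_join_late, rows_into_join_early)
```

In this sketch the selection depends only on fields from one input, which is what makes moving it ahead of the join safe. A rewrite of this kind is valid only when it preserves the results, as the phrase "while maintaining data integrity" indicates.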

Enable parallel processing. When running on a computer with multiple processors, this option allows the system to balance the load across those processors, which may result in faster performance. Use of multiple nodes or use of the following individual nodes may benefit from parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate (using one or more key fields).

Generate SQL. Select this option to enable SQL generation, allowing stream operations to be pushed back to the database by using SQL code to generate execution processes, which may improve performance. To further improve performance, Optimize SQL generation can also be selected to maximize the number of operations pushed back to the database. When operations for a node have been pushed back to the database, the node will be highlighted in purple when the stream is run.
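To make the idea of pushback concrete, the following sketch contrasts fetching every row and aggregating on the client with sending a single SQL statement that performs the aggregation in the database. It is an illustration only. An in-memory SQLite database (Python's standard sqlite3 module) stands in for the real data source, the sales table is hypothetical, and the SQL is written by hand rather than being the SQL that the product generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 1.0)],
)

# Without pushback: every row is transferred, then aggregated on the client.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in all_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# With pushback: the aggregation is expressed as SQL and runs in the database,
# so only one row per region is transferred.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
db_totals = dict(pushed_rows)

conn.close()
print(len(all_rows), len(pushed_rows), client_totals == db_totals)
```

Both approaches give the same totals here, but the second transfers fewer rows, which is the source of the potential performance gain.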



Database caching. For streams that generate SQL to be executed in the database, data can be cached midstream to a temporary table in the database rather than to the file system. When combined with SQL optimization, this may result in significant gains in performance. For example, the output from a stream that merges multiple tables to create a data mining view may be cached and reused as needed. With database caching enabled, simply right-click any nonterminal node to cache data at that point, and the cache is automatically created directly in the database the next time the stream is run. This allows SQL to be generated for downstream nodes, further improving performance. Alternatively, this option can be disabled if needed, such as when policies or permissions preclude data being written to the database. If database caching or SQL optimization is not enabled, the cache will be written to the file system instead. For more information, see the topic Caching Options for Nodes on p. 50.
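The data mining view example can be sketched as follows. This is an assumption-laden illustration, not the mechanism the product uses. SQLite stands in for the database, and the table names, the cache name mining_view_cache, and the SQL are all hypothetical. The merged view is materialized once as a temporary table, and downstream operations are then expressed as SQL against that table instead of repeating the merge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (1, 10.0), (1, 2.5), (2, 5.0);
""")

# Midstream cache: materialize the merged view once, inside the database.
conn.execute("""
    CREATE TEMP TABLE mining_view_cache AS
    SELECT c.id, c.region, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""")

# Downstream operations reuse the cache and still run as SQL.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM mining_view_cache GROUP BY region"
))
big_orders = conn.execute(
    "SELECT COUNT(*) FROM mining_view_cache WHERE amount > 4"
).fetchone()[0]

conn.close()
print(totals, big_orders)
```

Creating the temporary table requires write access to the database, which is why the option may need to be disabled where policies or permissions forbid it.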



Use relaxed conversion. This option enables the conversion of data from either strings to numbers, or numbers to strings, if stored in a suitable format. For example, if the data is kept in the database as a string, but actually contains a meaningful number, the data can be converted for use when the pushback occurs.
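As an illustration of the string-to-number case, the sketch below stores ages as text and converts them inside the SQL statement so that a numeric comparison can run in the database. The people table and its fields are hypothetical, SQLite stands in for the database, and the explicit CAST is only one possible way to express such a conversion. The conversion rules that the product actually applies are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age_text TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "25"), ("Bo", "41"), ("Cy", "37")],
)

# The age is stored as a string but holds a meaningful number, so it is
# converted in the database before the numeric comparison is applied.
over_30 = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM people "
        "WHERE CAST(age_text AS INTEGER) > 30 ORDER BY name"
    )
]

conn.close()
print(over_30)
```

How a database treats strings that do not hold a valid number varies by vendor, so results of such conversions should be checked against the specific database in use.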

Note: Due to minor differences in SQL implementation, streams run in a database may return slightly different results from those returned when run in SPSS Modeler. For similar reasons, these differences may also vary depending on the database vendor.

Save As Default. The options specified apply only to the current stream. Click this button to set these options as the default for all streams.