Understanding Data Mining
The Self-Learning Response Model (SLRM) node enables you to build a model in
which a single new case, or small number of new cases, can be used to reestimate the
model without having to retrain the model using all data.
The Time Series node estimates exponential smoothing, univariate Autoregressive
Integrated Moving Average (ARIMA), and multivariate ARIMA (or transfer function)
models for time series data and produces forecasts of future performance. A Time
Series node must always be preceded by a Time Intervals node.
The k-Nearest Neighbor (KNN) node associates a new case with the category or value
of the k objects nearest to it in the predictor space, where k is an integer. Similar cases
are near each other and dissimilar cases are distant from each other.
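The nearest-neighbor idea described above can be illustrated with a minimal sketch. It is not the KNN node's implementation: the Euclidean distance measure, the majority vote, and the names `knn_predict` and `train` are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, new_case, k):
    """Return the most common category among the k training cases nearest to new_case.

    train is a list of (features, category) pairs; Euclidean distance is an
    assumption of this sketch, not something the guide specifies.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training cases")
    # Sort training cases by their distance to the new case and keep the k closest.
    nearest = sorted(train, key=lambda row: math.dist(row[0], new_case))[:k]
    # Majority vote over the categories of those k neighbors.
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: two predictors, two categories.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]
prediction = knn_predict(train, (1.1, 1.0), k=3)
```

With this data the three cases nearest to (1.1, 1.0) all belong to category A, so the new case is assigned to A. For a continuous target, the vote could be replaced by an average of the neighbors' values.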
Association Models
Association models find patterns in your data where one or more entities (such as events,
purchases, or attributes) are associated with one or more other entities. The models construct rule
sets that define these relationships. Here the fields within the data can act as both inputs and
targets. You could find these associations manually, but association rule algorithms do so much
more quickly, and can explore more complex patterns. Apriori and Carma models are examples of
the use of such algorithms. One other type of association model is a sequence detection model,
which finds sequential patterns in time-structured data.
Association models are most useful when predicting multiple outcomes—for example, customers
who bought product X also bought Y and Z. Association models associate a particular conclusion
(such as the decision to buy something) with a set of conditions. The advantage of association rule
algorithms over the more standard decision tree algorithms (C5.0 and C&RT) is that associations
can exist between any of the attributes. A decision tree algorithm will build rules with only a
single conclusion, whereas association algorithms attempt to find many rules, each of which may
have a different conclusion.
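To make the idea of many rules with different conclusions concrete, here is a small brute-force sketch. It is neither the Apriori nor the Carma algorithm. It checks every ordered pair of items and keeps the rules that pass two commonly used measures, support and confidence, which this section does not itself define. The function name `mine_pair_rules`, the thresholds, and the transactions are hypothetical.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Find rules 'antecedent -> consequent' between single items by brute force.

    support    = share of all transactions containing both items
    confidence = share of transactions with the antecedent that also contain the consequent
    Both measures are assumptions of this sketch.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    # Any item may appear as a condition or as a conclusion.
    for antecedent, consequent in permutations(items, 2):
        with_antecedent = sum(1 for t in transactions if antecedent in t)
        with_both = sum(1 for t in transactions if antecedent in t and consequent in t)
        if with_antecedent == 0:
            continue
        support = with_both / n
        confidence = with_both / with_antecedent
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical market baskets.
transactions = [{"X", "Y", "Z"}, {"X", "Y"}, {"X", "Z"}, {"Y"}]
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

The resulting rules conclude X, Y, and Z in turn, whereas a decision tree built for a single target field would conclude only about that field. Checking every pair does not scale to large data sets, which is the problem that dedicated association algorithms address.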
Association nodes
The Apriori node extracts a set of rules from the data, pulling out the rules with
the highest information content. Apriori offers five different methods of selecting
rules and uses a sophisticated indexing scheme to process large data sets efficiently.
For large problems, Apriori is generally faster to train; it has no arbitrary limit on
the number of rules that can be retained, and it can handle rules with up to 32