
Chapter 4

preconditions. Apriori requires that input and output ﬁelds all be categorical but

delivers better performance because it is optimized for this type of data.

The CARMA model extracts a set of rules from the data without requiring you to

specify input or target ﬁelds. In contrast to Apriori, the CARMA node offers build

settings for rule support (support for both antecedent and consequent) rather than just

antecedent support. This means that the rules generated can be used for a wider variety

of applications—for example, to ﬁnd a list of products or services (antecedents)

whose consequent is the item that you want to promote this holiday season.
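
The difference between antecedent support and rule support can be illustrated with a small sketch. This is not the CARMA or Apriori implementation; it only assumes the usual definitions (antecedent support is the share of records containing the antecedents, rule support is the share containing both antecedents and consequent). The function name `support_measures` and the sample baskets are hypothetical.

```python
def support_measures(transactions, antecedent, consequent):
    """Return (antecedent_support, rule_support, confidence) as fractions."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")
    # Records that contain every antecedent item.
    ante_count = sum(1 for t in transactions if ante <= t)
    # Records that contain the antecedents and the consequent together.
    rule_count = sum(1 for t in transactions if both <= t)
    confidence = rule_count / ante_count if ante_count else 0.0
    return ante_count / n, rule_count / n, confidence


# Hypothetical market-basket data, one set of items per record.
baskets = [
    frozenset({"bread", "cheese"}),
    frozenset({"bread", "cheese", "wine"}),
    frozenset({"bread", "wine"}),
    frozenset({"cheese"}),
]

ante_sup, rule_sup, conf = support_measures(
    baskets, antecedent={"bread", "cheese"}, consequent={"wine"}
)
```

In this sample the antecedents appear in two of four records while the full rule holds in only one, so a threshold on rule support filters rules differently than a threshold on antecedent support.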

The Sequence node discovers association rules in sequential or time-oriented data. A

sequence is a list of item sets that tends to occur in a predictable order. For example, a

customer who purchases a razor and aftershave lotion may purchase shaving cream

the next time he shops. The Sequence node is based on the CARMA association rules

algorithm, which uses an efﬁcient two-pass method for ﬁnding sequences.
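
As a rough illustration of what a sequence means here, the sketch below counts how many customers bought one item set and then bought another item set on a later visit. It is a brute-force count, not the two-pass CARMA method used by the node, and it accepts any later visit rather than only the next one. The function name `sequence_support` and the purchase histories are hypothetical.

```python
def sequence_support(histories, first, then):
    """Fraction of customers whose visits contain `first`, followed later by `then`."""
    first = frozenset(first)
    then = frozenset(then)
    if not histories:
        return 0.0
    hits = 0
    for visits in histories:
        for i, basket in enumerate(visits):
            # Look for the first item set, then for the second in any later visit.
            if first <= basket and any(then <= later for later in visits[i + 1:]):
                hits += 1
                break
    return hits / len(histories)


# Hypothetical data: one time-ordered list of baskets per customer.
histories = [
    [{"razor", "aftershave"}, {"shaving cream"}],
    [{"razor", "aftershave"}, {"soap"}, {"shaving cream", "soap"}],
    [{"shaving cream"}, {"razor", "aftershave"}],  # wrong order, not counted
    [{"razor"}, {"shaving cream"}],                # first item set incomplete
]

support = sequence_support(histories, {"razor", "aftershave"}, {"shaving cream"})
```

Order matters in this count: the third customer bought the same items but in the reverse order, so that history does not support the sequence.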

Segmentation Models

Segmentation models divide the data into segments, or clusters, of records that have similar

patterns of input ﬁelds. As they are only interested in the input ﬁelds, segmentation models have

no concept of output or target ﬁelds. Examples of segmentation models are Kohonen networks,

K-Means clustering, two-step clustering and anomaly detection.

Segmentation models (also known as “clustering models”) are useful in cases where the speciﬁc

result is unknown (for example, when identifying new patterns of fraud, or when identifying

groups of interest in your customer base). Clustering models focus on identifying groups of

similar records and labeling the records according to the group to which they belong. This is

done without the beneﬁt of prior knowledge about the groups and their characteristics, and it

distinguishes clustering models from the other modeling techniques in that there is no predeﬁned

output or target ﬁeld for the model to predict. There are no right or wrong answers for these

models. Their value is determined by their ability to capture interesting groupings in the data and

provide useful descriptions of those groupings. Clustering models are often used to create clusters

or segments that are then used as inputs in subsequent analyses (for example, by segmenting

potential customers into homogeneous subgroups).
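
To make the idea of labeling records by group concrete, here is a minimal K-Means sketch in plain Python. It is an illustration under simplifying assumptions (numeric input fields only, the first `k` records used as starting centers, a fixed number of iterations) and does not reproduce how the K-Means node is implemented. The function name `k_means` and the sample records are hypothetical.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Assign each record to the nearest of k centers; return (labels, centers)."""
    # Assumption: seed the centers with the first k records.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each record with its nearest center.
        labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical customer records with two numeric input fields and no target field.
customers = [
    (1.0, 1.0), (9.0, 9.5), (1.2, 0.8),
    (0.9, 1.1), (8.8, 9.0), (9.2, 9.1),
]

labels, centers = k_means(customers, k=2)
```

The resulting `labels` list carries one cluster number per record and can be attached to the data as a new field for use in later analyses.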