
Chapter 4
Typically, you will use these facilities to identify a promising set of attributes in the data. These
attributes can then be fed to the modeling techniques, which will attempt to identify underlying
rules and relationships.
Typical Applications
Typical applications of data mining techniques include the following:
Direct mail. Determine which demographic groups have the highest response rate. Use this
information to maximize the response to future mailings.
Credit scoring. Use an individual’s credit history to make credit decisions.
Human resources. Understand past hiring practices and create decision rules to streamline the
hiring process.
Medical research. Create decision rules that suggest appropriate procedures based on medical
evidence.
Market analysis. Determine which variables, such as geography, price, and customer
characteristics, are associated with sales.
Quality control. Analyze data from product manufacturing and identify variables determining
product defects.
Policy studies. Use survey data to formulate policy by applying decision rules to select the most
important variables.
Health care. User surveys and clinical data can be combined to discover variables that contribute
to health.
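As a rough illustration of the direct mail application above, the following Python sketch computes the response rate for each demographic group and picks the highest. The group labels, the sample records, and the function name response_rates are hypothetical and are not part of IBM SPSS Modeler; this shows only the underlying arithmetic, not how the product performs the analysis.

```python
from collections import defaultdict

# Hypothetical mailing records: (demographic group, responded?)
mailings = [
    ("18-34", True), ("18-34", False), ("18-34", False), ("18-34", False),
    ("35-54", True), ("35-54", True), ("35-54", False), ("35-54", False),
    ("55+", True), ("55+", False), ("55+", False),
]

def response_rates(records):
    """Return the fraction of responders within each group."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for group, did_respond in records:
        sent[group] += 1
        if did_respond:
            responded[group] += 1
    return {group: responded[group] / sent[group] for group in sent}

rates = response_rates(mailings)

# The group with the highest response rate is the natural target
# for a future mailing.
best_group = max(rates, key=rates.get)
print(best_group, rates[best_group])
```

In practice, a modeling technique would look for combinations of attributes rather than a single grouping, but the quantity being maximized is the same.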
Terminology
The terms attribute, field, and variable refer to a single data item common to all cases under
consideration. A collection of attribute values that refers to a specific case is called a record, an
example, or a case.
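To make these terms concrete, here is a minimal Python sketch in which each dictionary stands for one record and each key stands for one attribute. The attribute names and values are invented for illustration only.

```python
# Hypothetical data set: each dict is one record (also called an example or a case).
records = [
    {"age": 34, "region": "North", "responded": True},
    {"age": 51, "region": "South", "responded": False},
    {"age": 27, "region": "North", "responded": True},
]

# The attributes (fields, variables) are the data items common to all records.
attributes = sorted(set.intersection(*(set(r) for r in records)))

# The values of a single attribute across every record.
ages = [r["age"] for r in records]

print(attributes)
print(ages)
```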
Assessing the Data
Data mining is not likely to be fruitful unless the data you want to use meets certain criteria. The
following sections present some of the aspects of the data and its application that you should
consider.
Ensure that the data is available
This may seem obvious, but be aware that although data might be available, it may not be in a
form that can be used easily. IBM® SPSS® Modeler can import data from databases (through
ODBC) or from files. The data, however, might be held in some other form on a machine that
cannot be directly accessed. It will need to be downloaded or dumped in a suitable form before it
can be used. It might be scattered among different databases and sources and need to be pulled