Chapter 4
A Strategy for Data Mining
As with most business endeavors, data mining is much more effective if done in a planned,
systematic way. Even with cutting-edge data mining tools, such as IBM® SPSS® Modeler, the
majority of the work in data mining requires a knowledgeable business analyst to keep the process
on track. To guide your planning, answer the following questions:
What substantive problem do you want to solve?
What data sources are available, and what parts of the data are relevant to the current problem?
What kind of preprocessing and data cleaning do you need to do before you start mining
the data?
What data mining technique(s) will you use?
How will you evaluate the results of the data mining analysis?
How will you get the most out of the information you obtained from data mining?
The typical data mining process can become complicated very quickly. There is a lot to keep track
of—complex business problems, multiple data sources, varying data quality across data sources,
an array of data mining techniques, different ways of measuring data mining success, and so on.
To stay on track, it helps to have an explicitly defined process model for data mining. The
process model helps you answer the questions listed earlier in this section, and makes sure the
important points are addressed. It serves as a data mining road map so that you will not lose your
way as you dig into the complexities of your data.
The data mining process suggested for use with SPSS Modeler is the Cross-Industry Standard
Process for Data Mining (CRISP-DM). As you can tell from the name, this model is designed as a
general model that can be applied to a wide variety of industries and business problems.
The CRISP-DM Process Model
The general CRISP-DM process model includes six phases that address the main issues in data
mining. The six phases fit together in a cyclical process designed to incorporate data mining
into your larger business practices.