Understanding Data Mining
Figure 4-1
CRISP-DM process model
The six phases include:
Business understanding. This is perhaps the most important phase of data mining. Business
understanding includes determining business objectives, assessing the situation, determining
data mining goals, and producing a project plan.
Data understanding. Data provides the “raw materials” of data mining. This phase addresses
the need to understand what your data resources are and the characteristics of those resources.
It includes collecting initial data, describing data, exploring data, and verifying data quality.
The Data Audit node available from the Output nodes palette is an indispensable tool for
data understanding.
Data preparation. After cataloging your data resources, you will need to prepare your data for
mining. Preparations include selecting, cleaning, constructing, integrating, and formatting
data.
Modeling. This is, of course, the flashy part of data mining, where sophisticated analysis
methods are used to extract information from the data. This phase involves selecting modeling
techniques, generating test designs, and building and assessing models.
Evaluation. Once you have chosen your models, you are ready to evaluate how the data mining
results can help you to achieve your business objectives. Elements of this phase include
evaluating results, reviewing the data mining process, and determining the next steps.
Deployment. Now that you have invested all of this effort, it is time to reap the benefits. This
phase focuses on integrating your new knowledge into your everyday business processes to
solve your original business problem. This phase includes planning deployment, monitoring and
maintenance, producing a final report, and reviewing the project.
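As an illustration of the describing and quality-verification tasks in the data understanding phase, the following is a minimal Python sketch of a field-by-field audit. It is not the Data Audit node and does not reproduce its output; the function name audit, the field names, and the sample records are all hypothetical and exist only for this example.

```python
from statistics import mean

def audit(records, fields):
    """Summarize each field: missing count, distinct count, and numeric range.

    Illustrative only; `records` is assumed to be a list of dictionaries.
    """
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v is not None and v != ""]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        summary = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Report a range only when every non-missing value is numeric.
        if numeric and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        report[field] = summary
    return report

# Hypothetical sample data, invented for this sketch.
customers = [
    {"age": 34, "region": "north", "income": 52000},
    {"age": None, "region": "south", "income": 61000},
    {"age": 45, "region": "north", "income": None},
    {"age": 29, "region": "", "income": 48000},
]

report = audit(customers, ["age", "region", "income"])
for field, summary in report.items():
    print(field, summary)
```

A summary of this kind makes missing values and unexpected ranges visible before data preparation begins, which is when they are cheapest to address.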
There are some key points in this process model. First, while there is a general tendency for the
process to flow through the steps in the order outlined in the previous paragraphs, there are also a
number of places where the phases influence each other in a nonlinear way. For example, data
preparation usually precedes modeling. However, decisions made and information gathered
during the modeling phase can often lead you to rethink parts of the data preparation phase, which
can then present new modeling issues. The two phases feed back on each other until both phases