Cross-Industry Standard Process for Data Mining (CRISP-DM) framework

Madhudeepa Jois
D to E Data Science
2 min read · Jun 5, 2021

CRISP-DM is a robust, industry-neutral process for structuring an analytics project, and it can be applied in any domain.

It involves a series of six steps:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Data modelling
  5. Model evaluation
  6. Model deployment

Let’s look at each step in detail.

Business understanding:

  • Business objective: For example, a firm wants to know which market segment gives a good return on investment. Pin down the exact requirement: does "good" mean 5%, 10% or 30%?
  • Goals of data analysis: Break the problem down and think about possible solutions. For example, could the problem above be solved by market segment analysis or by clustering?

Every later step depends on understanding the business objective and what matters most to the business.
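To make the return-on-investment example concrete, here is a minimal sketch of how the agreed target changes which segments count as "good". The segment names and figures are invented for illustration, and `roi` and `segments_meeting_target` are hypothetical helper names, not part of CRISP-DM itself.

```python
# Hypothetical figures per market segment: (amount invested, amount returned).
segments = {
    "retail": (100_000, 112_000),
    "enterprise": (250_000, 330_000),
    "public_sector": (80_000, 82_000),
}


def roi(invested, returned):
    """Return on investment as a fraction: (returned - invested) / invested."""
    return (returned - invested) / invested


def segments_meeting_target(data, target):
    """Names of segments whose ROI is at least `target`, best first."""
    scored = {name: roi(inv, ret) for name, (inv, ret) in data.items()}
    qualifying = [name for name, value in scored.items() if value >= target]
    return sorted(qualifying, key=lambda name: scored[name], reverse=True)


# The same data gives different answers depending on the agreed target.
for target in (0.05, 0.10, 0.30):
    print(f"target {target:.0%}: {segments_meeting_target(segments, target)}")
```

With these made-up numbers, two segments clear a 5% or 10% target but only one clears 30%, which is why the exact requirement needs to be settled first.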

Understanding raw data:

Good-quality data leads to good outputs. This step means understanding the types of data available, what the rows and columns represent, their central tendencies, and how the variables relate to one another. It covers:

  • The type of data sets that are available
  • The information we can get from it
  • Exploring the data
  • Performing quality checks on the data sets: Is the data complete? Is it correct? If there are errors, how common are they? Are there any missing values?
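The quality checks listed above can be sketched in a few lines of standard-library Python. The records, column names and the "revenue must not be negative" rule are assumptions made up for this example. In practice a library such as pandas is often used for the same checks.

```python
from statistics import mean, median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"segment": "retail", "revenue": 120.0, "region": "north"},
    {"segment": "retail", "revenue": None, "region": "south"},
    {"segment": "enterprise", "revenue": 340.0, "region": None},
    {"segment": "enterprise", "revenue": -15.0, "region": "north"},
]


def missing_counts(records):
    """Completeness check: count missing (None) values per column."""
    columns = {key for record in records for key in record}
    return {
        col: sum(1 for r in records if r.get(col) is None)
        for col in sorted(columns)
    }


def numeric_summary(records, column):
    """Central tendencies of a numeric column, ignoring missing values."""
    values = [r[column] for r in records if r.get(column) is not None]
    return {"count": len(values), "mean": mean(values), "median": median(values)}


def invalid_rows(records, column, is_valid):
    """Correctness check: indices of rows whose value fails a validity rule."""
    return [
        i for i, r in enumerate(records)
        if r.get(column) is not None and not is_valid(r[column])
    ]


print(missing_counts(rows))
print(numeric_summary(rows, "revenue"))
print(invalid_rows(rows, "revenue", lambda value: value >= 0))
```

Counting invalid rows, rather than only detecting them, answers the "how common are they?" question directly.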

Preparing data:

Prepare and clean the data so that it is ready for analysis. If the data is garbage, the model output will be meaningless. Select only the relevant data, integrate the data files into a master file, clean up discrepancies, convert formats, and create new features if required.
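As a rough illustration of those preparation tasks, the sketch below joins two small tables into one master list, keeps only the relevant columns, normalises text, converts formats and derives a new feature. The tables, column names and the choice to drop unparseable rows are all assumptions for the example. Imputing a value instead of dropping the row is an equally valid choice.

```python
from datetime import datetime

# Hypothetical inputs: two source files already loaded as lists of dicts.
customers = [
    {"customer_id": 1, "segment": " Retail ", "notes": "vip"},
    {"customer_id": 2, "segment": "enterprise", "notes": ""},
]
orders = [
    {"customer_id": 1, "amount": "120.50", "ordered_on": "05/06/2021"},
    {"customer_id": 2, "amount": "n/a", "ordered_on": "17/06/2021"},
    {"customer_id": 1, "amount": "80", "ordered_on": "21/06/2021"},
]


def to_float(text):
    """Parse a number, returning None when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def build_master(customers, orders):
    """Join orders to customers and return cleaned, analysis-ready rows."""
    # Clean discrepancies in the segment label (whitespace, letter case).
    segment_by_id = {
        c["customer_id"]: c["segment"].strip().lower() for c in customers
    }
    master = []
    for order in orders:
        amount = to_float(order["amount"])
        if amount is None:
            continue  # drop rows whose amount cannot be parsed
        ordered_on = datetime.strptime(order["ordered_on"], "%d/%m/%Y").date()
        master.append({
            "customer_id": order["customer_id"],
            "segment": segment_by_id[order["customer_id"]],  # integrated column
            "amount": amount,                                 # text -> number
            "ordered_on": ordered_on.isoformat(),             # DD/MM/YYYY -> ISO
            "order_month": ordered_on.month,                  # new feature
        })
    return master


for row in build_master(customers, orders):
    print(row)
```

The free-text `notes` column is deliberately left out of the master rows, which is the "select only the relevant data" step.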

Modelling:

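Modelling is often called the heart of data analysis. It also calls for care, as the saying goes: “If you torture the data long enough, it will confess.” A model pushed hard enough can be made to show almost anything.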

Models should be succinct, mathematically sound, efficient and easy to understand. In this framework, building a model involves two major steps:

  • Find the right family of models
  • Find the right algorithm within the chosen family
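The two steps above might look like the following sketch: the analysis goal picks the family, and error on held-out data picks the algorithm within it. This is only an illustration under stated assumptions. The `FAMILIES` registry, the two toy regression algorithms and the data points are all hypothetical, and a real project would usually draw candidates from an established library.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline algorithm: always predict the mean of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Least-squares straight line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


# Step 1: the goal of the analysis selects a family (only one is shown here).
FAMILIES = {
    "regression": {"mean_baseline": fit_mean, "linear": fit_line},
}


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def choose_algorithm(family, train, valid):
    """Step 2: fit every algorithm in the family, keep the lowest validation error."""
    scores = {
        name: mse(fit(*train), *valid)
        for name, fit in FAMILIES[family].items()
    }
    return min(scores, key=scores.get), scores


# Made-up training and validation points.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])

best, scores = choose_algorithm("regression", train, valid)
print(best, scores)
```

Scoring on points the model has not seen is what keeps the choice honest. Scoring on the training data alone tends to favour whichever candidate memorises it best.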

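Model evaluation and deployment:

Not every model is correct or useful, so each one has to be evaluated before it is deployed. Modelling and evaluation form an iterative process: build a model, evaluate it, and go back to refine it until the result is good enough. Models must be robust and effective.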
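One common way to check robustness is k-fold cross-validation, sketched below with a simple least-squares line as a stand-in model. The data points and the helper names `fit_line` and `k_fold_errors` are assumptions for illustration. The article itself does not prescribe a particular evaluation method.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares straight line y = a + b * x (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def k_fold_errors(xs, ys, k):
    """Mean absolute error on each of k held-out folds."""
    errors = []
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit_line(train_x, train_y)
        errors.append(
            mean(abs(model(xs[i]) - ys[i]) for i in sorted(test_idx))
        )
    return errors


# Made-up observations that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]

errors = k_fold_errors(xs, ys, k=4)
print("per-fold error:", errors)
print("average:", mean(errors), "spread:", pstdev(errors))
```

A low average error with a small spread across folds suggests the model behaves consistently. A large spread is a signal to go back to the modelling step.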

Thank you for reading!!!
