How to analyze data (in 6 steps)
Inspired by the CRISP-DM methodology, plus a downloadable project roadmap
We hosted a webinar earlier this week focused on Storytelling with Data. The guest speaker, Christina Boris, walked the online attendees through a live case study and demo of a multi-level approach she uses to summarize important facts and figures from lengthy reports. She also compared manually generated reports with AI-generated ones!
Get a preview of the concept here. If you missed the webinar, watch the recording here or on our YouTube channel.
Since we are on the topic of data and how to summarize the results of research, I wanted to tackle a related topic: data analysis.
If you’ve worked as a data analyst at any point in your career, you know, at least at a high level, how the process works and the steps it involves, even if your assigned or chosen tasks cover only one or a few parts of it.
Most analysts will likely tell you that it begins with data access, continues with data cleaning, mapping and exploration, moves on to analysis and ends with delivering results.
This is correct, but there is more to it.
The process actually begins with gaining an in-depth understanding of the business goal. Every step in the analysis process from then on must keep that goal at the forefront.
Particularly if you are early in your data career and want to excel in your role, you should use a structured approach that includes more steps than just those related to data.
Why a structured data analysis process matters
There are many reasons why following a structured data analysis process is important for the success of your project. Here are some of them.
It makes it easier for project managers to manage the project objectively. A well-managed project saves time and costs.
If you have a documented list of steps that you complete as you move forward in the process, you can trace your steps back if you need to identify when a certain issue occurred. This can help you avoid costly mistakes.
Repeated processes get tested each time they are used. This makes them more reliable, and over time you become more efficient at running them, which increases your productivity.
Data analysis is a science, but it is also an art. When you have a well-defined data analysis process made up of a series of established tasks, it frees your mind to think creatively about how to analyze the data rather than spending time wondering whether you have checked all the boxes.
But here’s the interesting part. Even though data analysis is part and parcel of countless projects around the globe, only 18% of data science teams follow an explicit analysis process (Schroer, Kruse and Gomez, 2021).
That means there is plenty of opportunity to stand out by following an explicit, structured data analysis process.
In case you haven’t come across it before, let me introduce you to the Cross-Industry Standard Process for Data Mining or CRISP-DM.
Please use this process only as a reference and modify it as required to meet your own needs. I am a firm believer that existing frameworks are great starting points. Ultimately, the ones that work best are those you customize to meet the needs of your role and organization.
What is the CRISP-DM process?
Published in 2000, CRISP-DM is a widely used framework that guides the data analysis process in any project.
Another advantage of this structured process is that it is industry-independent. You can use it in your role regardless of whether you work in media, retail or any other industry.
The 6 Steps
✅Step 1 - Business Issue Understanding
You will hear it again and again (and again) from any expert trying to solve a business problem: the first step is to get a solid understanding of the business issue. To achieve this, you must speak with your stakeholders, be they business partners, clients or customers. Together with them, work through the following points:
Define the issue, business goals and objectives.
Define these both broadly or qualitatively (tell the story around these goals) and specifically or quantitatively. Since all analytics projects are quantitative by definition, setting objective, quantitative goals is ideal.
Let’s use an example to illustrate this approach.
Let’s assume that your business goal is to get better at managing inventory in your company warehouses. For that, your related data project goal is to forecast sales for the next year by quarter.
Know your why.
If you are the data analyst or on the data team, you should discuss the ‘why’ in great detail with the business or client team.
In the forecasting example above, we already agreed that the business is looking for sales forecasts so that it can place orders for the right amount of inventory with the suppliers.
Alternatively, maybe you need to report those forecasts to an insurance company or external financiers, or you need them for another reason. You need these answers to clearly understand the business issue and context. You also need them to define deliverables and success for your project.
Assess the current state and describe what you need.
Another part of understanding the business issue is to know the current state. In our example, how does the business decide what and how many products to order? What data are available? What resources are available?
Start building the project roadmap.
Once you have answers to some of these questions, you can start building out the project roadmap. Also include costs, dependencies, risks, timelines, details on the scope of work and deliverables.
Try to build in a ‘cushion’, i.e. some flexibility, when estimating all of the above and setting timelines, in case unforeseen contingencies come up.
There are a variety of roadmap templates you can use. In my experience, a detailed Gantt chart can often do the trick.
If your project uses the services of a project manager, they might be responsible for building the roadmap. Even so, make sure that you are aware of its details.
Here’s an Excel-based one from Zapier that you can download for free and automate.

✅Step 2 - Data Understanding
Begin this phase by collecting data from the different sources. For data from each source, you need to explore it, check its quality and adequacy and describe it.
You can run descriptive analytics and visualizations on the data at this stage for a more scientific approach to understanding your data.
As a general rule, document your data exploration and any quality issues so that you have reference materials available as you move further in the process.
Here is a helpful image by Zipporah Luna on the Data Understanding phase.
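If you work in code, a first pass at describing and quality-checking a dataset can be as simple as the minimal Python sketch below. It assumes the data are a list of records (dictionaries) and, for each column, counts missing values and reports the minimum, maximum and mean of numeric columns. The `profile` function, the column names and the sample rows are made up for illustration. With real data you would more likely use a library such as pandas, but the idea is the same.

```python
from statistics import mean

def profile(rows):
    """Summarize each column: missing count and, for numeric columns, basic stats."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"missing": len(values) - len(present)}
        # Only compute descriptive statistics when every present value is numeric
        if present and all(isinstance(v, (int, float)) for v in present):
            summary.update(min=min(present), max=max(present), mean=mean(present))
        report[col] = summary
    return report

# Hypothetical sample: one record per product per month
sales = [
    {"month": "2023-01", "product": "A", "units": 120},
    {"month": "2023-02", "product": "A", "units": None},
    {"month": "2023-03", "product": "A", "units": 150},
]

for column, stats in profile(sales).items():
    print(column, stats)
```

The missing value in `units` is exactly the kind of quality issue worth writing down in your exploration notes.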
✅Step 3 - Data Preparation
After exploring your data, you need to select and prepare all or a part of it for analysis. For example, let’s say you have a rich dataset with 20 years of historical product sales. Great! More data can sometimes be merrier.
However, due to the rise of online sales, your product sales patterns have changed in the last 5 years. In this case, you might not need to use all 20 years of data. You may need just 5!
Figure out what and how much data you need in this phase and document your reasons for this selection. Documentation also helps when it comes to explaining your results to your final audience.
This step may also include data cleaning, combining and integrating datasets, transforming and creating new variables if required, and formatting and storing the data for easy access.
This part of the process can be fairly time-consuming depending on the size and complexity of your data. There are also many different sub-steps to consider, such as those listed in the previous paragraph. We have kept this discussion high-level, but if you are interested in learning more, contact us at conscisolutions@gmail.com.
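Continuing the sales example, here is a minimal Python sketch of a few of these preparation steps. It keeps only records from a chosen cut-off year onward, drops records with missing values, derives a new quarter variable from each date and aggregates units to quarterly totals, the granularity needed for a forecast by quarter. The `prepare` function, the field names and the cut-off year are hypothetical. Your own selection rules should come from your data exploration and be documented.

```python
from collections import defaultdict
from datetime import date

def prepare(rows, first_year):
    """Keep recent, complete records and aggregate units to (year, quarter)."""
    quarterly = defaultdict(int)
    for row in rows:
        day, units = row["date"], row["units"]
        if day.year < first_year or units is None:
            continue  # outside the selected window, or a missing value
        quarter = (day.month - 1) // 3 + 1  # new variable derived from the date
        quarterly[(day.year, quarter)] += units
    return dict(sorted(quarterly.items()))

# Hypothetical raw records
raw = [
    {"date": date(2015, 5, 3), "units": 80},     # older than the chosen window
    {"date": date(2020, 1, 15), "units": 100},
    {"date": date(2020, 2, 20), "units": None},  # missing, dropped
    {"date": date(2020, 3, 9), "units": 50},
    {"date": date(2020, 4, 1), "units": 70},
]

print(prepare(raw, first_year=2019))
# {(2020, 1): 150, (2020, 2): 70}
```

Passing the cut-off year as a parameter, rather than hard-coding it, makes the selection decision explicit and easy to document or revisit.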
✅Step 4 - Modeling
Depending on the type of data analysis project, this step involves applying the appropriate statistical model to pick up key signals in your data and generating results. Experienced data scientists sometimes try out a couple of different techniques to find the right model for the task.
In some cases, your project might not require advanced analytics and may instead need a basic Excel-based model that does not require code or advanced statistical methods.
Regardless of the level of analytical technique required, it is good practice to iterate and assess your model until you believe you have the best one for your purpose.
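As an illustration of trying a couple of techniques and assessing them, the hypothetical Python sketch below compares two very simple quarterly forecasting methods by their one-step-ahead mean absolute error on the last four quarters. The first is a seasonal naive forecast (same quarter last year), and the second is a four-quarter moving average. The sales figures are invented, and these two baselines only stand in for whatever models suit your project. They are not a recommendation.

```python
def seasonal_naive(history, season=4):
    """Forecast the next value as the value from the same quarter one year earlier."""
    return history[-season]

def moving_average(history, window=4):
    """Forecast the next value as the mean of the last `window` quarters."""
    return sum(history[-window:]) / window

def holdout_mae(series, model, n_test=4):
    """One-step-ahead mean absolute error over the last n_test points."""
    errors = []
    for i in range(len(series) - n_test, len(series)):
        forecast = model(series[:i])  # only data before point i is used
        errors.append(abs(series[i] - forecast))
    return sum(errors) / len(errors)

# Hypothetical quarterly unit sales over three years
quarterly_units = [100, 130, 160, 210, 110, 140, 175, 230, 120, 150, 185, 245]

scores = {m.__name__: holdout_mae(quarterly_units, m) for m in (seasonal_naive, moving_average)}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

On this made-up series, which has a strong quarterly pattern, the seasonal naive forecast has the lower error (11.25 vs 37.5 units). Holding out the most recent quarters, rather than a random sample, mimics how the forecast will actually be used.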
✅Step 5 - Evaluation
Steps 2-4 are more data-oriented, but Evaluation brings us back to the business goals we identified in Step 1.
Here, you review the results and determine whether they provide the answers your business is seeking. Based on your review, you can decide whether to iterate or accept the results and move on to the next step.
It is also a good time to review your previous steps and ensure that nothing was missed or overlooked during the analysis. As in the earlier steps, document the outcome of your review and the reasons for your decision.
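To tie the review back to the quantitative goals from Step 1, you can compare a model’s accuracy on held-out data with the target agreed with stakeholders. The minimal Python sketch below assumes, purely for illustration, that the agreed goal was a mean absolute percentage error (MAPE) of at most 10% on quarterly forecasts. The `mape` and `review` functions, the figures and the threshold are all hypothetical.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need equally long, non-empty sequences")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)

def review(actuals, forecasts, target_mape):
    """Compare forecast accuracy with the quantitative goal agreed in Step 1."""
    error = mape(actuals, forecasts)
    decision = "accept" if error <= target_mape else "iterate"
    return round(error, 2), decision

# Hypothetical held-out year and an assumed 10% accuracy target
print(review([120, 150, 185, 245], [110, 140, 175, 230], target_mape=10.0))
# (6.63, 'accept')
```

Here the error is about 6.6%, so the decision is to accept. With a stricter target such as 5%, the same results would send you back to iterate. Whatever metric you choose, record it along with the decision and the reasons for it.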
✅Step 6 - Deployment
This is the final step of the process. The original CRISP-DM framework includes this step, but you should feel free to adapt it.
The bigger your organization, the less likely it is that the data analysts are also responsible for deploying the solutions. Larger companies often have teams dedicated to deployment.
For most of us, the final step in this process is presenting results to stakeholders using visuals, slide decks and storytelling methods.
Depending on what happens next, it is useful to monitor how those results are being applied in your business, for two reasons.
First, so that you, as the author of that work, can correct any misinterpretation.
Second, so that you can learn from how your stakeholders used your work and bring that learning to your next data analysis task.
Once you have completed the set of tasks related to this project, good practice calls for a project retrospective about what went well, what could be improved and what tips to keep in mind for the future.
When it is okay not to complete each step
In reality, data analysis is a continuous process. It does not actually end after the presentation of results, because there are always follow-on business issues that require you to repeat it.
While you should follow a structured process for the reasons mentioned earlier in this essay, especially if you are new to the team, there are two scenarios in which it is alright to be less strict about it.
When time is of the essence. In that case, prioritize what is most important to delivering the results quickly.
For repeat projects such as delivering forecasts every quarter. Here, the business issue has not changed.
References -
Schroer, Kruse and Gomez (2021). A Systematic Literature Review on Applying CRISP-DM Process Model (academic article on the approach).
If you found the information in this newsletter useful, please share it with your colleagues.