The Role of Data Management (and MDM) in Analytics
Analytics + MDM: the ultimate Data Science equation
The interplay between Analytics and MDM is hard to miss, and yet it is not recognized as widely as it should be. In this two-part essay, join me as we uncover the dynamics between this power couple and arrive at the definitive equation:
Data Science = Data Management + Analytics!
Part II: The Role of Data Management (and MDM) in Analytics
Click here for Part I: The Role of Analytics in Master Data Management
Previously, we dissected the important role that analytics plays in Master Data Management (MDM). But that raises another question. Flipping the script: do data management and MDM play a role in analytics?
To answer that question, let’s look at a popular methodology that data scientists employ, called CRISP-DM (Cross Industry Standard Process for Data Mining).
The CRISP-DM Model
A diagrammatic representation of the CRISP-DM model, which is immensely popular with analytics experts, shows the six steps it recommends for mining and analyzing data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. Step 2 (Data Understanding, or Data Study) and Step 3 (Data Preparation) involve data management activities. A key goal of data management is to integrate and co-locate relevant data in a warehouse repository and to improve its quality (see Part I for more details), and Steps 2 and 3 of the CRISP-DM model address exactly these goals. Step 3 also involves another key activity, data modeling. This is not to be confused with Step 4 (Modeling) in the diagram, which refers to building statistical or analytical models. To properly improve data quality, the warehoused data needs to be arranged in data models that identify its entities, attributes, relationships and hierarchies, especially where master data is concerned.
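The following is a minimal Python sketch that makes the idea of a data model concrete. It assumes a hypothetical customer master data domain. `Customer` is the entity, and `name` and `country` are its attributes. `address_id` links a customer to an `Address`, which is a relationship. `parent_id` builds a subsidiary-to-parent hierarchy. All class, field and function names are illustrative assumptions and do not come from any MDM product. Once the model is explicit, its rules can be checked mechanically, which is one way data modeling supports data quality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical customer master data model (illustrative names only).
REQUIRED_ATTRIBUTES = ("customer_id", "name", "country")


@dataclass
class Address:
    address_id: str
    city: str


@dataclass
class Customer:                       # entity
    customer_id: str                  # identifying attribute
    name: str                         # descriptive attributes
    country: str
    address_id: Optional[str] = None  # relationship: Customer -> Address
    parent_id: Optional[str] = None   # hierarchy: subsidiary -> parent Customer


def validate(customers: list[Customer], addresses: list[Address]) -> list[str]:
    """Return one message for each model rule the records violate."""
    issues: list[str] = []

    # Entity rule: a customer_id identifies exactly one record.
    counts = Counter(c.customer_id for c in customers)
    issues += [f"{cid}: duplicate customer_id" for cid, n in counts.items() if n > 1]

    by_id = {c.customer_id: c for c in customers}
    address_ids = {a.address_id for a in addresses}
    for c in customers:
        label = c.customer_id or "<no id>"
        # Attribute rule: required attributes must be non-empty.
        issues += [f"{label}: missing {attr}" for attr in REQUIRED_ATTRIBUTES
                   if not getattr(c, attr)]
        # Relationship rule: a referenced address must exist.
        if c.address_id is not None and c.address_id not in address_ids:
            issues.append(f"{label}: unknown address {c.address_id}")
        # Hierarchy rules: the parent must exist, and walking up must terminate.
        if c.parent_id is not None and c.parent_id not in by_id:
            issues.append(f"{label}: unknown parent {c.parent_id}")
        seen = {c.customer_id}
        node = by_id.get(c.parent_id)
        while node is not None:
            if node.customer_id in seen:
                issues.append(f"{label}: parent chain loops")
                break
            seen.add(node.customer_id)
            node = by_id.get(node.parent_id)
    return issues


if __name__ == "__main__":
    addresses = [Address("A1", "Austin")]
    customers = [
        Customer("C1", "Acme Corp", "US", address_id="A1"),
        Customer("C2", "Acme West", "US", parent_id="C1"),
        Customer("C3", "", "US", address_id="A9"),           # missing name, bad address
        Customer("C4", "Loop Inc", "DE", parent_id="C5"),
        Customer("C5", "Loop GmbH", "DE", parent_id="C4"),   # C4 <-> C5 cycle
    ]
    for issue in validate(customers, addresses):
        print(issue)
```

With this sample data, the check flags C3's missing name and its dangling address reference. It also flags the C4/C5 parent loop. Without the model, issues like these could flow silently into downstream analytical models.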
So now, let’s ask again: do data management and MDM have a part to play in analytics projects? If two of the six steps of the most popular data science model involve data management activities, as we just discussed, shouldn’t the answer be an emphatic YES?
Now that we have established that data management is deeply integrated into analytics, let’s ask the real million-dollar question: why is the role of data management in analytics so widely underestimated, and even ignored?
A question of analytics quality
Poor data quality leads to poor-quality analytics. That’s obvious, and every data scientist knows it. But without a traditional data management solution, and without specialized treatment of data based on its type, data scientists are ill-equipped to improve data quality. They therefore resort to rudimentary techniques, such as spreadsheets in Excel and scripts in languages like Python, to integrate and cleanse data. This approach is unproductive and often ineffective, and as data sources and volumes grow, it is almost always unscalable. To produce quality analytics solutions, a data scientist needs a sound data management strategy and mature, enterprise-grade data management tools.
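For illustration, here is a minimal Python sketch of the kind of ad hoc integration and cleansing script described above. The sources (`crm`, `billing`), the field names and the matching rule (an exact match on a normalized email) are all hypothetical assumptions. The script works on a toy example. However, every rule is hard-coded, matching is exact, and nothing is governed or reusable, which is why this approach struggles as sources and volumes grow.

```python
# Hypothetical ad hoc integration script: two sources, hard-coded rules.


def normalize(record: dict) -> dict:
    """Standardize the three fields this script knows about."""
    return {
        # Collapse whitespace and title-case (note: "McDonald" becomes "Mcdonald").
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        # Keep digits only, e.g. "(555) 010-2030" -> "5550102030".
        "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
    }


def integrate(*sources: list[dict]) -> list[dict]:
    """Merge records from all sources, de-duplicating on exact normalized email."""
    merged: dict[str, dict] = {}
    for source in sources:
        for raw in source:
            rec = normalize(raw)
            if not rec["email"]:
                continue  # records without an email are silently dropped
            golden = merged.setdefault(
                rec["email"], {"name": "", "email": rec["email"], "phone": ""}
            )
            for attr, value in rec.items():
                if value and not golden[attr]:
                    golden[attr] = value  # first non-empty value wins
    return list(merged.values())


if __name__ == "__main__":
    crm = [
        {"name": "  jane   DOE ", "email": "Jane.Doe@Example.com"},
        {"name": "Bob Lee", "email": ""},
    ]
    billing = [
        {"name": "Jane Doe", "email": "jane.doe@example.com ", "phone": "(555) 010-2030"},
        {"name": "J. Doe", "email": "jdoe@example.org"},
    ]
    for row in integrate(crm, billing):
        print(row)
```

The two Jane Doe records merge correctly. However, "J. Doe" at a different address is probably the same person and still stays a separate record, and Bob Lee is dropped without warning. Catching cases like these takes the fuzzy matching and governed rules that dedicated data management tools are built for.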
Speaking of data quality, one important point needs to be called out before we move on. A foundational step in data management is data modeling, which is different from the statistical and analytical modeling that data scientists deploy for analytics. Without going into depth, suffice it to say that data modeling improves data quality by defining the entities, attributes, relationships and hierarchies the data must conform to. This in turn greatly improves the analytical models that data scientists go on to build.
So, what’s the problem?
What stands in the way of introducing data management into analytics? In my experience participating in data management projects and working with data scientists, it comes down to three factors:
- Awareness (or lack thereof) – just as data management experts have failed to realize the potential of analytics, as I explained in Part I, analytics experts have generally been unaware that the market offers data management solutions that specialize in data quality and governance.
- Collaboration (or lack thereof) – even in companies with dedicated data management and analytics initiatives, there is little to no collaboration between the two. This ties back to the lack of awareness but, more importantly, to management’s failure to recognize the synergy between them.
- Scale (or lack thereof) – many analytics projects are limited in scope and involve smaller, siloed datasets that simply don’t exhibit an unwieldy data quality problem and hence don’t warrant an enterprise-grade data management system. Even so, as a company grows and matures, its data volumes expand and the need to integrate data from different sources and domains increases, which eventually justifies investing in a commercial data management solution.
Finally – the real definition of data science
The general perception of data science is that it is all about analytics and often involves ML and AI. However, as the CRISP-DM model shows, this is a misconception. Data science also involves data management, in one form or another, as a precursor to the analytics that follows. Even when a data science initiative does not employ an enterprise-grade data management solution, the key data management activities that improve data quality still need to be performed.
Therefore, the real definition of data science is:
Data Science = Data Management + Analytics!
Going beyond equations, the data science experts at IntelliTide have developed FactorAI, a SaaS solution that augments MDM with advanced analytics to help businesses extract the greatest value from their master data. To find out more, go to: factorai.com