ETL vs MDM
by Matthew Dahlman | November 19, 2015
Master Data Management in relation to Data Integration
Master Data Management (MDM) and Data Integration or Extract, Transform, Load tools (DI or ETL) are all part of a complete Enterprise Information Management (EIM) architecture. But there's some level of confusion or disagreement about how exactly they relate to one another. I have heard the sentiment, "We don't need an MDM, we can do the matching we need in our ETL jobs." This claim misses the point in some important respects. Let's consider how these different technologies compare with each other and complement each other.
They solve data problems
ETL tools solve data problems. MDM solutions solve data problems. In this sense they are similar technologies. IT uses ETL to move data from one place or one format to another. MDM solutions solve business problems resulting from inaccurate or incomplete data. The business uses MDM to gain a single view of customers or products. Yes, they are both tools and technologies that deal with data.
But they don't address identical problems, and they don't approach problems from the same direction.
IT-centric or business-centric
ETL tools are focused on moving data. Moving data is something the IT team thinks about. A data warehouse provides a useful prototypical example. Strategic business problems may be defined at the highest levels of an organization. For certain problems, the solution is an enterprise data warehouse (EDW). Having defined this goal, the architecture team determines the technologies needed to realize it. Maybe there's a relational database (perhaps a petabyte-scale, columnar, shared nothing, MPP, horizontally scaling, Hadoop-based database). Maybe there's a business intelligence tool (perhaps with a cloud-based, robust, dynamic, interactive thought-provoking visualization engine). Surely there is a data integration tool (perhaps it's ETL, or ELT, or SOA-based).
All of these architecture decisions are managed by the IT group. The business team doesn't directly care about the database, the BI tool, or the ETL tool. They care about the overall data warehouse giving them access to the data they need to answer some important questions.
MDM solutions, on the other hand, should be focused on a business problem. Gathering a more complete view of one's customers is a typical example. We know our support center could help users better if it had a more accurate and more complete picture of the customer. For this problem, master data management may provide the solution. Maybe there's a database, some dashboards, and some data integration involved.
The business team doesn't directly care about the database or the dashboard technology or the ETL tool. But they definitely care about the accurate definition of a customer and how to roll up or drill down into hierarchies.
Put another way, an ETL tool appeals to a developer. An MDM tool appeals to a data steward.
Different levels in the stack
An MDM solution nearly always relies on some ETL technologies to load data into or extract data out of it.
An ETL tool could live a happy life without ever touching an MDM project. It could be used for a data migration, it could synchronize data between systems, it could perform lookups and transformations.
ETL tools have features like these:
- Wait for a file
- Run a SQL query
- Find a value in a lookup table
MDM tools have features like these:
- Track the lineage of a golden record
- Edit records to override source systems
- Define survivorship rules
Build vs Buy
When the business problem at hand revolves around data consolidation, matching, and cleansing, the ETL team may say, "Great! Our ETL tool can bring the data together. It can transform and cleanse the data. This is the solution to the problem."
The MDM team will say, "Great! Our MDM technology can bring the data together. MDM can cleanse and enrich the data. This is the solution to the problem."
In many cases this leads into a classic build vs buy discussion. Perhaps in some cases your ETL technology can be used to build an MDM solution. If your matching rules and data cleansing requirements really are fundamentally different from everyone else's requirements for some reason, then maybe you should be building your solution. If your definition of a customer is relatively similar to the definition that others in your industry use, then you should be thinking about buying. Most enterprises lean towards buying their MDM solutions rather than building them.
There are some common requirements that lots of people need to satisfy with their master data. A good MDM tool has things like these pre-built and pre-tested:
- Tracing data lineage - which sources contributed to this record?
- Data versioning - what did this product look like last year?
- Business hierarchies - what parent entity does this organization roll up to?
- Automatically generated user interface for non-technical users to maintain data
- Workflows for reviewing and confirming system-identified matches
Given enough time you could build these features from your ETL tool, but you can buy them from an MDM vendor.
In the end, it's a false dichotomy in the sense that every enterprise will have both ETL and MDM technologies. They certainly don't simply choose one or the other.
MDM solutions define rules for data cleansing, record matching and merging, data enrichments, etc. When an anomaly occurs, the MDM will route the offending records to a data steward for manual intervention and approval. The data steward rather than IT has the domain expertise about the business data. The level of the confidence threshold determines when automated cleansing defers to manual intervention. The data stewards are subject matter experts that reside within the business. They will directly interact with the MDM through its user interface linking data quality responsibility to the business. The result is a much more agile data management approach.
But how does the data get into the MDM? How does it complete its round-trip to get back to the source systems? ETL tools will provide this part of the solution.
So when your information management environment is ready for managing your master data, consider which MDM tool to add to your existing set of operating systems, databases, and ETL tools.