I have frequently seen projects run into serious problems because of data issues, typically issues caused by differences between source systems. The underlying requirement is to be able to identify the same object in different systems. These problems show up in many types of projects: Analytics/BI projects, Big Data projects, projects integrating new frontend systems, projects consolidating or replacing backend systems, and so on.
In short, all IT-related projects.
Current status of MDM
In my experience, only very few organizations have successfully solved all their MDM issues.
Organizations have approached master data management in numerous ways. Some draw upon their existing resources to handle master data management, often calling upon employees to manually clean and migrate data. This method is prone to human error, which causes further complications, and it does not scale well as business needs change. Many organizations have implemented specific data management tools to aid with integration and cleansing. Integration tools, however, do not always support large amounts of data and are limited in the types of files and data sources they can manipulate.
Another strategy implemented by organizations, despite the common understanding that it is a poor solution, is point-to-point integration. Point-to-point integration, commonly referred to as custom code, is a method in which skilled developers write custom code and implement it at each specific endpoint in order to create connectivity. This requires extensive knowledge of each endpoint, and as the number of endpoints increases, it becomes a grueling task. Moreover, as organizations take advantage of mobile, cloud, and SaaS applications to power their business, their IT ecosystem grows in complexity. With more and more endpoints requiring connectivity, point-to-point integration becomes a complex and fragile “spaghetti architecture”.
Why is MDM important?
As organizations try to move from being reactive to being proactive, their data requirements shift from traditional analysis that describes events and behavior toward analysis that predicts customer behavior and prescribes appropriate business responses.
This change requires data to be identifiable across every available system, both internal and external.
Some organizations use a shift towards “customer centricity” to start working on Customer Data Integration (CDI), which is a subset of Master Data Management for a single data domain, with the purpose of achieving a 360° customer view to increase top-line growth.
Other organizations try to optimize the supply chain by focusing on product master data management, thereby reducing costs and improving profitability.
For most companies, all of these activities need to be completed – fast – to be able to compete with new entrants with no physical assets and other legacies to deprecate.
Gartner Group’s model
Gartner Group suggests four different styles of MDM Hub implementations:
- Consolidation Style
The MDM Hub is the system of reference for reporting purposes and stores all master data, with no need to call backend systems.
- Registry Style
The MDM Hub is the system of reference, but stores only an index used to call the relevant backend system to get the complete master data.
- Coexistence Style
The MDM Hub is the system of reference and stores all master data, with no need to call backend systems. One system of entry updates the MDM Hub, and the MDM Hub then updates all other backend systems.
- Centralized Style
The MDM Hub is both a system of reference and a system of entry; it updates the necessary data in all backend systems.
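To make the difference between the styles concrete, here is a minimal sketch contrasting the Registry and Consolidation styles. It is my own illustration, not part of Gartner's model: the class names `RegistryHub` and `ConsolidationHub`, the method names, and the idea of modelling a backend as a simple lookup function are all assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, Tuple

# Hypothetical backend interface: a function that takes a local id
# and returns the full record held by that backend system.
Backend = Callable[[str], dict]


class RegistryHub:
    """Registry style: the hub holds only an index pointing into backends."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self.backends = backends
        # master_id -> (system name, local id in that system)
        self.index: Dict[str, Tuple[str, str]] = {}

    def register(self, master_id: str, system: str, local_id: str) -> None:
        self.index[master_id] = (system, local_id)

    def get(self, master_id: str) -> dict:
        system, local_id = self.index[master_id]
        # Every read calls the backend to fetch the complete master data.
        return self.backends[system](local_id)


class ConsolidationHub:
    """Consolidation style: the hub stores its own full copy of the data."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def load(self, master_id: str, record: dict) -> None:
        # Store a copy so that reads never depend on the backend.
        self.records[master_id] = dict(record)

    def get(self, master_id: str) -> dict:
        return self.records[master_id]
```

With a dictionary standing in for a CRM backend, a `RegistryHub` read always reflects the current backend record, while a `ConsolidationHub` read returns whatever was last loaded into the hub. That is the trade-off in a nutshell: the registry is cheap to build but depends on the backends being available and fast, whereas the consolidated copy is self-contained but has to be kept in sync.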
Most people I have talked to are convinced that the centralized style is the ideal implementation but also the most challenging to implement. That is not an excuse to do nothing!
Suggested approach
If you have a single high-performing system of entry for master data in each data domain and this system is a strategic choice for your company – then I would choose a Registry style MDM Hub. This is easy to implement, and you do not need advanced data quality tools to help you clean and consolidate data across multiple systems.
Unfortunately, the prerequisite stated in the paragraph above does not fit the majority of companies. So instead, I would choose the Centralized style MDM Hub – but I would try to avoid re-implementing the entire system landscape in one big project.
A simple but proven way to do exactly this is to start by building an MDM Hub in Coexistence style, then gradually include more backend systems and automate the synchronization of master data across more of the system landscape. To begin with, the mapping of master data could be done entirely manually, or you could build a small mapping tool that handles the obvious cases and leaves the complex ones for manual handling. If needed, you can supplement this with a specialized data cleansing tool, provided the business case has a positive NPV.
Gradually, you can improve your MDM solution to include UI-driven MDM tasks.
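As an illustration of what such a small mapping tool might look like, here is a minimal sketch. Everything in it is an assumption made for the example: the `SourceRecord` fields, the choice of e-mail plus name as matching attributes, and the 0.85 similarity threshold. A real tool would use whatever attributes your domain offers.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SourceRecord:
    system: str    # hypothetical source system name, e.g. "crm"
    local_id: str  # the record's key inside that system
    name: str
    email: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def classify(a: SourceRecord, b: SourceRecord, threshold: float = 0.85) -> str:
    """Return 'match', 'review' or 'no_match' for a pair of records."""
    same_email = bool(a.email) and normalize(a.email) == normalize(b.email)
    name_score = SequenceMatcher(
        None, normalize(a.name), normalize(b.name)
    ).ratio()
    if same_email and name_score >= threshold:
        return "match"   # obvious case: map automatically
    if same_email or name_score >= threshold:
        return "review"  # only one signal agrees: leave for manual handling
    return "no_match"
```

Two records with the same e-mail and names that differ only in case or spacing are mapped automatically; a pair that agrees on only one of the two signals is queued for a person to decide. The point is not the matching rule itself but the split: automate what is unambiguous and keep a human in the loop for the rest.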
If you build your MDM solution on a Staged Event-Driven Architecture (SEDA), you can build your own MDM solution that is robust, responsive and fault tolerant, starting small in Coexistence style and gradually growing and improving until you end up with a Centralized style MDM Hub. Do you need SEDA? No, but with traditional SOA or EDA you will lose some robustness and fault tolerance. My advice would be to use SEDA, as I see MDM and the entire middle-layer architecture as vital components in any business, without exception.
Using this approach, you handle issues one by one in a prioritized sequence with no big-bang projects, which suits an agile or DevOps way of building solutions with great flexibility.
The MDM Hub storage would be a natural part of your Business Data Lake or Data Warehouse. In fact, if you have an existing data warehouse, chances are that the needed storage structures are already in place. The question is whether your data warehouse can handle the transactional, near real-time load that an MDM solution will generate. A modern real-time data warehouse will be able to handle this, as will a Business Data Lake, but more mature (read: old-fashioned) data warehouses designed for batch updates might not provide the performance necessary to handle the load.
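For readers unfamiliar with SEDA, the core idea is that processing is split into stages, each with its own queue and its own workers, so that a slow or failing stage does not bring down the others. The following is a toy, in-process sketch of that idea using only the Python standard library. The stage names `validate` and `enrich`, the event layout, and the queue sizes are assumptions for illustration; a production setup would typically use a message broker rather than in-memory queues.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def run_stage(handler, inbox, outbox):
    """Consume events from inbox, apply handler, pass results to outbox."""
    while True:
        event = inbox.get()
        if event is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            break
        try:
            result = handler(event)
        except Exception as exc:
            # A bad event is marked and passed on; the stage keeps running.
            result = {**event, "error": str(exc)}
        outbox.put(result)


def validate(event):
    if not event.get("name"):
        raise ValueError("missing name")
    return event


def enrich(event):
    if "error" in event:
        return event  # skip events that failed an earlier stage
    return {**event, "name": event["name"].strip().title()}


def run_pipeline(events):
    # Bounded queues between stages give simple back-pressure.
    q_in = queue.Queue(maxsize=100)
    q_mid = queue.Queue(maxsize=100)
    q_out = queue.Queue()  # unbounded: drained by the caller below
    stages = [
        threading.Thread(target=run_stage, args=(validate, q_in, q_mid)),
        threading.Thread(target=run_stage, args=(enrich, q_mid, q_out)),
    ]
    for t in stages:
        t.start()
    for event in events:
        q_in.put(event)
    q_in.put(STOP)
    results = []
    while (item := q_out.get()) is not STOP:
        results.append(item)
    for t in stages:
        t.join()
    return results
```

Calling `run_pipeline` with one valid and one invalid event returns both: the valid one cleaned up, the invalid one carrying an `error` field instead of crashing the pipeline. That containment of failures per stage is what I mean by robustness and fault tolerance in this context.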
If you have none of these structures in place, I would suggest you take this opportunity to start building your Business Data Lake.