In the world of data, new terms are thrown into the fray every so often, and each time it seems that a revolution is coming. It happened long ago with Big Data (which started with 3 Vs and now has n Vs), Near Real Time and Real Real Time, and for a few years now Lakehouse, Data Fabric and Data Mesh have resonated strongly. But what is behind them?
As Google Trends shows, Data Mesh and Data Fabric have so far followed very similar trends in number of searches, quite unlike the oscillating Lakehouse.
Does that mean that something is cooking slowly or, on the contrary, that these are approaches that never quite take hold? While digging into the term Data Mesh, I found several references, listed at the end of the article, that explain the concept quite well.
In this article, I want to highlight the key aspects of Data Mesh and where I see its benefits compared with a monolithic Modern Data Platform such as Microsoft’s on Azure.
The main differences lie in treating data as a product and in decentralizing its management. The first does not seem bad to me: each business area or department is, in theory, the one with the most knowledge of its own business, so involving it in the processes seems worthwhile. The second, decentralized management, worries me more, even under federated governance, because in a way I see it as a step backwards towards isolation and the reappearance of silos. It has cost a lot to break down the barriers between departments so that information flows effectively throughout organizations, and I would not like to retrace those steps. Finally, self-service is an additional sticking point to consider.
With all this, where do I find this approach most interesting? Undoubtedly, in multinational companies. Treating data as a product allows tasks such as anonymization, as well as the construction of the corresponding measures and indicators, to happen very close to where the data is produced. That is, the preparation of the information is delegated to the domain. This reduces the risk of non-compliance with regulations such as GDPR or HIPAA, since the data always resides in its territory and the information sent abroad contains no PII. Without a doubt, a real plus for security and regulatory compliance.
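As a minimal sketch of what "preparing the information inside the domain" could look like, the Python below aggregates records within a region and exports only indicators with no PII. The record layout, the `build_export` function and the salt value are hypothetical and exist only for illustration. Note too that a salted hash is pseudonymization rather than full anonymization, so here the hashes are used only to count distinct customers and never leave the region.

```python
import hashlib
from collections import defaultdict

# Hypothetical raw records held inside one regional domain (never exported as-is).
RAW_ORDERS = [
    {"customer_email": "ana@example.com", "country": "ES", "amount": 120.0},
    {"customer_email": "luis@example.com", "country": "ES", "amount": 80.0},
    {"customer_email": "ana@example.com", "country": "ES", "amount": 50.0},
]

# Assumption: the domain team declares which fields count as PII.
PII_FIELDS = {"customer_email"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a PII value with a salted hash that is only used locally."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def build_export(records, salt):
    """Aggregate per country and keep only non-PII indicators."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0, "customers": set()})
    for rec in records:
        bucket = totals[rec["country"]]
        bucket["orders"] += 1
        bucket["revenue"] += rec["amount"]
        # The hash is only used to count distinct customers inside the region.
        bucket["customers"].add(pseudonymize(rec["customer_email"], salt))

    rows = [
        {
            "country": country,
            "orders": t["orders"],
            "revenue": t["revenue"],
            "distinct_customers": len(t["customers"]),
        }
        for country, t in sorted(totals.items())
    ]
    # Guard: no declared PII field may appear in what leaves the region.
    for row in rows:
        assert not PII_FIELDS & row.keys()
    return rows


export = build_export(RAW_ORDERS, salt="region-es-secret")
print(export)
```

Only `export` would cross the border in this sketch. The raw records and the hashed identifiers stay in the territory where they were produced.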
This approach also reduces the computational requirements of each deployment, since the workload is divided by region, country or department, which benefits processing both locally and globally. Performance benefits as well: by using regions closer to where the data is located, availability tends to be higher and latency lower. In other words, everything results in greater efficiency.
However, by delegating the work to each region, we might think that maintainability, knowledge sharing and speed of development would suffer. That need not be the case. If the work is developed collaboratively and delimited by domain, like any other project, and the deployment pipelines are configured to target multiple regions, the result is just the opposite: the same image replicated n times. For example, if each branch has a CRM and must carry out the same transformation process, the domain team develops the code once and it is deployed to each region with a single click. In short, you work once and propagate to n.
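The "work once, propagate to n" idea can be illustrated with a small Python sketch. The region names, the CRM record layout and the `run_in_every_region` helper are assumptions made for the example. A real setup would ship the same code through a deployment pipeline rather than loop in a single process.

```python
def normalize_crm_contact(rec: dict) -> dict:
    """Shared transformation, written once by the domain team."""
    return {
        "id": rec["id"],
        "name": rec["name"].strip().title(),
        "country": rec["country"].upper(),
    }


# Hypothetical CRM extracts, one per region; each stays in its own region.
REGIONAL_CRM = {
    "eu-west": [{"id": 1, "name": "  ana lopez ", "country": "es"}],
    "us-east": [{"id": 7, "name": "john SMITH", "country": "us"}],
}


def run_in_every_region(transform, sources: dict) -> dict:
    """Apply the same transformation to each region's data, keeping results per region."""
    return {
        region: [transform(rec) for rec in records]
        for region, records in sources.items()
    }


results = run_in_every_region(normalize_crm_contact, REGIONAL_CRM)
for region, rows in results.items():
    print(region, rows)
```

The point of the sketch is that `normalize_crm_contact` exists in one place: a fix or improvement made there reaches every region on the next deployment.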
Continuing with the teams, it’s time to talk about the platform team, where something very similar happens. To be as efficient as possible, the ideal is to build the entire infrastructure as code, that is, with components defined as templates. In this way, a single deployment through Azure DevOps Pipelines would homogenize all the regions. Another positive aspect is that, with the domain and its team designated as the owner of the data, working under SAFe can be very interesting. SAFe (Scaled Agile Framework) is simply a framework for scaling agile practices, common in large companies. It would make risks linked to data shared between domains easy to see, along with the dependencies between transformation projects that require creating or updating some of the domain’s services. It also leads to better estimates, since the domain owner, as mentioned before, knows the service in depth.
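To make the "components as templates" idea for the platform team a little more concrete, here is a rough Python sketch that renders one component template for several regions. The template fields, the `render` function and the region list are illustrative assumptions, not a real Azure resource schema. In practice this role would be played by an infrastructure-as-code tool driven from the pipeline.

```python
import json
from string import Template

# Hypothetical component template; the field names are illustrative only.
STORAGE_TEMPLATE = Template(json.dumps({
    "name": "datalake-$region",
    "location": "$region",
    "sku": "$sku",
    "tags": {"domain": "$domain"},
}))


def render(region: str, domain: str, sku: str = "standard") -> dict:
    """Fill the shared template with the parameters of one region."""
    rendered = STORAGE_TEMPLATE.substitute(region=region, domain=domain, sku=sku)
    return json.loads(rendered)


regions = ["westeurope", "eastus", "brazilsouth"]
resources = [render(region, domain="sales") for region in regions]

for resource in resources:
    print(json.dumps(resource))
```

Because every region is rendered from the same template, the resulting definitions differ only in their region-specific values, which is what keeps the regions homogeneous.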
In a way, this is very similar to how microservices-based application architectures work. They have their strengths and weaknesses, but for certain business cases they are clearly a very good option. From my point of view, Data Mesh is one more option among data architectures, but one that makes special sense for multinational organizations.
More information:
- Zhamak Dehghani, 20 May 2019, https://martinfowler.com/articles/data-monolith-to-mesh.html
- Zhamak Dehghani, 3 December 2020, https://martinfowler.com/articles/data-mesh-principles.html
- James Serra, 16 February 2021, https://www.jamesserra.com/archive/2021/02/data-mesh/
- Gerardo Vázquez, 15 September 2021, https://www.bbvanexttechnologies.com/pills/data-mesh-una-nueva-aproximacion-para-una-arquitectura-de-datos-transformacional/