
Data Mesh: towards a new paradigm in data?

Sogeti Labs
March 02, 2022

In the world of data, new terms are thrown into the fray every so often, and each time it seems a revolution is coming. It happened long ago with Big Data (which started with 3 Vs and now has n), then with Near Real Time and Real Real Time, and for a few years now Lakehouse, Data Fabric and Data Mesh have been resonating strongly. But what is behind them?

As we can see in Google Trends, Data Mesh and Data Fabric currently follow a very similar trend in terms of search volume, nothing like the oscillating Lakehouse.

Does that mean something is slowly cooking or, on the contrary, that these are ideas that have not quite gelled? Diving into the term Data Mesh, I found a fair amount of literature, which I include at the end of the article, where the concept is explained quite well.

On this occasion, the purpose of the article is to highlight the key aspects of Data Mesh and where I see its benefits compared with a monolithic Modern Data Platform such as Microsoft's on Azure.

The main differences lie in the vision of data as a product and in the decentralization of management. The first does not seem bad to me: each business area or department is, in theory, the one with the deepest knowledge of its own business, so involving them in the processes is interesting. The second point, decentralized management, even through federated governance, worries me more, because in a certain way I see it as a step backwards towards isolation and the reappearance of silos. It has cost a lot to break down the barriers between departments so that information flows effectively throughout organizations, and I would not like to retrace those steps. Finally, self-service deserves a mention as an additional sticking point to consider.

With all this, where do I see this approach as most interesting? Undoubtedly, in multinational companies. By treating data as a product, tasks such as anonymization, as well as the construction of the corresponding measures and indicators, can be carried out very close to where the data is produced; that is, the preparation of the information is delegated to the domain. In this way, risks of non-compliance with regulations such as GDPR or HIPAA are avoided, since the data always resides in its home territory and the information sent abroad contains no PII. Without a doubt, a clear win for security and regulatory compliance.
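To make this more concrete, here is a minimal sketch of what in-domain preparation could look like, assuming a hypothetical sales dataset; the column names, PII fields and indicators are purely illustrative:

```python
# A minimal sketch of in-domain data preparation: drop PII and aggregate
# into shareable indicators before anything leaves the region.
# The dataset, PII columns and measures are assumptions for illustration.
import pandas as pd

PII_COLUMNS = ["customer_name", "email", "national_id"]  # assumed PII fields

def prepare_export(raw: pd.DataFrame) -> pd.DataFrame:
    """Build the indicators to share abroad, keeping PII inside the domain."""
    # Remove PII before the data crosses any border.
    clean = raw.drop(columns=PII_COLUMNS, errors="ignore")
    # Aggregate into the measures the domain owns (e.g., revenue per product).
    return (
        clean.groupby(["country", "product"], as_index=False)
             .agg(revenue=("amount", "sum"), orders=("order_id", "count"))
    )

# Example: the domain runs this locally and only ships the aggregated result.
raw = pd.DataFrame({
    "country": ["ES", "ES", "DE"],
    "product": ["A", "A", "B"],
    "amount": [10.0, 5.0, 7.5],
    "order_id": [1, 2, 3],
    "customer_name": ["Ana", "Luis", "Mia"],
    "email": ["a@x.es", "l@x.es", "m@x.de"],
    "national_id": ["111", "222", "333"],
})
print(prepare_export(raw))
```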

On the other hand, this approach reduces computational requirements, since it divides the workload by region, country or department, which benefits processing both locally and globally. Performance also improves: by using regions closer to where the data lives, availability is higher and latency is lower. In other words, everything results in greater efficiency.

However, by delegating the work to each region, we might expect maintainability, knowledge sharing and development speed to suffer. That is not the case: by developing collaboratively, delimiting by domain as in any other project, and adding the ability to configure deployment pipelines for multiple regions, the result is just the opposite. We would have the same image replicated n times. In other words, if every branch has a CRM and must run the same transformation process, the domain team develops the code once and it is deployed to each region with a single click. In short, you work once and propagate to n.
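As a rough illustration of the "work once, propagate to n" idea, the sketch below hands the same packaged transformation to every region; the region list and the deploy_transformation() helper are assumptions, not a real Azure DevOps API, and in practice the fan-out would live in a multi-stage pipeline:

```python
# A minimal sketch of fanning one artifact out to n regions.
# REGIONS and deploy_transformation() are illustrative placeholders only.
REGIONS = ["westeurope", "northeurope", "eastus", "southeastasia"]

def deploy_transformation(artifact: str, region: str) -> None:
    # Placeholder: in a real pipeline this stage would push the packaged
    # CRM transformation job to the region's workspace.
    print(f"Deploying {artifact} to {region}")

def release(artifact: str) -> None:
    """The domain team builds the artifact once, then propagates it."""
    for region in REGIONS:
        deploy_transformation(artifact, region)

release("crm-transformation:1.0.0")
```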

Continuing with the teams, it is time to talk about the platform team. Something very similar happens here: to be truly efficient, the ideal is to build the entire infrastructure as code, that is, components as templates. In this way, with a simple deployment through Azure DevOps Pipelines, you homogenize all the regions. Another positive aspect is that, by designating the domain and its team as the owner of the data, working under the SAFe methodology becomes a tremendously interesting option. SAFe is nothing more than a framework for scaling agile, common in large companies. Through its use, risks linked to data shared between domains can be made visible in a simple way, as can the dependencies between transformation projects that require creating or updating one of the domain's services. This also results in better estimates, since the domain owner, as mentioned before, has a high degree of knowledge of the service.
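In the same spirit, here is a minimal sketch of "components as templates" for the platform team, assuming hypothetical resource naming conventions; a real setup would use ARM, Bicep or Terraform driven by an Azure DevOps pipeline, this only illustrates the per-region parameterization:

```python
# A minimal sketch of rendering one component template per region.
# Resource names and region codes are assumptions for illustration.
import json

COMPONENT_TEMPLATE = {
    "storage_account": "dm{region}data",
    "data_factory": "dm-{region}-adf",
    "key_vault": "dm-{region}-kv",
}

def render(region: str) -> dict:
    """Produce the per-region parameter set from the single template."""
    return {name: value.format(region=region)
            for name, value in COMPONENT_TEMPLATE.items()}

for region in ["weu", "neu", "eus"]:
    # Each parameter set would feed the same pipeline, one run per region.
    print(json.dumps({"region": region, "resources": render(region)}, indent=2))
```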

CONCLUSION

In a way, this is very similar to how microservices-based application architectures work. They have their strengths and weaknesses, but it is clear that for certain business cases they are a very good option. From my point of view, Data Mesh is one more option among data architectures, but one that makes special sense in the context of multinational organizations.

More information in:

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
