Could data mesh drive data democratisation?

By Mathias Golombek, CTO of Exasol.

The volume of data generated globally is growing exponentially, with three times the amount of data forecast to be created over the next five years compared to the previous five, according to an IDC report.

This data forms the essential foundation from which organisations are drawing actionable insights. But the collection and management of data is a pressing challenge that organisations are trying to solve by modernising their data architectures.

Data mesh has emerged as a new approach that addresses some of the key challenges associated with data silos and the ability of organisations to scale their traditional data platforms amid the explosion of data.

But what exactly is data mesh, and why are more and more companies looking to implement it? What are its strengths and weaknesses, and why are organisations such as Netflix and Zalando adopting data mesh to create a self-service data infrastructure?

The origins of data mesh

The concept of data mesh was first introduced by Zhamak Dehghani of ThoughtWorks as a response to seeing large customers investing a lot of money in big data platforms but failing to see value from their investments.

Dehghani argues that data platforms based on traditional data warehouse or data lake models are centralised and monolithic and, as such, create bottlenecks as organisations look to scale. Instead of centralised data lakes or warehouses, data mesh presents a shift to a decentralised, distributed architecture that supports a self-serve data infrastructure and treats data as a self-contained product.

In a traditional data architecture, there is normally a disconnect between where data gets created and where it gets consumed. Typically, producers of data generate it and send it into the data lake with data consumers (located in a different silo) not necessarily having the required knowledge to understand it.

The concept of data mesh decomposes data around the domains (e.g. different parts of the business) and pushes data ownership responsibility to the teams with the relevant domain expert knowledge to create, catalogue and store the data. A business domain (e.g. finance) provides data as a product: discoverable, reliable and ready to be used for analytic purposes. The data product owner is the business domain representative that ensures no specific domain knowledge gets lost and no bottlenecks occur at the central data team.
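
To make the idea of "data as a product" more concrete, here is a minimal Python sketch of how a domain might describe a data product and publish it to a shared catalogue. It is illustrative only: the class names `DataProduct` and `Catalogue`, their fields, and the example domains and owners are all assumptions made for this sketch, not part of any data mesh standard or of the platforms mentioned in this article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned dataset published for analytic use (hypothetical schema)."""
    name: str
    domain: str        # owning business domain, e.g. "finance"
    owner: str         # data product owner within that domain
    description: str
    schema: dict[str, str] = field(default_factory=dict)  # column name -> type


class Catalogue:
    """Shared metadata registry; the data itself stays with each domain."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Keys are namespaced by domain so ownership is always explicit.
        key = f"{product.domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product

    def discover(self, keyword: str) -> list[str]:
        """Return keys of products whose name or description mentions keyword."""
        needle = keyword.lower()
        return sorted(
            key for key, p in self._products.items()
            if needle in p.name.lower() or needle in p.description.lower()
        )


# Example usage with made-up domains and owners.
catalogue = Catalogue()
catalogue.register(DataProduct(
    "invoices", "finance", "finance-data-owner",
    "Monthly invoice totals", {"month": "date", "total": "decimal"},
))
catalogue.register(DataProduct(
    "orders", "sales", "sales-data-owner", "Customer orders by day",
))
print(catalogue.discover("invoice"))
```

In this sketch, each domain team registers and maintains its own entries, while consumers only need the shared catalogue to find what exists and who owns it.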

Towards data democratisation

Data mesh has multiple benefits and some challenges that need to be considered when approaching its implementation. Overall, decentralised data creation brings more visibility and makes data easier to digest and consume. It also helps to truly democratise data, because data consumers don’t have to worry about data discovery and can focus on experimentation, innovation and generating new value from data.

Because of the decentralised data operations and the provisioned data infrastructure as a service, data mesh results in greater agility and scalability, with teams focusing on relevant data products. It also supports the creation of a federated, global governance that enables interoperability and simplifies access to data.

Despite data mesh architecture gaining a lot of traction recently, there are concerns in the industry about its application. If organisations decide to go down the data mesh route, then getting the tech stack right will be crucial to their efforts. A powerful, high-performance, tuning-free analytics database that can scale with the diverse access from various data consumers will be key.

Data mesh best practice

According to Deloitte, the concept of data mesh is particularly suitable for companies that have a diverse data landscape spanning many business domains, manage a high number of data sources, or pursue rapidly changing business goals. Some of the early adopters of data mesh include streaming giant Netflix and Zalando, Europe’s biggest online fashion retailer.

Netflix processes trillions of events and petabytes of data a day. Following the growth of original productions with Netflix Studio, data integration across the streaming service and the studio, along with the scalability to support this growth, became a priority. Netflix turned to data mesh to integrate data across hundreds of different data stores in a way that enables it to holistically optimise cost and performance while reducing operational complexity.

Zalando, on the other hand, moved from a centralised data lake to a distributed data mesh architecture to create true data products that are discoverable, secure, trustworthy and interoperable, guaranteeing quality and data ownership. The retailer realised that data accessibility and availability on a large scale can only be guaranteed if primary responsibility resides with those who generate the data and have the relevant domain expertise. At Zalando, only data governance and metadata are managed centrally to enable interoperability.

Data mesh isn’t a silver bullet for all the challenges of traditional data platforms and data silos. However, best practice cases such as those from Netflix and Zalando suggest it offers a viable route to extracting more value from data to fuel business growth.
