Now let’s think about data
A data marketplace, which lets you browse multiple data assets you could use for any reason, is managed much like an Amazon product. In our terms, this is how we define the data product.
Data products include:
- Distributable datasets
- Reports and the pipelines that create them
- Machine learning models
Each of these data products is developed, managed, governed and distributed as an end-to-end solution. There could be multiple or single assets, which in turn make up the product. But the valuable, usable outcome of the process is the product.
Historically, enterprises built holistic solutions by combining different domains into one resource. Multiple disparate, unrelated items of value could be produced from one facility.
This sounds great, but that is like Amazon telling everyone wishing to sell something on the market, they have to use one holistic store, which is managed by one store owner, including all of the inventory. In this scenario, everyone’s products are dependent on great management of that single store. Whilst this may mean less admin for the individual sellers, there is high risk and far more dependency should that store owner not be great at maintaining his store. If that happened, then the seller would be more likely to go and create their own store elsewhere.
Back to data – in the marketplace, a prevailing preconception exists that enterprise data solutions are unsuccessful. This view is driven by the experience of failed data warehouse / data lake projects that have stretched over several years. The crux of the matter lies in the ever-evolving nature of each business unit’s requirements, which metamorphose at disparate rates. Whilst one faction of the organisation may clamour for swift implementation, another may exhibit a nonchalant stance. These perils, unfortunately, rear their heads when contemplating a centralised data repository.
A way to work around this challenge is by implementing the ‘data as a product’ architectural design pattern, which gives ownership back to the domain itself. The merits of such an architectural paradigm are substantial. It enables swifter domain deployments, a curbed risk profile, vigilant data quality management at the domain level, streamlined governance, and an unequivocal delineation of data ownership; all defined in the data mesh design pattern.